Reinforcement Learning Overview

Run uvx synth-ai setup once so the CLI can load your SYNTH_API_KEY and ENVIRONMENT_API_KEY, then follow the PipelineRL loop below. Every stage runs through uvx synth-ai; apart from deploying your task app, you need no SDK scripts or local servers.

1. Publish a traceable task app: Wrap your environment with TaskAppConfig, enable tracing, and confirm the HTTP contract (/info, /task_info, /rollout). → Task app expectations
2. Deploy the task app (Modal recommended): Use uvx synth-ai deploy --runtime modal so rollout actors can reach a stable https://<task>.modal.run endpoint. → Modal deployment
3. Author the RL config: Declare models, rollout settings, judges, and compute requirements in TOML. → Config reference
4. Launch PipelineRL: Submit the job via uvx synth-ai train --type rl --config ... --task-url <TASK_APP_URL> and stream progress. → Training guide
5. Evaluate and iterate: Use uvx synth-ai eval (pointing at the same Modal URL) and the curated examples to benchmark new checkpoints. → Examples
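
Before launching a job, it can help to confirm that the deployed task app actually serves the three routes named in the first step. The following is a minimal sketch, not part of synth-ai: the helper names (contract_urls, probe, missing_routes) are hypothetical, and it assumes that a plain GET is enough to tell whether a route is registered, treating only a 404 as "missing". The real contract may require other HTTP methods or authentication headers, in which case statuses such as 401 or 405 still indicate that the route exists.

```python
import urllib.error
import urllib.request

# Routes listed in the overview's HTTP contract.
CONTRACT_ROUTES = ("/info", "/task_info", "/rollout")


def contract_urls(base_url: str) -> dict[str, str]:
    """Map each contract route to its absolute URL under base_url."""
    root = base_url.rstrip("/")
    return {route: root + route for route in CONTRACT_ROUTES}


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request to url.

    Assumption: GET is sufficient to see whether the route is registered.
    Error statuses (4xx/5xx) are returned rather than raised.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def missing_routes(base_url: str, probe_fn=probe) -> list[str]:
    """List contract routes that answer 404, i.e. are not registered."""
    return [
        route
        for route, url in contract_urls(base_url).items()
        if probe_fn(url) == 404
    ]
```

Calling missing_routes with your deployed task app URL should return an empty list when all three routes are registered. The probe_fn parameter exists so the check can be exercised without network access.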

Stick to this CLI-first flow to ensure your RL jobs match the production backend used by Synth.
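
The CLI-first flow above can be written down as a dry-run plan, which is useful for reviewing the exact commands before executing anything. This sketch only assembles and prints the commands quoted in the steps; the function names and the config_path and task_url parameters are hypothetical placeholders. The overview does not say which flag passes the task URL to eval, so that command is left bare here.

```python
import shlex


def pipeline_commands(config_path: str, task_url: str) -> list[list[str]]:
    """Return the CLI steps from the overview as argv lists, in order.

    config_path and task_url are caller-supplied placeholders; the overview
    does not name a specific config file or URL.
    """
    base = ["uvx", "synth-ai"]
    return [
        base + ["setup"],
        base + ["deploy", "--runtime", "modal"],
        base + ["train", "--type", "rl",
                "--config", config_path, "--task-url", task_url],
        # The overview says eval points at the same Modal URL but does not
        # give the flag, so no URL argument is added here.
        base + ["eval"],
    ]


def render(commands: list[list[str]]) -> list[str]:
    """Format each argv list as a shell-safe command line."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing each line of render(pipeline_commands(...)) shows the plan; to execute it, each argv list could be passed to subprocess.run with check=True so the sequence stops at the first failing step.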
