Synth is a post-training platform for agents. You ship a task app (a small FastAPI service that exposes your environment), collect traces through structured rollouts, and then run either supervised fine-tuning (SFT) or reinforcement learning (RL) to improve the policy. Every flow shares the same core pieces:
  1. Task app – describes actions, observations, rubrics, and tracing hooks (see the sketch after this list).
  2. Rollouts / eval – run the agent to gather data, judge results, and benchmark policies.
  3. Training job – submit an SFT or RL job with the Synth CLI, monitor progress, and retrieve the updated model.
Agents (humans or other services) can read these pages top to bottom and reproduce the entire workflow.
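
Below is a minimal sketch of what a task app can look like, assuming FastAPI and Pydantic are available; the endpoint names, payload shapes, and scoring rule are illustrative assumptions rather than Synth's actual task app contract (the task app docs define the real schema).

```python
# Hypothetical task app sketch: a tiny FastAPI service exposing an
# environment step and a health check. Endpoint names and payload
# shapes are assumptions for illustration, not Synth's real contract.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Action(BaseModel):
    answer: str  # the policy's proposed answer for this task


class StepResult(BaseModel):
    reward: float  # rubric score for the action
    done: bool     # whether the episode is finished


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}


@app.post("/step", response_model=StepResult)
def step(action: Action) -> StepResult:
    # Toy rubric: full reward if the agent answers the (fixed) math
    # question correctly, zero otherwise.
    correct = action.answer.strip() == "42"
    return StepResult(reward=1.0 if correct else 0.0, done=True)
```

Served with a standard ASGI runner such as uvicorn, a service along these lines gives rollouts a scriptable surface: the agent posts actions, and the app returns rubric-scored results that become traces.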

How the docs are organized

  • Quickstart – install the CLI, pair it with your dashboard session, and walk through the math demo.
  • SFT docs – gather or filter JSONL datasets (see the filtering sketch after this list), configure hyperparameters, and launch fine-tuning jobs.
  • RL docs – configure trainer topologies, deploy task apps to Modal, and run online RL.
  • Rollouts – the uvx synth-ai eval command covers both evaluation and data collection.
  • Reference material, such as pricing and models, lives in dedicated sections.
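
As a sketch of the data-prep step the SFT docs describe, the snippet below filters rollout traces down to a training set; the file names, the record schema (a messages list plus a reward field), and the 0.5 threshold are all assumptions about how traces are exported, not a documented format.

```python
# Hypothetical JSONL filter: keep only high-reward traces for SFT.
# File names, the record schema ("messages", "reward"), and the
# threshold are assumptions for illustration.
import json

MIN_REWARD = 0.5

with open("traces.jsonl") as src, open("sft_train.jsonl", "w") as dst:
    for line in src:
        record = json.loads(line)
        # Keep chat-format records whose episode reward cleared the bar.
        if record.get("reward", 0.0) >= MIN_REWARD and record.get("messages"):
            dst.write(json.dumps({"messages": record["messages"]}) + "\n")
```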

Model families (Synth hosted)

Model             | Sizes                                    | Notes
Qwen 3            | 0.6B, 1.7B, 4B, 8B, 14B, 32B             | Best starting point; dense thinking models that support tool-calling
Qwen 3 (Advanced) | 4B-2507, 30B-A3B, 235B-A22B*, 480B-A35B* | Unique Instruct & Thinking variants with MoE support
Qwen 3 VL         | 2B, 4B, 8B, 30B-A3B, 32B, 235B-A22B*     | Multimodal (vision + language) family with Instruct & Thinking variants
Qwen 3 Coder      | 30B-A3B, 480B-A35B*                      | Specialized for code generation (Instruct only)
* 235B and 480B models must be sharded across multiple GPUs for inference and training.