/api/learning/jobs endpoint, uploads your dataset, and monitors the run until completion. The CLI handles all of this for you.
Reference: Singh et al. (2023). “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models.” arXiv:2312.06585
Synth implements ReST EM (Reinforced Self-Training with Expectation-Maximization), a self-training approach that generates samples from the model, filters them using binary feedback, fine-tunes the model on these samples, and repeats iteratively to reduce dependence on human-generated data.
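The generate → filter → fine-tune → repeat loop can be sketched in a few lines. This is an illustrative sketch of the ReST-EM idea from the paper, not Synth's implementation; `model`, `reward_fn`, and `fine_tune` are placeholder callables you would supply.

```python
def rest_em(model, prompts, reward_fn, fine_tune, iterations=3, samples_per_prompt=4):
    """Minimal ReST-EM sketch: sample, filter with binary feedback, fine-tune, repeat."""
    for _ in range(iterations):
        # E-step: sample candidate completions from the current model.
        dataset = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                completion = model(prompt)
                # Keep only samples that pass the binary reward (e.g. tests pass).
                if reward_fn(prompt, completion):
                    dataset.append((prompt, completion))
        # M-step: fine-tune on the filtered, self-generated data.
        model = fine_tune(model, dataset)
    return model
```

Because filtering uses only a binary signal, the loop needs no human-written targets after the initial prompts, which is the "reduce dependence on human-generated data" point above.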
For CLI flag descriptions, head to Launch Training Jobs.
1. Create the Config TOML
Create a TOML file that follows the schema documented in the SFT config reference.

Vision-language (Qwen3-VL)
Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when `job.model` points at `Qwen/Qwen3-VL-*`. Keep the following in mind:
- Config tweaks: you do not need to add extra knobs; `supports_vision`, `max_images_per_message`, and BF16 precision are pulled from the model registry. The trainer will clamp `per_device_batch`/`per_device_eval_batch` to 1 and raise `gradient_accumulation_steps` to keep memory in check.
- Dataset shape: every JSONL record must contain a `messages[]` array using the OpenAI multimodal schema. Each message’s `content` can mix text segments and image segments.
- Image references: each `image_url.url` must be resolvable from the training container. HTTPS URLs, public object-store links, and `data:image/...;base64,<payload>` blobs are supported.
- Image limits: Qwen3-VL defaults to `max_images_per_message = 1`. Additional images in a single turn are trimmed and a debug log is emitted.
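Putting the dataset rules together, a single training record might be built like this. `build_record` is a hypothetical helper, not part of synth-ai; it follows the OpenAI multimodal message schema described above and embeds the image as a `data:` URL so the training container can resolve it without network access.

```python
import base64
import json

def build_record(question, answer, image_bytes, mime="image/png"):
    """Build one JSONL record: a user turn mixing text + image, and an assistant reply."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            },
            {"role": "assistant", "content": [{"type": "text", "text": answer}]},
        ]
    }

record = build_record("What is shown here?", "A red square.", b"\x89PNG...")
line = json.dumps(record)  # one line of the training JSONL
```

Note the record stays within the default `max_images_per_message = 1`; a second image segment in the user turn would be trimmed.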
2. Launch the Job
- The CLI validates the dataset (it must contain `messages[]` with ≥2 turns). Run with `--no-poll` if you only want this validation step.
- The JSONL is uploaded to `/api/learning/files`; the CLI waits until the backend marks it `ready`.
- A job is created and started with the payload generated from your TOML.
- The CLI polls status and prints progress events until the job reaches a terminal state.
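You can approximate the CLI's validation step locally before uploading. A minimal sketch assuming only what this guide states (each record must parse as JSON and contain a `messages[]` array with at least two turns); the real CLI likely checks more.

```python
import json

def validate_jsonl(lines):
    """Return a list of (line_number, error) tuples; an empty list means the file passes."""
    errors = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc}"))
            continue
        msgs = record.get("messages")
        if not isinstance(msgs, list) or len(msgs) < 2:
            errors.append((i, "messages[] missing or has fewer than 2 turns"))
    return errors
```

Running this over your file before `uvx synth-ai train` saves an upload round-trip when a record is malformed.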
- `--model` – override the model in the TOML without editing the file.
- `--examples N` – upload only the first `N` JSONL records (smoke testing).
- `--no-poll` – submit the job and exit immediately (useful when an agent wants to poll separately).
By default, credentials are read from the `.env` produced by `uvx synth-ai setup`. Use `--env-file` when you need to target a different secrets file (you can pass the flag multiple times to layer values).
3. Monitor and Retrieve Outputs
- Copy the `job_id` printed in the CLI output.
- Re-run the command later with `--no-poll` to check status without re-uploading.
- Query job details directly against `/api/learning/jobs/<job_id>`.
- When the job succeeds, note the `fine_tuned_model` identifier in the response. You will use this value when deploying the updated policy.
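The monitoring loop can be scripted. This is a hedged sketch: the base URL, the `Authorization: Bearer` header, and the full set of terminal status strings are assumptions; only the `/api/learning/jobs/<job_id>` path and the `succeeded` status come from this guide.

```python
import json
import time
import urllib.request

# Assumed terminal states; "succeeded" is documented, the others are guesses.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def is_terminal(status):
    return status in TERMINAL_STATES

def poll_job(base_url, job_id, api_key, interval=10):
    """Poll /api/learning/jobs/<job_id> until the job reaches a terminal state."""
    url = f"{base_url}/api/learning/jobs/{job_id}"
    while True:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            job = json.load(resp)
        if is_terminal(job.get("status", "")):
            return job  # on success, job["fine_tuned_model"] holds the new identifier
        time.sleep(interval)
```

On a `succeeded` response, read the `fine_tuned_model` field and carry it into deployment.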
4. Automate with Agents
- Run `uvx synth-ai eval` to generate traces.
- Run `uvx synth-ai filter` to create JSONL.
- Run `uvx synth-ai train --type sft … --no-poll`.
- Poll `/api/learning/jobs/<job_id>` until status is `succeeded`.
- Fetch the new `fine_tuned_model` and move on to deployment.
Troubleshooting Tips
- Upload failures usually stem from path issues or zero-byte files—verify the file exists and is readable.
- If validation keeps failing, inspect a few JSONL rows manually (or with `jq`) to confirm schema compliance.
- Backend 4xx errors on job creation often indicate missing required config fields; review the payload preview printed just before the request and compare it with your config.
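The upload-failure tips above can be folded into a small pre-flight check. `preflight` is a hypothetical helper; it encodes only the checks named in this section (the path exists, is readable, and is not zero bytes).

```python
import os

def preflight(path):
    """Return (ok, reason) for the dataset file before attempting an upload."""
    if not os.path.exists(path):
        return False, "file does not exist"
    if not os.access(path, os.R_OK):
        return False, "file is not readable"
    if os.path.getsize(path) == 0:
        return False, "file is zero bytes"
    return True, "ok"
```

Run it on your JSONL path right before `uvx synth-ai train` to rule out the most common upload failures.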