Training submits a job to Synth’s /api/learning/jobs endpoint, uploads your dataset, and monitors the run until completion. The CLI handles all of this for you.

Synth implements ReST EM (Reinforced Self-Training with Expectation-Maximization), a self-training approach that generates samples from the model, filters them using binary feedback, fine-tunes the model on those samples, and repeats iteratively to reduce dependence on human-generated data. Reference: Singh et al. (2023). “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models.” arXiv:2312.06585.

For CLI flag descriptions, head to Launch Training Jobs.

1. Create the Config TOML

Create a TOML file that follows the schema documented in the SFT config reference.
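The sketch below shows the general shape such a file might take. It is illustrative only: aside from job.model (referenced later on this page) and the batch/accumulation knobs mentioned in the vision notes, the section and key names here are assumptions, so defer to the SFT config reference for the authoritative schema.

# configs/train.toml: illustrative sketch, not the authoritative schema
[job]
model = "Qwen/Qwen3-4B"          # hypothetical base model identifier

[training]                        # section name is an assumption
per_device_batch = 2
per_device_eval_batch = 2
gradient_accumulation_steps = 4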

Vision-language (Qwen3-VL)

Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when job.model points at Qwen/Qwen3-VL-*. Keep the following in mind:
  • Config tweaks: you do not need to add extra knobs—supports_vision, max_images_per_message, and BF16 precision are pulled from the model registry. The trainer will clamp per_device_batch / per_device_eval_batch to 1 and raise gradient_accumulation_steps to keep memory in check.
  • Dataset shape: every JSONL record must contain a messages[] array using the OpenAI multimodal schema. Each message’s content can mix text segments and image segments (see the example record after this list).
  • Image references: each image_url.url must be resolvable from the training container. HTTPS URLs, public object-store links, and data:image/...;base64,<payload> blobs are supported.
  • Image limits: Qwen3-VL defaults to max_images_per_message = 1. Additional images in a single turn are trimmed and a debug log is emitted.
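To make the dataset shape concrete, here is an example record in the OpenAI multimodal schema, pretty-printed for readability; in the actual JSONL file each record occupies a single line. The image URL and prompt text are invented for illustration:

{"messages": [
  {"role": "user", "content": [
    {"type": "text", "text": "What is the player holding in this frame?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/frames/0001.png"}}
  ]},
  {"role": "assistant", "content": "The player is holding a wooden pickaxe."}
]}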

2. Launch the Job

uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl
What happens:
  1. The CLI validates the dataset (it must contain messages[] with ≥2 turns). Run with --no-poll if you only want this validation step.
  2. The JSONL is uploaded to /api/learning/files; the CLI waits until the backend marks it ready.
  3. A job is created and started with the payload generated from your TOML.
  4. The CLI polls status and prints progress events until the job reaches a terminal state.
Useful flags:
  • --model – override the model in the TOML without editing the file.
  • --examples N – upload only the first N JSONL records (smoke testing).
  • --no-poll – submit the job and exit immediately (useful when an agent wants to poll separately).
Reminder: the CLI auto-loads the .env produced by uvx synth-ai setup. Use --env-file when you need to target a different secrets file (you can pass the flag multiple times to layer values).
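Putting these flags together, a hypothetical smoke-test invocation might look like the following (the model identifier and env file names are placeholders):

# Override the TOML model, upload only the first 25 records, layer two
# env files, and exit without polling.
uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl \
  --model Qwen/Qwen3-VL-4B \
  --examples 25 \
  --env-file .env \
  --env-file .env.local \
  --no-poll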

3. Monitor and Retrieve Outputs

  • Copy the job_id printed in the CLI output.
  • Re-run the command later with --no-poll to check status without re-uploading.
  • Query job details directly:
  curl -H "Authorization: Bearer $SYNTH_API_KEY" \
    https://agent-learning.onrender.com/api/learning/jobs/<job_id>
  • When the job succeeds, note the fine_tuned_model identifier in the response. You will use this value when deploying the updated policy.
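Assuming the response exposes status and fine_tuned_model as top-level fields (the field names come from this page; the overall response shape is an assumption), jq can pull out just the values you need:

curl -s -H "Authorization: Bearer $SYNTH_API_KEY" \
  https://agent-learning.onrender.com/api/learning/jobs/<job_id> \
  | jq '{status, fine_tuned_model}'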

4. Automate with Agents

  1. Run uvx synth-ai eval to generate traces.
  2. Run uvx synth-ai filter to create JSONL.
  3. Run uvx synth-ai train --type sft … --no-poll.
  4. Poll /api/learning/jobs/<job_id> until status is succeeded.
  5. Fetch the new fine_tuned_model and move on to deployment.
Following this sequence allows a single agent workflow (or CI pipeline) to execute the entire SFT loop without manual intervention.
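As a sketch of step 4, the loop below polls the job until it leaves its running state. It assumes a top-level status field; succeeded comes from the step above, while the other terminal state names are illustrative guesses:

#!/usr/bin/env bash
# Poll a learning job until it reaches a terminal state.
set -euo pipefail
JOB_ID="$1"
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $SYNTH_API_KEY" \
    "https://agent-learning.onrender.com/api/learning/jobs/$JOB_ID" \
    | jq -r '.status')
  echo "status: $STATUS"
  case "$STATUS" in
    succeeded|failed|cancelled) break ;;   # names beyond "succeeded" are assumptions
  esac
  sleep 30
done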

Troubleshooting Tips

  • Upload failures usually stem from path issues or zero-byte files—verify the file exists and is readable.
  • If validation keeps failing, inspect a few JSONL rows manually (or with jq) to confirm schema compliance; see the quick check below.
  • Backend 4xx errors on job creation often indicate missing required config fields; review the payload preview printed just before the request and compare it with your config.
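For that jq spot check, one quick option is to print the turn count of the first few records (validation requires messages[] with at least 2 turns, per step 2):

# Print the number of turns in each of the first three records
head -n 3 ft_data/crafter_sft.jsonl | jq '.messages | length'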