/api/learning/jobs endpoint, uploads your dataset, and monitors the run until completion. The CLI handles all of this for you.
Reference: Singh et al. (2023). “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models.” arXiv:2312.06585
Synth implements ReST EM (Reinforced Self-Training with Expectation-Maximization), a self-training approach that generates samples from the model, filters them using binary feedback, fine-tunes the model on these samples, and repeats iteratively to reduce dependence on human-generated data.
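The generate → filter → fine-tune → repeat loop can be sketched in a few lines. This is an illustrative sketch of the ReST-EM idea from the paper, not Synth's implementation; `model`, `reward_fn`, and `fine_tune` are placeholder callables you would supply.

```python
def rest_em(model, prompts, reward_fn, fine_tune, iterations=3, samples_per_prompt=4):
    """Minimal ReST-EM sketch: sample, filter with binary feedback, fine-tune, repeat."""
    for _ in range(iterations):
        # E-step: sample candidate completions from the current model.
        dataset = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                completion = model(prompt)
                # Keep only samples that pass the binary reward (e.g. tests pass).
                if reward_fn(prompt, completion):
                    dataset.append((prompt, completion))
        # M-step: fine-tune on the filtered, self-generated data.
        model = fine_tune(model, dataset)
    return model
```

Because filtering uses only a binary signal, the loop needs no human-written targets after the initial prompts, which is the "reduce dependence on human-generated data" point above.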
For CLI flag descriptions, head to Launch Training Jobs.
1. Create the Config TOML
Create a TOML file that follows the schema documented in the SFT config reference.

Vision-language (Qwen3-VL)
Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when `job.model` points at `Qwen/Qwen3-VL-*`. Keep the following in mind:
- Config tweaks: you do not need to add extra knobs; `supports_vision`, `max_images_per_message`, and BF16 precision are pulled from the model registry. The trainer will clamp `per_device_batch`/`per_device_eval_batch` to 1 and raise `gradient_accumulation_steps` to keep memory in check.
- Dataset shape: every JSONL record must contain a `messages[]` array using the OpenAI multimodal schema. Each message’s `content` can mix text segments and image segments.
- Image references: each `image_url.url` must be resolvable from the training container. HTTPS URLs, public object-store links, and `data:image/...;base64,<payload>` blobs are supported.
- Image limits: Qwen3-VL defaults to `max_images_per_message = 1`. Additional images in a single turn are trimmed and a debug log is emitted.
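Putting the dataset rules together, a single training record might be built like this. `build_record` is a hypothetical helper, not part of synth-ai; it follows the OpenAI multimodal message schema described above and embeds the image as a `data:` URL so the training container can resolve it without network access.

```python
import base64
import json

def build_record(question, answer, image_bytes, mime="image/png"):
    """Build one JSONL record: a user turn mixing text + image, and an assistant reply."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            },
            {"role": "assistant", "content": [{"type": "text", "text": answer}]},
        ]
    }

record = build_record("What is shown here?", "A red square.", b"\x89PNG...")
line = json.dumps(record)  # one line of the training JSONL
```

Note the record stays within the default `max_images_per_message = 1`; a second image segment in the user turn would be trimmed.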
2. Launch the Job
- The CLI validates the dataset (it must contain `messages[]` with ≥2 turns). Run with `--no-poll` if you only want this validation step.
- The JSONL is uploaded to `/api/learning/files`; the CLI waits until the backend marks it `ready`.
- A job is created and started with the payload generated from your TOML.
- The CLI polls status and prints progress events until the job reaches a terminal state.
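You can approximate the CLI's validation step locally before uploading. A minimal sketch assuming only what this guide states (each record must parse as JSON and contain a `messages[]` array with at least two turns); the real CLI likely checks more.

```python
import json

def validate_jsonl(lines):
    """Return a list of (line_number, error) tuples; an empty list means the file passes."""
    errors = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc}"))
            continue
        msgs = record.get("messages")
        if not isinstance(msgs, list) or len(msgs) < 2:
            errors.append((i, "messages[] missing or has fewer than 2 turns"))
    return errors
```

Running this over your file before `uvx synth-ai train` saves an upload round-trip when a record is malformed.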
- `--model` – override the model in the TOML without editing the file.
- `--examples N` – upload only the first `N` JSONL records (smoke testing).
- `--no-poll` – submit the job and exit immediately (useful when an agent wants to poll separately).
By default, credentials are read from the `.env` produced by `uvx synth-ai setup`. Use `--env-file` when you need to target a different secrets file (you can pass the flag multiple times to layer values).
3. Monitor and Retrieve Outputs
- Copy the `job_id` printed in the CLI output.
- Re-run the command later with `--no-poll` to check status without re-uploading.
- Query job details directly against `/api/learning/jobs/<job_id>`.
- When the job succeeds, note the `fine_tuned_model` identifier in the response. You will use this value when deploying the updated policy.
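The monitoring loop can be scripted. This is a hedged sketch: the base URL, the `Authorization: Bearer` header, and the full set of terminal status strings are assumptions; only the `/api/learning/jobs/<job_id>` path and the `succeeded` status come from this guide.

```python
import json
import time
import urllib.request

# Assumed terminal states; "succeeded" is documented, the others are guesses.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def is_terminal(status):
    return status in TERMINAL_STATES

def poll_job(base_url, job_id, api_key, interval=10):
    """Poll /api/learning/jobs/<job_id> until the job reaches a terminal state."""
    url = f"{base_url}/api/learning/jobs/{job_id}"
    while True:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            job = json.load(resp)
        if is_terminal(job.get("status", "")):
            return job  # on success, job["fine_tuned_model"] holds the new identifier
        time.sleep(interval)
```

On a `succeeded` response, read the `fine_tuned_model` field and carry it into deployment.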
4. Automate with Agents
- Run `uvx synth-ai eval` to generate traces.
- Run `uvx synth-ai filter` to create JSONL.
- Run `uvx synth-ai train --type sft … --no-poll`.
- Poll `/api/learning/jobs/<job_id>` until status is `succeeded`.
- Fetch the new `fine_tuned_model` and move on to deployment.
Troubleshooting Tips
- Upload failures usually stem from path issues or zero-byte files—verify the file exists and is readable.
- If validation keeps failing, inspect a few JSONL rows manually (or with `jq`) to confirm schema compliance.
- Backend 4xx errors on job creation often indicate missing required config fields; review the payload preview printed just before the request and compare it with your config.
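The upload-failure tips above can be folded into a small pre-flight check. `preflight` is a hypothetical helper; it encodes only the checks named in this section (the path exists, is readable, and is not zero bytes).

```python
import os

def preflight(path):
    """Return (ok, reason) for the dataset file before attempting an upload."""
    if not os.path.exists(path):
        return False, "file does not exist"
    if not os.access(path, os.R_OK):
        return False, "file is not readable"
    if os.path.getsize(path) == 0:
        return False, "file is zero bytes"
    return True, "ok"
```

Run it on your JSONL path right before `uvx synth-ai train` to rule out the most common upload failures.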