Training with SFT means creating a job against the `/api/learning/jobs` endpoint, uploading your dataset, and monitoring the run until completion. The CLI handles all of this for you.
For CLI flag descriptions, head to Launch Training Jobs.
1. Prepare a training config
SFT configs are TOML files. Start from one of the examples (e.g. `examples/warming_up_to_rl/configs/crafter_fft.toml`) and adjust the sections below:
- `job.model` – must be a supported Synth base model (see `/models`).
- `job.data` or `--dataset` – path to the JSONL produced by the filter step.
- `hyperparameters` – include at least one of `n_epochs`, `total_steps`, `train_steps`, or `steps`.
- `data.validation_path` – secondary JSONL for evaluation.
- `training.validation.*` – configure evaluation cadence (`eval_steps`, `metric_for_best_model`, etc.).
- `compute` – choose GPU type/count; omit to use organization defaults.
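If you assemble or tweak configs programmatically, a quick structural check before submitting can save a failed run. The sketch below is illustrative Python (assuming Python 3.11+ for `tomllib` and that the TOML tables are named exactly as listed above); it is not part of the CLI.

```python
# Sketch: sanity-check the TOML sections listed above before launching.
# Assumes Python 3.11+ (tomllib) and that tables are named exactly as above;
# the config path is one of the shipped examples.
import tomllib

REQUIRED_STEP_KEYS = {"n_epochs", "total_steps", "train_steps", "steps"}

with open("examples/warming_up_to_rl/configs/crafter_fft.toml", "rb") as f:
    cfg = tomllib.load(f)

job = cfg.get("job", {})
assert job.get("model"), "job.model must name a supported Synth base model (see /models)"
if not job.get("data"):
    print("note: job.data not set; pass --dataset when launching")

hp = cfg.get("hyperparameters", {})
assert REQUIRED_STEP_KEYS & hp.keys(), (
    "hyperparameters needs at least one of n_epochs, total_steps, train_steps, steps"
)

# data.validation_path, training.validation.*, and compute are optional.
print("config looks structurally sound for model:", job["model"])
```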
Vision-language (Qwen3-VL)
Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when `job.model` points at `Qwen/Qwen3-VL-*`. Keep the following in mind:
- Config tweaks: you do not need to add extra knobs. `supports_vision`, `max_images_per_message`, and BF16 precision are pulled from the model registry. The trainer will clamp `per_device_batch`/`per_device_eval_batch` to 1 and raise `gradient_accumulation_steps` to keep memory in check. If you truly need more than one image per turn, override the registry default with `model.max_images_per_message`, but expect higher GPU memory pressure.
- Dataset shape: every JSONL record must contain a `messages[]` array using the OpenAI multimodal schema. Each message's `content` can mix text segments and image segments; a minimal example record is sketched after this list. The trainer also understands legacy payloads with top-level `images`/`image_url` fields, but everything is converted into the messages format.
- Image references: each `image_url.url` must be resolvable from the training container. HTTPS URLs, public object-store links, and `data:image/...;base64,<payload>` blobs are supported. Local filesystem paths only work if that path exists inside the uploaded artifact, so prefer URLs or data URIs.
- Image limits: Qwen3-VL defaults to `max_images_per_message = 1`. Additional images in a single turn are trimmed and a debug log is emitted. Plan your prompts accordingly or bump the limit explicitly if your GPU topology can handle it.
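For concreteness, here is one way a single vision record could look when written in the `messages[]` format described above. This is an illustrative Python sketch; the prompt text, image URL, and output filename are placeholders, and the exact set of segment types accepted is defined by the trainer.

```python
# Sketch: write one multimodal SFT record as a JSONL line.
# The content array mixes a text segment with an image_url segment
# (OpenAI-style schema); the text, URL, and filename are placeholders.
import json

record = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What should the agent do next?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/frames/step_042.png"},
                },
            ],
        },
        {"role": "assistant", "content": "Chop the nearby tree to collect wood."},
    ]
}

with open("vision_sft.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```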
2. Launch the job
- The CLI validates the dataset (it must contain `messages[]` with ≥2 turns).
- The JSONL is uploaded to `/api/learning/files`; the CLI waits until the backend marks it `ready`.
- A job is created and started with the payload generated from your TOML.
- The CLI polls status and prints progress events until the job reaches a terminal state.
Useful flags:
- `--backend` – override the Synth API base URL (defaults to production).
- `--model` – override the model in the TOML without editing the file.
- `--examples N` – upload only the first `N` JSONL records (smoke testing).
- `--no-poll` – submit the job and exit immediately (useful when an agent wants to poll separately).
By default, secrets are read from the `.env` produced by `uvx synth-ai setup`. Use `--env-file` when you need to target a different secrets file (you can pass the flag multiple times to layer values).
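As a rough illustration, a script could drive a smoke-test submission using the flags documented above. The Python sketch below is an assumption-laden example: the dataset path is a placeholder, and the training-config argument is omitted (see Launch Training Jobs for the full flag list).

```python
# Sketch: submit a small smoke-test SFT job and exit without polling.
# The dataset path is a placeholder and the training-config argument is
# omitted; see Launch Training Jobs for the full invocation.
import subprocess

cmd = [
    "uvx", "synth-ai", "train",
    "--type", "sft",
    "--dataset", "ft_data/train.jsonl",  # JSONL produced by the filter step (placeholder)
    "--examples", "5",                   # upload only the first 5 records
    "--env-file", ".env",                # secrets file written by `uvx synth-ai setup`
    "--no-poll",                         # submit the job and return immediately
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)  # includes the job_id you will poll later
```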
3. Monitor and retrieve outputs
- Copy the `job_id` printed in the CLI output.
- Re-run the command later with `--no-poll` to check status without re-uploading.
- Query job details directly against `/api/learning/jobs/<job_id>` (a polling sketch follows this list).
- When the job succeeds, note the `fine_tuned_model` identifier in the response. You will use this value when deploying the updated policy.
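A polling loop against the job endpoint might look like the sketch below. It is illustrative Python using `requests`; the backend base URL, the `SYNTH_API_KEY` environment variable, the bearer-token header, and the non-success terminal states are assumptions to adapt to your setup.

```python
# Sketch: poll /api/learning/jobs/<job_id> until a terminal state.
# BASE_URL, the API-key env var, the auth header, and the non-success
# terminal states are assumptions; adapt them to your backend and .env.
import os
import time

import requests

BASE_URL = "https://<your-synth-backend>"  # the --backend URL (production by default)
API_KEY = os.environ["SYNTH_API_KEY"]      # assumed env var holding your Synth key
JOB_ID = "job_..."                         # job_id printed by the CLI

headers = {"Authorization": f"Bearer {API_KEY}"}

while True:
    resp = requests.get(f"{BASE_URL}/api/learning/jobs/{JOB_ID}", headers=headers, timeout=30)
    resp.raise_for_status()
    job = resp.json()
    status = job.get("status")
    print("status:", status)
    if status in {"succeeded", "failed", "cancelled"}:  # terminal states (assumed beyond "succeeded")
        break
    time.sleep(30)

if status == "succeeded":
    print("fine_tuned_model:", job["fine_tuned_model"])
```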
4. Automate with agents
- Run `uvx synth-ai eval` to generate traces.
- Run `uvx synth-ai filter` to create JSONL.
- Run `uvx synth-ai train --type sft … --no-poll`.
- Poll `/api/learning/jobs/<job_id>` until status is `succeeded`.
- Fetch the new `fine_tuned_model` and move on to deployment.
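Put together, a minimal automation loop could look like the sketch below (illustrative Python; each command's remaining arguments are elided, and parsing the `job_id` out of the CLI output is left to your own wiring).

```python
# Sketch: chain the CLI steps, then hand off to a polling loop.
# Per-command arguments are elided; add config/flags as your setup requires.
import subprocess

def synth(*args: str) -> str:
    """Run a synth-ai subcommand and return its stdout."""
    proc = subprocess.run(
        ["uvx", "synth-ai", *args], capture_output=True, text=True, check=True
    )
    return proc.stdout

synth("eval")                                                # 1. generate traces
synth("filter")                                              # 2. turn traces into SFT JSONL
train_output = synth("train", "--type", "sft", "--no-poll")  # 3. submit and return

# 4. Parse the job_id from train_output (format is an assumption), then poll
#    /api/learning/jobs/<job_id> until status == "succeeded".
# 5. Read fine_tuned_model from the final response and proceed to deployment.
print(train_output)
```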