An SFT training run submits a job to Synth’s /api/learning/jobs endpoint, uploads your dataset, and monitors the job until completion. The CLI handles all of this for you. For CLI flag descriptions, see Launch Training Jobs.

1. Prepare a training config

SFT configs are TOML files. Start from one of the examples (e.g. examples/warming_up_to_rl/configs/crafter_fft.toml) and adjust the sections below:
[job]
model = "Qwen/Qwen3-4B"  # base model you want to fine-tune
data = "ft_data/crafter_sft.jsonl"
poll_seconds = 1800

[compute]
gpu_type = "H100"
gpu_count = 4
nodes = 1

[training]
mode = "full_finetune"
use_qlora = false

[hyperparameters]
n_epochs = 2
per_device_batch = 2
gradient_accumulation_steps = 64
learning_rate = 8e-6
warmup_ratio = 0.03
Required fields:
  • job.model – must be a supported Synth base model (see /models).
  • job.data or --dataset – path to the JSONL produced by the filter step.
  • hyperparameters – include at least one of n_epochs, total_steps, train_steps, or steps.
Optional fields:
  • data.validation_path – secondary JSONL for evaluation.
  • training.validation.* – configure evaluation cadence (eval_steps, metric_for_best_model, etc.).
  • compute – choose GPU type/count; omit to use organization defaults.
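If you want to catch missing fields before submitting, a quick local check along these lines can help. This is a sketch, not part of the CLI; it only mirrors the required fields listed above and assumes Python 3.11+ for tomllib:
import sys
import tomllib  # standard library in Python 3.11+

# Keys that satisfy the "at least one of" hyperparameter requirement above.
STEP_KEYS = {"n_epochs", "total_steps", "train_steps", "steps"}

def check_config(path: str) -> list[str]:
    """Return a list of problems with the SFT TOML, based on the required fields above."""
    with open(path, "rb") as f:
        cfg = tomllib.load(f)
    problems = []
    if not cfg.get("job", {}).get("model"):
        problems.append("job.model is required (must be a supported Synth base model)")
    if not cfg.get("job", {}).get("data"):
        problems.append("job.data is missing; plan to pass --dataset on the CLI instead")
    if not STEP_KEYS & set(cfg.get("hyperparameters", {})):
        problems.append("hyperparameters needs one of n_epochs, total_steps, train_steps, steps")
    return problems

if __name__ == "__main__":
    for problem in check_config(sys.argv[1] if len(sys.argv) > 1 else "configs/train.toml"):
        print("config issue:", problem)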

Vision-language (Qwen3-VL)

Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when job.model points at Qwen/Qwen3-VL-*. Keep the following in mind:
  • Config tweaks: you do not need to add extra knobs—supports_vision, max_images_per_message, and BF16 precision are pulled from the model registry. The trainer will clamp per_device_batch / per_device_eval_batch to 1 and raise gradient_accumulation_steps to keep memory in check. If you truly need more than one image per turn, override the registry default with model.max_images_per_message, but expect higher GPU memory pressure.
  • Dataset shape: every JSONL record must contain a messages[] array using the OpenAI multimodal schema. Each message’s content can mix text segments and image segments. We accept:
    {
      "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful guide."}]},
        {"role": "user", "content": [
          {"type": "text", "text": "Describe the scene."},
          {"type": "image_url", "image_url": {"url": "https://assets.example.com/frame.png"}}
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "The robot is holding a red cube."}]}
      ]
    }
    
    The trainer also understands legacy payloads with top-level images / image_url fields, but everything is converted into the message format shown above.
  • Image references: each image_url.url must be resolvable from the training container. HTTPS URLs, public object-store links, and data:image/...;base64,<payload> blobs are supported. Local filesystem paths only work if that path exists inside the uploaded artifact, so prefer URLs or data URIs (a sketch of building a record around a data URI follows this list).
  • Image limits: Qwen3-VL defaults to max_images_per_message = 1. Additional images in a single turn are trimmed and a debug log is emitted. Plan your prompts accordingly or bump the limit explicitly if your GPU topology can handle it.
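To assemble records in the shape shown above programmatically, something like the following works. It is a sketch: the file names are illustrative, and the helper simply inlines a local image as a data URI so it resolves inside the training container.
import base64
import json

def image_data_uri(path: str, mime: str = "image/png") -> str:
    """Inline a local image as a data URI so the trainer can resolve it."""
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

record = {
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful guide."}]},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe the scene."},
            # Qwen3-VL defaults to one image per message; extra images get trimmed.
            {"type": "image_url", "image_url": {"url": image_data_uri("frame.png")}},
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "The robot is holding a red cube."}]},
    ]
}

# Append one JSON object per line (JSONL); the output path is illustrative.
with open("ft_data/crafter_vlm_sft.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")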

2. Launch the job

uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl
What happens:
  1. The CLI validates the dataset (it must contain messages[] with ≥2 turns; a local version of this check is sketched after this list).
  2. The JSONL is uploaded to /api/learning/files; the CLI waits until the backend marks it ready.
  3. A job is created and started with the payload generated from your TOML.
  4. The CLI polls status and prints progress events until the job reaches a terminal state.
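The validation in step 1 can also be reproduced locally before you launch. A minimal sketch that only verifies the documented ≥2-turn requirement, nothing more:
import json

def check_dataset(path: str) -> int:
    """Flag records that would fail the CLI's basic check: messages[] with at least two turns."""
    bad = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue
            record = json.loads(line)
            messages = record.get("messages", [])
            if len(messages) < 2:
                print(f"line {lineno}: only {len(messages)} message(s); need at least 2")
                bad += 1
    return bad

check_dataset("ft_data/crafter_sft.jsonl")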
Useful flags:
  • --backend – override the Synth API base URL (defaults to production).
  • --model – override the model in the TOML without editing the file.
  • --examples N – upload only the first N JSONL records (smoke testing).
  • --no-poll – submit the job and exit immediately (useful when an agent wants to poll separately).
Reminder: the CLI auto-loads the .env produced by uvx synth-ai setup. Use --env-file when you need to target a different secrets file (you can pass the flag multiple times to layer values).

3. Monitor and retrieve outputs

  • Copy the job_id printed in the CLI output.
  • Re-run the command later with --no-poll to check status without re-uploading.
  • Query job details directly:
    curl -H "Authorization: Bearer $SYNTH_API_KEY" \
      https://api.usesynth.ai/api/learning/jobs/<job_id>
    
  • When the job succeeds, note the fine_tuned_model identifier in the response. You will use this value when deploying the updated policy.
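For scripts and agents, the same endpoint can be polled until the job reaches a terminal state. A minimal sketch: the status and fine_tuned_model field names follow the response described above, while the terminal states other than succeeded are assumptions.
import json
import os
import time
import urllib.request

JOBS_URL = "https://api.usesynth.ai/api/learning/jobs"
TERMINAL = {"succeeded", "failed", "cancelled"}  # "succeeded" is documented; the others are assumed

def get_job(job_id: str) -> dict:
    req = urllib.request.Request(
        f"{JOBS_URL}/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(job_id: str, interval_seconds: int = 60) -> dict:
    """Poll the job until it reaches a terminal state, then return the full response."""
    while True:
        job = get_job(job_id)
        print("status:", job.get("status"))
        if job.get("status") in TERMINAL:
            return job
        time.sleep(interval_seconds)

# Usage: job = wait_for_job("<job_id>"); print(job.get("fine_tuned_model"))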

4. Automate with agents

  1. Run uvx synth-ai eval to generate traces.
  2. Run uvx synth-ai filter to create JSONL.
  3. Run uvx synth-ai train --type sft … --no-poll.
  4. Poll /api/learning/jobs/<job_id> until status is succeeded.
  5. Fetch the new fine_tuned_model and move on to deployment.
Following this sequence allows a single agent workflow (or CI pipeline) to execute the entire SFT loop without manual intervention.
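A minimal sketch of that loop, driving the CLI with subprocess and reusing the wait_for_job helper from the previous section. Add whatever flags your eval and filter configs require; extracting the job_id from the CLI output is left as a placeholder because its exact format is not specified here.
import subprocess

def run(*args: str) -> str:
    """Run a CLI step and return its captured stdout."""
    result = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return result.stdout

run("uvx", "synth-ai", "eval")        # 1. generate traces
run("uvx", "synth-ai", "filter")      # 2. produce the SFT JSONL
out = run("uvx", "synth-ai", "train", "--type", "sft",
          "--config", "configs/train.toml",
          "--dataset", "ft_data/crafter_sft.jsonl",
          "--no-poll")                # 3. submit without polling

job_id = "<job_id>"                   # extract from `out`; the CLI prints it on submission
job = wait_for_job(job_id)            # 4. poll until terminal (helper from the previous sketch)
print(job.get("fine_tuned_model"))    # 5. identifier to use at deployment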