An SFT training run submits a job to Synth’s /api/learning/jobs endpoint, uploads your dataset, and monitors the job until completion. The CLI handles all of this for you. For CLI flag descriptions, see Launch Training Jobs.

1. Prepare a training config

SFT configs are TOML files. Start from one of the examples (e.g. examples/warming_up_to_rl/configs/crafter_fft.toml) and adjust the sections below:
[job]
model = "Qwen/Qwen3-4B"  # base model you want to fine-tune
data = "ft_data/crafter_sft.jsonl"
poll_seconds = 1800

[compute]
gpu_type = "H100"
gpu_count = 4
nodes = 1

[training]
mode = "full_finetune"
use_qlora = false

[hyperparameters]
n_epochs = 2
per_device_batch = 2
gradient_accumulation_steps = 64
learning_rate = 8e-6
warmup_ratio = 0.03
Required fields:
  • job.model – must be a supported Synth base model (see /models).
  • job.data or --dataset – path to the JSONL produced by the filter step.
  • hyperparameters – include at least one of n_epochs, total_steps, train_steps, or steps.
Optional fields:
  • data.validation_path – secondary JSONL for evaluation.
  • training.validation.* – configure evaluation cadence (eval_steps, metric_for_best_model, etc.).
  • compute – choose GPU type/count; omit to use organization defaults.
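If you want to catch missing fields before submitting, a quick local check along these lines can help. This is a sketch, not part of the CLI; it only mirrors the required fields listed above and assumes Python 3.11+ for tomllib:
import sys
import tomllib  # standard library in Python 3.11+

# Keys that satisfy the "at least one of" hyperparameter requirement above.
STEP_KEYS = {"n_epochs", "total_steps", "train_steps", "steps"}

def check_config(path: str) -> list[str]:
    """Return a list of problems with the SFT TOML, based on the required fields above."""
    with open(path, "rb") as f:
        cfg = tomllib.load(f)
    problems = []
    if not cfg.get("job", {}).get("model"):
        problems.append("job.model is required (must be a supported Synth base model)")
    if not cfg.get("job", {}).get("data"):
        problems.append("job.data is missing; plan to pass --dataset on the CLI instead")
    if not STEP_KEYS & set(cfg.get("hyperparameters", {})):
        problems.append("hyperparameters needs one of n_epochs, total_steps, train_steps, steps")
    return problems

if __name__ == "__main__":
    for problem in check_config(sys.argv[1] if len(sys.argv) > 1 else "configs/train.toml"):
        print("config issue:", problem)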

Vision-language (Qwen3-VL)

Synth treats Qwen3-VL checkpoints as multimodal models and flips the SFT pipeline into “vision mode” automatically when job.model points at Qwen/Qwen3-VL-*. Keep the following in mind:
  • Config tweaks: you do not need to add extra knobs—supports_vision, max_images_per_message, and BF16 precision are pulled from the model registry. The trainer will clamp per_device_batch / per_device_eval_batch to 1 and raise gradient_accumulation_steps to keep memory in check. If you truly need more than one image per turn, override the registry default with model.max_images_per_message, but expect higher GPU memory pressure.
  • Dataset shape: every JSONL record must contain a messages[] array using the OpenAI multimodal schema. Each message’s content can mix text segments and image segments. We accept:
    {
      "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful guide."}]},
        {"role": "user", "content": [
          {"type": "text", "text": "Describe the scene."},
          {"type": "image_url", "image_url": {"url": "https://assets.example.com/frame.png"}}
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "The robot is holding a red cube."}]}
      ]
    }
    
    The trainer also understands legacy payloads with top-level images / image_url fields, but everything is converted into the message format shown above.
  • Image references: each image_url.url must be resolvable from the training container. HTTPS URLs, public object-store links, and data:image/...;base64,<payload> blobs are supported. Local filesystem paths only work if that path exists inside the uploaded artifact, so prefer URLs or data URIs (a sketch of building a record around a data URI follows this list).
  • Image limits: Qwen3-VL defaults to max_images_per_message = 1. Additional images in a single turn are trimmed and a debug log is emitted. Plan your prompts accordingly or bump the limit explicitly if your GPU topology can handle it.
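To assemble records in the shape shown above programmatically, something like the following works. It is a sketch: the file names are illustrative, and the helper simply inlines a local image as a data URI so it resolves inside the training container.
import base64
import json

def image_data_uri(path: str, mime: str = "image/png") -> str:
    """Inline a local image as a data URI so the trainer can resolve it."""
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

record = {
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful guide."}]},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe the scene."},
            # Qwen3-VL defaults to one image per message; extra images get trimmed.
            {"type": "image_url", "image_url": {"url": image_data_uri("frame.png")}},
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "The robot is holding a red cube."}]},
    ]
}

# Append one JSON object per line (JSONL); the output path is illustrative.
with open("ft_data/crafter_vlm_sft.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")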

2. Launch the job

uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl
What happens:
  1. The CLI validates the dataset (it must contain messages[] with ≥2 turns; a local version of this check is sketched after this list).
  2. The JSONL is uploaded to /api/learning/files; the CLI waits until the backend marks it ready.
  3. A job is created and started with the payload generated from your TOML.
  4. The CLI polls status and prints progress events until the job reaches a terminal state.
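The validation in step 1 can also be reproduced locally before you launch. A minimal sketch that only verifies the documented ≥2-turn requirement, nothing more:
import json

def check_dataset(path: str) -> int:
    """Flag records that would fail the CLI's basic check: messages[] with at least two turns."""
    bad = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue
            record = json.loads(line)
            messages = record.get("messages", [])
            if len(messages) < 2:
                print(f"line {lineno}: only {len(messages)} message(s); need at least 2")
                bad += 1
    return bad

check_dataset("ft_data/crafter_sft.jsonl")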
Useful flags:
  • --backend – override the Synth API base URL (defaults to production).
  • --model – override the model in the TOML without editing the file.
  • --examples N – upload only the first N JSONL records (smoke testing).
  • --no-poll – submit the job and exit immediately (useful when an agent wants to poll separately).
Reminder: the CLI auto-loads the .env produced by uvx synth-ai setup. Use --env-file when you need to target a different secrets file (you can pass the flag multiple times to layer values).

3. Monitor and retrieve outputs

  • Copy the job_id printed in the CLI output.
  • Re-run the command later with --no-poll to check status without re-uploading.
  • Query job details directly:
    curl -H "Authorization: Bearer $SYNTH_API_KEY" \
      https://api.usesynth.ai/api/learning/jobs/<job_id>
    
  • When the job succeeds, note the fine_tuned_model identifier in the response. You will use this value when deploying the updated policy.
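For scripts and agents, the same endpoint can be polled until the job reaches a terminal state. A minimal sketch: the status and fine_tuned_model field names follow the response described above, while the terminal states other than succeeded are assumptions.
import json
import os
import time
import urllib.request

JOBS_URL = "https://api.usesynth.ai/api/learning/jobs"
TERMINAL = {"succeeded", "failed", "cancelled"}  # "succeeded" is documented; the others are assumed

def get_job(job_id: str) -> dict:
    req = urllib.request.Request(
        f"{JOBS_URL}/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(job_id: str, interval_seconds: int = 60) -> dict:
    """Poll the job until it reaches a terminal state, then return the full response."""
    while True:
        job = get_job(job_id)
        print("status:", job.get("status"))
        if job.get("status") in TERMINAL:
            return job
        time.sleep(interval_seconds)

# Usage: job = wait_for_job("<job_id>"); print(job.get("fine_tuned_model"))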

4. Automate with agents

  1. Run uvx synth-ai eval to generate traces.
  2. Run uvx synth-ai filter to create JSONL.
  3. Run uvx synth-ai train --type sft … --no-poll.
  4. Poll /api/learning/jobs/<job_id> until status is succeeded.
  5. Fetch the new fine_tuned_model and move on to deployment.
Following this sequence allows a single agent workflow (or CI pipeline) to execute the entire SFT loop without manual intervention.
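A minimal sketch of that loop, driving the CLI with subprocess and reusing the wait_for_job helper from the previous section. Add whatever flags your eval and filter configs require; extracting the job_id from the CLI output is left as a placeholder because its exact format is not specified here.
import subprocess

def run(*args: str) -> str:
    """Run a CLI step and return its captured stdout."""
    result = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return result.stdout

run("uvx", "synth-ai", "eval")        # 1. generate traces
run("uvx", "synth-ai", "filter")      # 2. produce the SFT JSONL
out = run("uvx", "synth-ai", "train", "--type", "sft",
          "--config", "configs/train.toml",
          "--dataset", "ft_data/crafter_sft.jsonl",
          "--no-poll")                # 3. submit without polling

job_id = "<job_id>"                   # extract from `out`; the CLI prints it on submission
job = wait_for_job(job_id)            # 4. poll until terminal (helper from the previous sketch)
print(job.get("fine_tuned_model"))    # 5. identifier to use at deployment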