Training submits a job to Synth’s /api/learning/jobs endpoint, uploads your dataset, and monitors the run until completion. The CLI handles all of this for you. For CLI flag descriptions, head to Launch Training Jobs.

1. Create the config TOML for your task app

Create a TOML file that follows the schema documented in the SFT config reference.
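
For orientation, here is a minimal sketch of what such a file can look like. Treat it as illustrative: only job.model and model.max_images_per_message are taken from this guide, the model id is a hypothetical placeholder, and the authoritative field list lives in the SFT config reference.

# Minimal sketch; see the SFT config reference for the full schema.
[job]
model = "Qwen/Qwen3-VL-8B-Instruct"  # hypothetical id; any Qwen/Qwen3-VL-* checkpoint enables vision mode

[model]
max_images_per_message = 1  # registry default; raise only if your GPUs can absorb the extra memory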

Vision-language (Qwen3-VL)

Synth treats Qwen3-VL checkpoints as multimodal models and automatically switches the SFT pipeline into vision mode when job.model points at Qwen/Qwen3-VL-*. Keep the following in mind:
  • Config tweaks: you do not need to add extra knobs; supports_vision, max_images_per_message, and BF16 precision are pulled from the model registry. The trainer clamps per_device_batch / per_device_eval_batch to 1 and raises gradient_accumulation_steps to keep memory in check. If you truly need more than one image per turn, override the registry default with model.max_images_per_message, but expect higher GPU memory pressure.
  • Dataset shape: every JSONL record must contain a messages[] array using the OpenAI multimodal schema. Each message’s content can mix text segments and image segments. We accept:
    {
      "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful guide."}]},
        {"role": "user", "content": [
          {"type": "text", "text": "Describe the scene."},
          {"type": "image_url", "image_url": {"url": "https://assets.example.com/frame.png"}}
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "The robot is holding a red cube."}]}
      ]
    }
    
    The trainer also understands legacy payloads with top-level images / image_url fields, but everything is converted into the message format shown above.
  • Image references: each image_url.url must be resolvable from the training container. HTTPS URLs, public object-store links, and data:image/...;base64,<payload> blobs are supported. Local filesystem paths only work if that path exists inside the uploaded artifact, so prefer URLs or data URIs.
  • Image limits: Qwen3-VL defaults to max_images_per_message = 1. Additional images in a single turn are trimmed and a debug log is emitted. Plan your prompts accordingly, or raise the limit explicitly if your GPU topology can handle it; the validation sketch after this list can flag over-limit turns before you upload.
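
Before uploading, you can sanity-check a JSONL file with a short script like the one below. This is a sketch, not part of the CLI: the two-turn minimum matches the validation described in step 2, the accepted URL schemes mirror the image-reference rules above, and the one-image default mirrors the Qwen3-VL registry setting.

import json
import sys

MAX_IMAGES_PER_MESSAGE = 1  # Qwen3-VL registry default described above

def check_record(line_no, record):
    messages = record.get("messages", [])
    if len(messages) < 2:
        return f"line {line_no}: expected at least 2 turns, found {len(messages)}"
    for msg in messages:
        content = msg.get("content", [])
        if isinstance(content, str):
            continue  # plain-text content has no image segments to check
        images = [seg for seg in content if seg.get("type") == "image_url"]
        if len(images) > MAX_IMAGES_PER_MESSAGE:
            return f"line {line_no}: {len(images)} images in one turn; extras will be trimmed"
        for seg in images:
            url = seg.get("image_url", {}).get("url", "")
            if not url.startswith(("http://", "https://", "data:image/")):
                return f"line {line_no}: {url!r} may not resolve inside the training container"
    return None

with open(sys.argv[1]) as f:
    for i, line in enumerate(f, start=1):
        problem = check_record(i, json.loads(line))
        if problem:
            print(problem)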

2. Launch the job

uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl
What happens:
  1. The CLI validates the dataset (it must contain messages[] with ≥2 turns).
  2. The JSONL is uploaded to /api/learning/files; the CLI waits until the backend marks it ready.
  3. A job is created and started with the payload generated from your TOML.
  4. The CLI polls status and prints progress events until the job reaches a terminal state.
Useful flags:
  • --backend – override the Synth API base URL (defaults to production).
  • --model – override the model in the TOML without editing the file.
  • --examples N – upload only the first N JSONL records (smoke testing).
  • --no-poll – submit the job and exit immediately (useful when an agent wants to poll separately).
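For example, a smoke-test run that uploads only the first ten records and returns immediately, reusing the paths from the command above:

uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_sft.jsonl \
  --examples 10 \
  --no-poll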
Reminder: the CLI auto-loads the .env produced by uvx synth-ai setup. Use --env-file when you need to target a different secrets file (you can pass the flag multiple times to layer values).

3. Monitor and retrieve outputs

  • Copy the job_id printed in the CLI output.
  • Re-run the command later with --no-poll to check status without re-uploading.
  • Query job details directly:
    curl -H "Authorization: Bearer $SYNTH_API_KEY" \
      https://api.usesynth.ai/api/learning/jobs/<job_id>
    
  • When the job succeeds, note the fine_tuned_model identifier in the response. You will use this value when deploying the updated policy.
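
If jq is available, the same request can be trimmed to just the fields this guide mentions; status and fine_tuned_model are the names used above, so confirm them against a real response:

curl -s -H "Authorization: Bearer $SYNTH_API_KEY" \
  https://api.usesynth.ai/api/learning/jobs/<job_id> \
  | jq '{status, fine_tuned_model}'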

4. Automate with agents

  1. Run uvx synth-ai eval to generate traces.
  2. Run uvx synth-ai filter to create JSONL.
  3. Run uvx synth-ai train --type sft … --no-poll.
  4. Poll /api/learning/jobs/<job_id> until status is succeeded.
  5. Fetch the new fine_tuned_model and move on to deployment.
Following this sequence allows a single agent workflow (or CI pipeline) to execute the entire SFT loop without manual intervention.
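
As a sketch of steps 4 and 5, the loop below polls the jobs endpoint until the run finishes. Only the endpoint, the succeeded status, and the fine_tuned_model field come from this guide; the failure states and the polling interval are assumptions to adapt to the actual API:

import json
import os
import sys
import time
import urllib.request

job_id = sys.argv[1]
url = f"https://api.usesynth.ai/api/learning/jobs/{job_id}"
headers = {"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"}

while True:
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        job = json.load(resp)
    status = job.get("status")
    print("status:", status)
    if status == "succeeded":
        # This identifier is what you deploy as the updated policy.
        print("fine_tuned_model:", job.get("fine_tuned_model"))
        break
    if status in ("failed", "cancelled"):  # assumed terminal failure states
        sys.exit(f"job ended with status {status!r}")
    time.sleep(30)  # assumed polling interval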