synth-ai train submits RL or SFT jobs to the Synth backend, guiding you through config selection, environment setup, and job monitoring.
  • The command accepts one or more TOML configs and validates them before calling the API. When no config is supplied, it scans common directories and prompts you to choose one.
  • Environment variables are loaded from the .env files you specify; when none are provided, the CLI shows an interactive list to pick from. Required keys (SYNTH_API_KEY, ENVIRONMENT_API_KEY) are checked up front and minted when possible.
  • RL jobs automatically verify task-app health by calling /rl/verify_task_app, /health, and /task_info before submission. Failures are surfaced with detailed diagnostics so you can fix auth issues quickly.
  • SFT jobs can upload dataset JSONL automatically, optionally limiting the first N examples for smoke tests.
  • Job polling is optional but enabled by default; the CLI streams status updates until the training run reaches a terminal state or hits the configured timeout.
  • --dry-run prints the payload without creating a job, which is useful for sanity checks during config changes.
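
To make the environment preflight described above concrete, here is a minimal Python sketch of merging several .env files and reporting which required keys are still missing. It is not the CLI's implementation: the helper names are hypothetical, the parser only handles simple KEY=VALUE lines, and the rule that later files override earlier ones is an assumption rather than documented behavior.

```python
from pathlib import Path

# Required keys named in the documentation above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate a leading "export " as used in shell-style env files.
        key = key.strip().removeprefix("export ").strip()
        values[key] = value.strip().strip("'\"")
    return values


def merge_env_files(paths: list) -> dict:
    """Merge files in order. ASSUMPTION: later files override earlier ones."""
    merged = {}
    for path in paths:
        merged.update(parse_env_text(Path(path).read_text(encoding="utf-8")))
    return merged


def missing_required(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Running missing_required(merge_env_files([...])) before invoking the command gives an early, local signal that a key is absent, independent of whatever the CLI itself reports.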

Options

  • --config PATH — Repeatable. Points to training TOML files. When omitted the CLI auto-discovers configs and prompts.
  • --type {auto,rl,sft} — Force the workflow type. auto infers it from the config.
  • --env-file PATH — One or more .env files to preload. Repeat to merge several files.
  • --task-url URL — Override the task app URL for RL jobs (skips reading it from the config).
  • --dataset PATH — Override the dataset JSONL for SFT jobs.
  • --backend URL — Override the backend base URL (defaults to env settings or production).
  • --model VALUE — Override the model identifier in the config.
  • --allow-experimental / --no-allow-experimental — Toggle experimental model gating without editing configs.
  • --idempotency VALUE — Custom Idempotency-Key header for job creation.
  • --dry-run — Print the payload and exit without creating a job.
  • --poll / --no-poll — Enable or disable status polling after submission.
  • --poll-timeout SECONDS — Maximum polling duration (default 3600).
  • --poll-interval SECONDS — Delay between polling attempts.
  • --examples VALUE — Limit SFT datasets to the first N examples (useful for smoke tests).
Example invocation:
uvx synth-ai train --config configs/rl/grpo.toml --env-file .env.crafter
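
When the command is driven from a script, it can help to assemble the argument list programmatically. The sketch below builds an argv list using only flags documented in the Options list; the function name build_train_command is hypothetical and not part of synth-ai, and the choice to omit --type when it is auto is a simplification of this sketch.

```python
import shlex
from typing import List, Optional


def build_train_command(
    configs: List[str],
    env_files: Optional[List[str]] = None,
    job_type: str = "auto",
    dataset: Optional[str] = None,
    examples: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argv list for `uvx synth-ai train` from documented flags."""
    if job_type not in {"auto", "rl", "sft"}:
        raise ValueError(f"unsupported job type: {job_type!r}")
    argv = ["uvx", "synth-ai", "train"]
    if job_type != "auto":
        argv += ["--type", job_type]
    for config in configs:  # --config is repeatable
        argv += ["--config", config]
    for env_file in env_files or []:  # --env-file is repeatable
        argv += ["--env-file", env_file]
    if dataset is not None:
        argv += ["--dataset", dataset]
    if examples is not None:
        argv += ["--examples", str(examples)]
    if dry_run:
        argv.append("--dry-run")
    return argv


# Build the invocation from this page with --dry-run added, then print it
# as a shell-quoted string. Passing `cmd` to subprocess.run would execute it.
cmd = build_train_command(
    ["configs/rl/grpo.toml"], env_files=[".env.crafter"], dry_run=True
)
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than a single string avoids shell-quoting problems when paths contain spaces.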
Sample RL session:
synth@Nomans-Resolve sdk % uvx synth-ai train --config configs/rl/grpo.toml
Select an .env file:
  1) /Users/synth/qa/sdk/.env
  2) /Users/synth/.synth-ai/.env
Choice: 1
Using env file: /Users/synth/qa/sdk/.env
Backend base: https://api.usesynth.ai (key sk-...092)
Verification OK (candidates=3, statuses=[200, 200, 200])
Task app healthy
POST https://api.usesynth.ai/rl/jobs
Payload preview:{...}
Response 201: {"job_id": "job_abc123"}
Polling job status (interval=5s, timeout=3600s)...
[00:00] queued
[00:15] running
[03:42] succeeded
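
The polling phase at the end of the session can be pictured as a loop that checks the job status at a fixed interval until a terminal state or the timeout is reached. The following Python sketch shows that pattern under stated assumptions: fetch_status is a hypothetical callable standing in for the real status request, and the set of terminal states is a guess, since only "succeeded" appears in the sample output. It is not the CLI's actual code.

```python
import time

# ASSUMPTION: only "succeeded" is shown in the sample session; the other
# terminal states are illustrative guesses.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status,
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep=time.sleep,
    clock=time.monotonic,
) -> str:
    """Call fetch_status() every `interval` seconds until a terminal state.

    Raises TimeoutError once `timeout` seconds have elapsed without one.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    last = None
    while True:
        status = fetch_status()
        if status != last:  # report only changes in status
            print(f"status: {status}")
            last = status
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

The defaults mirror the interval=5s and timeout=3600s shown in the sample session. Using a monotonic clock keeps the timeout stable if the system time changes during a long run.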

Notes

  • Multiple --config values are processed sequentially. If one fails validation, later configs are skipped and the CLI exits with an error.
  • When uploading SFT datasets, the CLI waits for the training file to reach the ready state before creating the job, retrying until the --poll-timeout threshold is reached.
  • RL health checks try all known environment keys (primary and aliases), so deployments that rotate keys continue to work without manual changes.
  • Idempotency keys are honored per job submission; provide a stable key when you want retry-safe behavior.
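
One way to obtain a stable key for --idempotency is to derive it from the config contents, so that retrying the same submission reuses the same key while an edited config yields a new one. The sketch below is one possible scheme, not something the CLI prescribes: the function name, the "train-" prefix, and the optional salt are all assumptions made for illustration.

```python
import hashlib


def idempotency_key_for(config_bytes: bytes, salt: str = "") -> str:
    """Derive a deterministic key from config contents plus an optional salt.

    The salt lets a caller force a fresh key (for example, per experiment
    name) without editing the config. A NUL byte separates the two inputs
    so that different (config, salt) pairs cannot collide by concatenation.
    """
    digest = hashlib.sha256()
    digest.update(config_bytes)
    digest.update(b"\x00")
    digest.update(salt.encode("utf-8"))
    return "train-" + digest.hexdigest()[:32]
```

In practice the bytes would come from reading the TOML file passed to --config, and the resulting string would be supplied as the --idempotency value.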