Launch Training Jobs

synth-ai train submits reinforcement learning (RL) or supervised fine-tuning (SFT) jobs to the Synth backend, guiding you through config selection, environment setup, and job monitoring.

The command accepts one or more TOML configs and validates them before hitting the API. When no config is supplied it scans common directories and prompts for a choice.
Environment variables are loaded from the .env files you specify (or chosen from an interactive list when none are provided). Required keys (SYNTH_API_KEY, ENVIRONMENT_API_KEY) are checked before submission and minted when possible.
RL jobs automatically verify task-app health by calling /rl/verify_task_app, /health, and /task_info before submission. Failures are surfaced with detailed diagnostics so you can fix auth issues quickly.
SFT jobs can upload dataset JSONL automatically, optionally limiting the first N examples for smoke tests.
Job polling is optional but enabled by default; the CLI streams status updates until the training run reaches a terminal state or hits the configured timeout.
--dry-run prints the payload without creating a job, which is useful for sanity checks during config changes.
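
The polling behavior described above (repeat at a fixed interval until a terminal state or the timeout) can be sketched in Python. This is an illustrative sketch, not the CLI's implementation: fetch_status is a hypothetical callable standing in for whatever request returns the job status, and the set of terminal states is an assumption (only "succeeded" appears in the sample session on this page).

```python
import time
from typing import Callable

# Assumed terminal states; "succeeded" appears in the sample session,
# "failed" and "cancelled" are hypothetical placeholders.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Stop if the next attempt would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults apply.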

Options

--config PATH — Repeatable. Points to training TOML files. When omitted the CLI auto-discovers configs and prompts.
--type {auto,rl,sft} — Force the workflow type. auto infers it from the config.
--env-file PATH — One or more .env files to preload. Repeat to merge several files.
--task-url URL — Override the task app URL for RL jobs (skips reading it from the config).
--dataset PATH — Override the dataset JSONL for SFT jobs.
--backend URL — Override the backend base URL (defaults to env settings or production).
--model VALUE — Override the model identifier in the config.
--allow-experimental / --no-allow-experimental — Toggle experimental model gating without editing configs.
--idempotency VALUE — Custom Idempotency-Key header for job creation.
--dry-run — Print the payload and exit without creating a job.
--poll / --no-poll — Enable or disable status polling after submission.
--poll-timeout SECONDS — Maximum polling duration (default 3600).
--poll-interval SECONDS — Delay between polling attempts.
--examples VALUE — Limit SFT datasets to the first N examples (useful for smoke tests).
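
To see what limiting a dataset to its first N examples amounts to, the sketch below trims a JSONL file locally. It is a stand-alone illustration and makes no claim about how the CLI performs the truncation; first_n_examples and the file paths are hypothetical names.

```python
import json
from pathlib import Path


def first_n_examples(src: Path, dst: Path, n: int) -> int:
    """Copy the first n non-empty JSONL records from src to dst.

    Returns the number of records written.
    """
    written = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            json.loads(line)  # raises ValueError if the record is not valid JSON
            fout.write(line + "\n")
            written += 1
    return written
```

Validating each record while copying means a malformed line is caught before any upload is attempted.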

Example invocation:

uvx synth-ai train --config configs/rl/grpo.toml --env-file .env.crafter
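
Repeating --env-file merges several files, and the required keys are checked before submission. The sketch below shows one way such a merge and check could work. The precedence rule (later files override earlier ones) and the simple KEY=VALUE parsing are assumptions for illustration, not documented CLI behavior; parse_env, merge_env_files, and missing_keys are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Required keys named in this page.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def merge_env_files(paths: Iterable[str]) -> Dict[str, str]:
    """Merge files in order; assumed rule: later files override earlier ones."""
    merged: Dict[str, str] = {}
    for path in paths:
        merged.update(parse_env(Path(path).read_text(encoding="utf-8")))
    return merged


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```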

Sample RL session:

synth@Nomans-Resolve sdk % uvx synth-ai train --config configs/rl/grpo.toml
Select an .env file:
  1) /Users/synth/qa/sdk/.env
  2) /Users/synth/.synth-ai/.env
Choice: 1
Using env file: /Users/synth/qa/sdk/.env
Backend base: https://api.usesynth.ai (key sk-...092)
Verification OK (candidates=3, statuses=[200, 200, 200])
Task app healthy
POST https://api.usesynth.ai/rl/jobs
Payload preview:{...}
Response 201: {"job_id": "job_abc123"}
Polling job status (interval=5s, timeout=3600s)...
[00:00] queued
[00:15] running
[03:42] succeeded

Notes

Multiple --config values are processed sequentially. If one fails validation, later configs are skipped and the CLI exits with an error.
When uploading SFT datasets, the CLI waits for the training file to reach the ready state before creating the job, retrying until --poll-timeout elapses.
RL health checks reuse all known environment keys (primary + aliases) so deployments configured with rotations continue to work without manual changes.
Idempotency keys are honored per job submission; provide a stable key when you want retry-safe behavior.
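
One way to obtain a stable key is to derive it from the config contents, so that resubmitting the same file yields the same key. The sketch below is an assumption about how you might generate such a value yourself; the "train-" prefix, the key length, and the stable_idempotency_key helper are hypothetical, and the backend's rules for key format are not stated on this page.

```python
import hashlib
from pathlib import Path


def stable_idempotency_key(config_path: str, run_label: str = "") -> str:
    """Derive a deterministic key from the config bytes plus an optional label.

    The same file contents and label always produce the same key; change the
    label when you intentionally want a fresh job from an unchanged config.
    """
    digest = hashlib.sha256()
    digest.update(Path(config_path).read_bytes())
    digest.update(b"\0")  # separator so file bytes and label cannot blur together
    digest.update(run_label.encode("utf-8"))
    return "train-" + digest.hexdigest()[:32]
```

The resulting string can then be passed as the value of --idempotency.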
