synth-ai train submits RL or SFT jobs to the Synth backend, guiding you through config selection, environment setup, and job monitoring.
- The command accepts one or more TOML configs and validates them before hitting the API. When no config is supplied it scans common directories and prompts for a choice.
- Environment variables are pulled from `.env` files you specify (or an interactive list when none are provided). Required keys (`SYNTH_API_KEY`, `ENVIRONMENT_API_KEY`) are preflighted and minted when possible.
- RL jobs automatically verify task-app health by calling `/rl/verify_task_app` and `/health` / `/task_info` before submission. Failures are surfaced with detailed diagnostics so you can fix auth issues quickly.
- SFT jobs can upload dataset JSONL automatically, optionally limiting the upload to the first N examples for smoke tests.
- Job polling is optional but enabled by default; the CLI streams status updates until the training run reaches a terminal state or hits the configured timeout.
- `--dry-run` prints the payload without creating a job, which is useful for sanity checks during config changes.
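The polling behavior described above (stream status until a terminal state or the timeout) can be sketched as a simple loop. This is an illustrative sketch, not the CLI's actual implementation: `poll_until_terminal`, `fetch_status`, the status names in `TERMINAL_STATES`, and the 5-second default interval are all assumptions made for the example. Only the 3600-second default timeout comes from the option list.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real backend's status names may differ.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    timeout: float = 3600.0,   # mirrors the documented --poll-timeout default
    interval: float = 5.0,     # assumed value; the docs give no default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status repeatedly until a terminal state or the timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        print(f"status: {status}")
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise in tests without real waiting.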
Options
- `--config PATH`: Repeatable. Points to training TOML files. When omitted, the CLI auto-discovers configs and prompts.
- `--type {auto,rl,sft}`: Force the workflow type. `auto` infers it from the config.
- `--env-file PATH`: One or more `.env` files to preload. Repeat to merge several files.
- `--task-url URL`: Override the task app URL for RL jobs (skips reading it from the config).
- `--dataset PATH`: Override the dataset JSONL for SFT jobs.
- `--backend URL`: Override the backend base URL (defaults to env settings or production).
- `--model VALUE`: Override the model identifier in the config.
- `--allow-experimental / --no-allow-experimental`: Toggle experimental model gating without editing configs.
- `--idempotency VALUE`: Custom `Idempotency-Key` header for job creation.
- `--dry-run`: Print the payload and exit without creating a job.
- `--poll / --no-poll`: Enable or disable status polling after submission.
- `--poll-timeout SECONDS`: Maximum polling duration (default `3600`).
- `--poll-interval SECONDS`: Delay between polling attempts.
- `--examples VALUE`: Limit SFT datasets to the first N examples (useful for smoke tests).
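When scripting submissions, it can help to assemble the command line programmatically from the options listed above. The helper below is a minimal sketch: `build_train_argv` is a hypothetical function written for this example, and the file paths are placeholders. It only builds the argument list and does not invoke the CLI.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_argv(
    configs: Sequence[str],
    job_type: str = "auto",
    env_files: Sequence[str] = (),
    dry_run: bool = False,
    poll: bool = True,
    poll_timeout: Optional[int] = None,
    examples: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `synth-ai train` from the documented flags."""
    if job_type not in {"auto", "rl", "sft"}:
        raise ValueError(f"unsupported --type value: {job_type!r}")
    argv = ["synth-ai", "train", "--type", job_type]
    for path in configs:          # --config is repeatable
        argv += ["--config", path]
    for path in env_files:        # --env-file is repeatable
        argv += ["--env-file", path]
    if dry_run:
        argv.append("--dry-run")
    argv.append("--poll" if poll else "--no-poll")
    if poll_timeout is not None:
        argv += ["--poll-timeout", str(poll_timeout)]
    if examples is not None:
        argv += ["--examples", str(examples)]
    return argv


if __name__ == "__main__":
    # Placeholder paths: a dry-run SFT smoke test limited to 50 examples.
    argv = build_train_argv(
        ["configs/sft.toml"], job_type="sft",
        env_files=[".env"], dry_run=True, examples=50,
    )
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the printed command looks right.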
Notes
- Multiple `--config` values are processed sequentially. If one fails validation, later configs are skipped and the CLI exits with an error.
- When uploading SFT datasets the CLI waits for the training file to reach the `ready` state before creating the job, retrying until the `--poll-timeout` threshold.
- RL health checks reuse all known environment keys (primary + aliases) so deployments configured with rotations continue to work without manual changes.
- Idempotency keys are honored per job submission; provide a stable key when you want retry-safe behavior.
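One way to obtain a stable key for `--idempotency` is to derive it from the config contents, so that retrying the same submission reuses the same key. This is a sketch under stated assumptions: `stable_idempotency_key` is a hypothetical helper, and the `train-<hash>` format is an arbitrary choice for illustration. The backend may impose its own constraints on key length or characters.

```python
import hashlib


def stable_idempotency_key(config_bytes: bytes, prefix: str = "train") -> str:
    """Derive a deterministic key from the raw bytes of a config file."""
    digest = hashlib.sha256(config_bytes).hexdigest()
    # Truncate for readability; 32 hex characters is an assumed length.
    return f"{prefix}-{digest[:32]}"


if __name__ == "__main__":
    # Inline example config; in practice, read the TOML file's bytes instead.
    sample = b'[job]\nmodel = "example-model"\n'
    print(stable_idempotency_key(sample))
```

Because the key depends only on the config bytes, editing the config yields a new key, while an unchanged config always maps to the same one. If you intentionally want to resubmit an identical config as a separate job, use a different prefix.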