1. Prepare Your Task App
- Write or pick a TaskAppConfig (a .py file that registers your app in synth_ai.task.apps.registry).
- Create a .env file containing at least the keys your task app requires. Use synth-ai setup or a password manager to keep these up to date.
- Sanity-check the app (optional but recommended). Export TASKAPP_SFT_OUTPUT_DIR=traces/sft_records, then run uvx synth-ai deploy ... with whichever runtime you plan to use (local for iteration, Modal for shared collectors). Stop here if the app fails; SFT data collection depends on a healthy task app.
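The preparation steps above can be partly scripted. The following is a minimal sketch, not part of synth-ai: the helper names (`parse_env_file`, `missing_keys`, `prepare_sft_output_dir`) are hypothetical, the .env parser only handles plain KEY=VALUE lines, and the list of required keys is left to you because it depends on your task app.

```python
import os
from pathlib import Path


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="value" and KEY=value compare equal.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict, required: list) -> list:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


def prepare_sft_output_dir(path: str = "traces/sft_records") -> Path:
    """Create the trace directory and expose it via TASKAPP_SFT_OUTPUT_DIR."""
    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Only affects this process and its children, not the parent shell.
    os.environ["TASKAPP_SFT_OUTPUT_DIR"] = str(out_dir)
    return out_dir
```

Because the environment variable is set only for the Python process and anything it launches, this helps when a wrapper script starts the deploy command; in an interactive shell, a plain export does the same job.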
2. Deploy for Rollouts
Pick the runtime best suited for the collection session:
- Local (--runtime local): fastest iteration while you develop. Tracing is auto-enabled; set TASKAPP_SFT_OUTPUT_DIR to choose where JSONL batches land.
- Modal (--runtime modal): deploys to Modal so teammates can collect rollouts or run larger batches. Provide --modal-app (and optionally --name) to map to your entrypoint.
3. Collect Rollouts with Tracing Enabled
Rollouts are what power SFT. You can gather them manually, via automation, or by sharing the task app with labelers:
- Confirm tracing is still on. Local deploys run with --trace enabled by default, so TASKAPP_TRACING_ENABLED=1. You must set TASKAPP_SFT_OUTPUT_DIR (or SFT_OUTPUT_DIR) yourself before launching if you want JSONL written to disk.
- Point your collector at the task app. Common options:
  - uvx synth-ai eval ... --url <TASK_APP_URL> to run scripted rollouts.
  - Custom agents or labeler tools hitting /rollout and /states.
- Run enough variety. Aim for dozens to hundreds of sessions that demonstrate the behaviors you care about.
- Verify traces exist: inspect the directory you pointed TASKAPP_SFT_OUTPUT_DIR at to make sure JSONL batches are appearing.
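The trace check in the last bullet can be automated with a small script. This is a sketch under stated assumptions rather than a synth-ai utility: it assumes batches are written as files ending in .jsonl, that each non-empty line is one JSON record, and that TASKAPP_SFT_OUTPUT_DIR takes precedence over SFT_OUTPUT_DIR when both are set (the precedence is a guess). The function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_trace_dir(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the trace directory from the environment (precedence is assumed)."""
    env = os.environ if environ is None else environ
    return env.get("TASKAPP_SFT_OUTPUT_DIR") or env.get("SFT_OUTPUT_DIR")


def summarize_traces(directory: str) -> Dict[str, int]:
    """Return {relative file path: record count} for every *.jsonl file found."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(directory).rglob("*.jsonl")):
        records = 0
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines between records
                try:
                    json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
                records += 1
        counts[str(path.relative_to(directory))] = records
    return counts


if __name__ == "__main__":
    trace_dir = resolve_trace_dir()
    if trace_dir is None:
        raise SystemExit("Neither TASKAPP_SFT_OUTPUT_DIR nor SFT_OUTPUT_DIR is set")
    summary = summarize_traces(trace_dir)
    if not summary:
        raise SystemExit(f"No .jsonl files found under {trace_dir}")
    for name, count in summary.items():
        print(f"{name}: {count} records")
```

An empty summary or a parse error here is a signal to revisit the tracing settings before moving on to the export step.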
4. Export SFT JSONL with synth-ai filter
Once you have traces, convert them into training examples:
- Write a [filter] TOML describing where to read from and where to write:
- Run the filter command:
5. Validate the Dataset Before Training
Run the CLI in validation-only mode so schema issues surface before you launch a full job:
6. Next Steps: Train with synth-ai train --type sft
With a validated JSONL file, update your SFT config to point at the new dataset (or pass --dataset to the CLI) and launch training: