synth-ai filter reads a trace database and exports the subset of sessions that match your grading rules to SFT-ready JSONL.
- The command only needs a TOML config; the required
[filter]table tells it which trace DB to read, where to write the JSONL, and which thresholds to apply. - Config values are validated with the same schema that powers the training tooling, so typos (missing
db, bad score thresholds, etc.) are caught up front. - Each accepted session becomes a single JSONL record containing user/assistant turns plus metadata (session id, task, seed, rewards, model, timestamps).
- Judge score thresholds, official reward ranges, split/model filters, and created-at windows are all optional—you can combine them to carve out exactly the dataset you need.
- When metadata contains structured (multimodal) messages, the export preserves that structure so downstream tooling can still replay images or tool calls.
- If no sessions match, the command exits politely instead of writing an empty file, so you know to adjust the filters.
Options
--config PATH— Required. Points to the TOML file containing the[filter]block (db,output, optional thresholds). The CLI refuses to run without it.
configs/filter_local.toml):
Notes
- The command loads the trace DB with
SessionTracer, so both SQLite files and Turso URLs are supported (sqlite+aiosqlite:///path/to.dbworks out of the box). - Score helpers respect both minimum and maximum bounds. You can mix
min_official_score,max_official_score,min_judge_scores.<name>, andmax_judge_scores.<name>in the same config. - When judge scores live in trace metadata, the exporter pulls them directly; otherwise the field is left unset in the output JSONL.
- Multimodal prompts stay intact: user messages containing image blocks or tool directives are written exactly as the task app recorded them.