Skip to main content
Rollout traces live in a SQLite/Turso database (traces/v3/synth_ai.db). The uvx synth-ai filter command turns those traces into SFT-ready JSONL samples that match the schema used by validate_jsonl_or_raise. Need CLI specifics? See Filter Traces for flag-by-flag details.

Example filter config

[filter]
db = "traces/v3/synth_ai.db"
output = "ft_data/crafter_image_only_sft.jsonl"

# Only keep episodes with a positive official score
min_official_score = 0.01

# Optional selectors
splits = ["train"]
task_ids = ["crafter_classic_procedural"]
models = ["Qwen/Qwen3-4B"]
min_judge_scores.primary = 0.7
limit = 500
Save this as configs/filter.toml (the repository ships examples under examples/task_apps/**/filter_sft_dataset.toml).

Run the filter

uvx synth-ai filter \
  --config configs/filter.toml
The command:
  1. Validates the TOML via FilterConfig (fails fast if fields are missing or mis-typed).
  2. Reads candidate sessions from the trace database (local file path or sqlite+aiosqlite:// URL).
  3. Applies score, metadata, and model filters.
  4. Writes SFT records to the output path—one JSON object per line with messages[], optional tools[], and metadata.
All filtering happens client-side, so you can iterate quickly without re-running rollouts.

Inspect the output

head -n 1 ft_data/crafter_image_only_sft.jsonl | jq
Each record contains system, user, and assistant messages plus any tool calls captured during the rollout. Before launching training jobs, run a validation pass:
uvx synth-ai train \
  --type sft \
  --config configs/train.toml \
  --dataset ft_data/crafter_image_only_sft.jsonl \
  --no-poll
The --no-poll flag returns immediately after validation—perfect for agents that want to gate subsequent steps on clean data. When you’re ready to launch a full run, refer to Launch Training Jobs. Tip: the CLI automatically loads secrets from the .env written during setup. Use --env-file only when you need to point at an alternate environment.