Rejection Finetuning

This walkthrough mirrors the example under synth-ai/examples/finetuning/synth_qwen/.

Requirements

uv installed (commands are run through uvx / uv run)
SYNTH_API_KEY exported in your shell
Local tracing and the Crafter environment service running via uvx synth-ai serve

What this demo shows

End-to-end flow in four steps: Generate traces → Filter to SFT JSONL → Kick off SFT → Run fine-tuned model
Uses Qwen/Qwen3-4B-Instruct-2507 with tool-calling in a Crafter environment
Central configuration via examples/finetuning/synth_qwen/config.toml

Overview: ReAct agent + tool-calling in Crafter

Agent loop: A ReAct-style LLM agent runs inside the Crafter environment. On each turn the model reasons in text, then issues a structured tool call (OpenAI function calling) to act in the world.
Tool-calling: We send OpenAI-compatible messages plus function tools (e.g., step/look). For Qwen3 we use its native chat template and support tool_choice and stop_after_tool_calls so that each turn yields exactly one clean action.
API usage:
- Initial rollouts use a dev-only instance of Qwen/Qwen3-4B-Instruct-2507 via the Synth inference API to generate traces.
- We filter those traces into an OpenAI-format SFT JSONL and kick off fine-tuning through the same Synth API.
- Fine-tuning returns a model id like ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-<full-uuid>, which we then use for inference in Crafter.
Observability: Full tracing (SQLite/Turso) captures sessions, tool calls, rewards, and tokens for analysis and dataset creation.
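To make the tool-calling turn concrete, here is a minimal sketch of an OpenAI-compatible chat request carrying one function tool, plus extraction of the single action from the response. The tool name "step" comes from the text above, but its description and parameter schema are hypothetical, and the Synth-specific stop_after_tool_calls option is deliberately omitted because its exact form is not documented here. The response layout (choices[0].message.tool_calls[].function.arguments as a JSON string) follows the standard OpenAI chat-completions format.

```python
import json

# HYPOTHETICAL tool schema: the example's real "step"/"look" tools may differ.
STEP_TOOL = {
    "type": "function",
    "function": {
        "name": "step",
        "description": "Take one action in the Crafter world.",
        "parameters": {
            "type": "object",
            "properties": {"action": {"type": "string"}},
            "required": ["action"],
        },
    },
}


def build_turn_request(model: str, messages: list) -> dict:
    """Build an OpenAI-style chat-completions body that forces a tool call."""
    return {
        "model": model,
        "messages": messages,
        "tools": [STEP_TOOL],
        "tool_choice": "required",  # standard OpenAI value: must call some tool
    }


def extract_action(response: dict):
    """Return (tool_name, parsed_args) for the first tool call, or None.

    Only the first call is used, matching the one-action-per-turn policy.
    """
    calls = response["choices"][0]["message"].get("tool_calls") or []
    if not calls:
        return None
    fn = calls[0]["function"]
    return fn["name"], json.loads(fn["arguments"])
```

Forcing tool_choice to "required" is one way to keep every turn actionable; the example itself may use "auto" or a named function instead.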

Quick setup

uvx synth-ai serve  # optional, for local tracing

# Auth (prod): replace the placeholder with your own key
export SYNTH_API_KEY="<your-synth-api-key>"

# Optional: copy example env and adjust
cp synth-ai/examples/finetuning/synth_qwen/.env.example synth-ai/examples/finetuning/synth_qwen/.env
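A missing key is the most common first failure, so a quick pre-flight check can save a run. The sketch below reads simple KEY=VALUE lines from a .env file and verifies that SYNTH_API_KEY is present. It is a hypothetical helper, not part of the example; the real scripts may load .env differently (for instance with python-dotenv), and this parser ignores escapes and multi-line values.

```python
import os


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; skips blanks and # comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes; no escape handling in this sketch.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env=None) -> str:
    """Return SYNTH_API_KEY from env (defaults to os.environ) or raise."""
    env = os.environ if env is None else env
    key = env.get("SYNTH_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SYNTH_API_KEY is not set; export it before running the examples.")
    return key
```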

Generate traces (Qwen 4B)

uvpm examples.finetuning.synth_qwen.run_crafter_qwen4b

Example output (abridged)

✅ Crafter service is healthy
Running 10 episodes (concurrency=5)...
✅ Completed 10 episodes in ~366s
📊 EVALUATION RESULTS
Episodes completed: 10/10
Average reward per episode: 1.10
Average steps per episode: 87.00
💾 Results: traces/synth_ai.db
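The summary block reports per-episode averages. As a rough sketch of how such numbers can be computed, assuming averages are taken over completed episodes only (the real script may aggregate differently), with a hypothetical Episode record:

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """HYPOTHETICAL per-episode record; the real trace schema differs."""
    reward: float
    steps: int
    completed: bool = True


def summarize(episodes: list) -> dict:
    """Compute 'Episodes completed', average reward and average steps."""
    done = [e for e in episodes if e.completed]
    n = len(done)
    return {
        "completed": f"{n}/{len(episodes)}",
        "avg_reward": sum(e.reward for e in done) / n if n else 0.0,
        "avg_steps": sum(e.steps for e in done) / n if n else 0.0,
    }
```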

Filter traces → SFT JSONL

Option A (generic thresholds)

uvpm examples.finetuning.synth_qwen.filter_traces

Option B (require achievements)

uvpm examples.finetuning.synth_qwen.filter_traces_achievements

Example output

Using database: sqlite+aiosqlite:///$PWD/traces/synth_ai.db/dbs/default/data
Output file: ft_data/qwen4b_crafter_sft_collect_wood.jsonl
✅ Wrote 13 examples from 13 sessions
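The two filtering options differ only in the keep/drop rule. The sketch below shows the idea over in-memory session dicts: Option A keeps sessions above a reward threshold, and Option B additionally requires specific Crafter achievements (collect_wood matches the output filename above). The session shape, the default threshold, and the helper names are hypothetical; the actual scripts read sessions from the trace database. Each kept session becomes one OpenAI-format line of the form {"messages": [...]}.

```python
import json
from typing import Iterable


def keep_session(session: dict, min_reward: float = 1.0,
                 required_achievements: Iterable[str] = ()) -> bool:
    """Option A: reward threshold. Option B: also require achievements."""
    if session.get("reward", 0.0) < min_reward:
        return False
    unlocked = set(session.get("achievements", ()))
    return set(required_achievements) <= unlocked


def sessions_to_jsonl(sessions: list, **criteria) -> str:
    """Serialize each kept session as one {"messages": [...]} JSON line."""
    lines = [json.dumps({"messages": s["messages"]})
             for s in sessions if keep_session(s, **criteria)]
    return "".join(line + "\n" for line in lines)


def write_sft_jsonl(sessions: list, path: str, **criteria) -> int:
    """Write the JSONL file and return the number of examples written."""
    text = sessions_to_jsonl(sessions, **criteria)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text.count("\n")
```

For example, sessions_to_jsonl(sessions, required_achievements=("collect_wood",)) corresponds to Option B.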

Finetune (SFT)

uvpm examples.finetuning.synth_qwen.sft_kickoff

Example output (abridged)

🚀 Starting Qwen 4B SFT
⏳ poll ...
🟢 Qwen4B SFT fine-tune succeeded → ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-6cedf721e0ca4c80968834b71e2bdace

Evaluate the fine-tuned adapter

CRAFTER_MODEL="ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-6cedf721e0ca4c80968834b71e2bdace" \
uvpm examples.finetuning.synth_qwen.run_crafter_qwen4b
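If you script the evaluation step, it can help to validate the model id before exporting it as CRAFTER_MODEL. The sketch assumes ids follow the ft:<base-model>:ftjob-<id> pattern shown in the SFT output; both helpers are hypothetical and not part of synth-ai.

```python
import os
import re

# Assumed pattern, taken from the SFT output: ft:<base-model>:ftjob-<hex id>
FT_ID = re.compile(r"^ft:(?P<base>.+):(?P<job>ftjob-[0-9A-Fa-f-]+)$")


def parse_ft_model_id(model_id: str) -> tuple:
    """Split a fine-tuned model id into (base_model, job_id) or raise."""
    m = FT_ID.match(model_id)
    if m is None:
        raise ValueError(f"not a fine-tuned model id: {model_id!r}")
    return m.group("base"), m.group("job")


def eval_env(model_id: str, base_env=None) -> dict:
    """Return a copy of the environment with CRAFTER_MODEL set (validated first)."""
    parse_ft_model_id(model_id)
    env = dict(os.environ if base_env is None else base_env)
    env["CRAFTER_MODEL"] = model_id
    return env
```

The resulting dict can be passed as env= to subprocess.run when launching the evaluation command.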

Example output (abridged)

✅ Model warmed up successfully!
Running 5 episodes (concurrency=5)...
✅ Completed 5 episodes in 58s
📊 EVALUATION RESULTS
Average reward per episode: 0.60
💾 Results: traces/synth_ai.db

Inspecting traces

uvx synth-ai traces
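For ad-hoc inspection outside the CLI, the trace store is a SQLite file. The filter output above suggests it lives at traces/synth_ai.db/dbs/default/data, but treat that path as an assumption for your setup. This sketch opens it read-only and lists tables with row counts; it assumes nothing about the table names the tracer creates.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only via a file: URI (path is an assumption)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list:
    """Return table names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in a table; the identifier is quoted with "" escaping."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
```

Usage: conn = open_readonly("traces/synth_ai.db/dbs/default/data"), then print each table from list_tables(conn) alongside count_rows(conn, table).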
