Examples
Run any example interactively with:-
Evals Demo
- Compare models on the Crafter environment with parallel episodes and stacked progress bars
- Post-run: filter traces to JSONL and view summary stats
- Uses OpenAI-compatible API; bring your
OPENAI_API_KEY
-
Rejection Finetuning
- End-to-end: generate traces → filter to SFT JSONL → kick off SFT → run fine-tuned model
- Qwen/Qwen3-4B Instruct with tool-calling in Crafter; fine-tunes via Synth API
- Requires
SYNTH_API_KEY
and local tracing (uvx synth-ai serve
) for dataset prep