
1. Install the Crafter demo into your current working directory

uvx synth-ai demo

2. Save and load your Synth credentials

uvx synth-ai setup
synth-ai setup automatically does the following:
  • Fetches SYNTH_API_KEY and ENVIRONMENT_API_KEY from https://usesynth.ai via your web browser
  • Saves SYNTH_API_KEY and ENVIRONMENT_API_KEY to .env in current working directory
  • Saves SYNTH_API_KEY and ENVIRONMENT_API_KEY to ~/.synth-ai/config.json
  • Loads SYNTH_API_KEY and ENVIRONMENT_API_KEY to process environment
This step is optional; you can also load these required credentials manually. See the CLI reference for details.
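If you do load credentials manually, the logic mirrors what the setup command does. Below is a minimal sketch, assuming the .env file uses plain KEY=VALUE lines and that ~/.synth-ai/config.json is a flat JSON object keyed by variable name (both file layouts are assumptions for illustration, not the documented formats):

```python
import json
import os
from pathlib import Path

def load_credentials(env_path=".env", config_path="~/.synth-ai/config.json"):
    """Load SYNTH_API_KEY and ENVIRONMENT_API_KEY into the process environment.

    Prefers the local .env file, falling back to the user-level config file.
    Both file formats are simplifying assumptions for this sketch.
    """
    keys = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")
    found = {}

    env_file = Path(env_path)
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            name, _, value = line.partition("=")
            if name.strip() in keys:
                found[name.strip()] = value.strip()

    config_file = Path(config_path).expanduser()
    if config_file.exists():
        data = json.loads(config_file.read_text())
        for key in keys:
            found.setdefault(key, data.get(key))

    # Export whatever was found to the process environment.
    for key, value in found.items():
        if value:
            os.environ[key] = value
    return found
```

Calling `load_credentials()` once at startup leaves both keys available to any subprocess you launch afterwards.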

3. Deploy the pre-built Crafter task app locally to start collecting rollout data

uvx synth-ai deploy \
  --runtime local \
  --task-app crafter/grpo_crafter_task_app.py
See the CLI reference for details.

4. Collect rollouts for supervision

  • In a second terminal, request a batch of traced evaluations with uvx synth-ai eval:
    uvx synth-ai eval grpo-crafter \
      --url http://127.0.0.1:8001 \
      --env-file .env \
      --trace-db traces/v3/eval.sqlite \
      --seeds 0-19 \
      --split train
    
  • This command drives /rollout for each seed, writes structured traces to traces/v3/eval.sqlite, and stores per-turn JSONL shards under ft_data/raw_sft.
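The --seeds 0-19 flag above expands to twenty individual rollouts, one per seed. A sketch of how such a seed spec could be expanded (the exact grammar the CLI accepts, such as comma-separated entries, is an assumption for illustration):

```python
def parse_seeds(spec: str) -> list[int]:
    """Expand a seed spec like "0-19" or "0,3,7-9" into a sorted list of ints.

    The comma/range grammar here is an assumption for illustration; the
    CLI's actual parser may differ.
    """
    seeds: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            seeds.update(range(int(lo), int(hi) + 1))
        else:
            seeds.add(int(part))
    return sorted(seeds)
```

Under this reading, `parse_seeds("0-19")` yields seeds 0 through 19, each of which drives one /rollout request.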

5. Build a filtered SFT dataset

  • Create configs/filter_crafter.toml with the minimal filter config:
    [filter]
    db = "traces/v3/eval.sqlite"
    output = "ft_data/crafter_sft.jsonl"
    min_official_score = 0.0
    include_dialogue = true
    
  • Export the curated JSONL using uvx synth-ai filter:
    uvx synth-ai filter --config configs/filter_crafter.toml
    
  • The resulting ft_data/crafter_sft.jsonl is ready for supervised fine-tuning.
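Conceptually, the filter step reads traced episodes from the SQLite database, keeps those whose official score meets the threshold, and writes the surviving dialogues as JSONL. A minimal sketch, assuming a hypothetical `episodes(messages, official_score)` table (the real trace schema is not documented here and will differ):

```python
import json
import sqlite3

def filter_traces(db_path: str, output_path: str, min_official_score: float = 0.0) -> int:
    """Export episodes scoring at least `min_official_score` as SFT-ready JSONL.

    The `episodes(messages, official_score)` schema is a simplifying
    assumption for this sketch, not the actual trace DB layout.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT messages FROM episodes WHERE official_score >= ?",
        (min_official_score,),
    ).fetchall()
    conn.close()

    # One JSON object per line, matching the common {"messages": [...]} SFT shape.
    with open(output_path, "w") as f:
        for (messages,) in rows:
            f.write(json.dumps({"messages": json.loads(messages)}) + "\n")
    return len(rows)
```

Raising `min_official_score` shrinks the dataset toward higher-quality episodes, which is the trade-off explored in the next-steps section.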

6. Launch the Crafter FFT baseline

  • Submit the bundled full-finetune job with uvx synth-ai train:
    uvx synth-ai train --type sft \
      --config configs/crafter_fft_4b.toml \
      --dataset ft_data/crafter_sft.jsonl \
      --env-file .env \
      --poll
    
  • The --poll flag streams status updates until the job reaches a terminal state.
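The polling behaviour amounts to a loop that checks job status at an interval until a terminal state is reached. A sketch of that pattern, where `get_status` and the status strings are hypothetical stand-ins for the real Synth API calls:

```python
import time

# Assumed terminal status names; the actual API's vocabulary may differ.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def poll_job(get_status, interval: float = 5.0, timeout: float = 3600.0) -> str:
    """Poll `get_status()` until the job reaches a terminal state.

    `get_status` is a hypothetical callable returning the job's current
    status string, e.g. backed by an HTTP call to the Synth API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        print(f"job status: {status}")
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

The `timeout` guard matters for long-running training jobs: without it, a stuck job would block the loop indefinitely.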

7. Evaluate your fine-tuned model

  • After the job finishes, re-run uvx synth-ai eval with the returned fine-tuned model id (for example ft:CRAFT-1234):
    uvx synth-ai eval grpo-crafter \
      --url http://127.0.0.1:8001 \
      --env-file .env \
      --model ft:CRAFT-1234 \
      --trace-db traces/v3/posttrain.sqlite \
      --seeds 0-19 \
      --split train
    
  • Compare the new outcome and event scores against the baseline to confirm the improvement from supervised fine-tuning.
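A simple way to frame that comparison is the shift in mean outcome score between the two runs. A sketch, taking plain lists of per-episode scores as input (extracting those scores from the two trace databases is left to the caller):

```python
from statistics import mean

def compare_runs(baseline_scores, posttrain_scores):
    """Summarise the score shift between a baseline and a post-training run.

    Inputs are plain lists of per-episode outcome scores; a positive
    `delta` indicates the fine-tuned model improved on the baseline.
    """
    base, post = mean(baseline_scores), mean(posttrain_scores)
    return {"baseline_mean": base, "posttrain_mean": post, "delta": post - base}
```

With matched seeds on the same split, this per-seed pairing keeps the comparison apples-to-apples.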

Next steps

  • Tighten the filter thresholds (for example, raise min_official_score or add metadata selectors) and rerun steps 5–7 to study data quality trade-offs.
  • Clone configs/crafter_fft_4b.toml, set training.use_qlora = true, and explore LoRA versus full-finetune results on the same dataset.