1. Install Demo
The demo consists of three files: task_app.py (the math task), train_cfg.toml (the training config), and main.py (the runner).
2. Setup Credentials
Set SYNTH_API_KEY and ENVIRONMENT_API_KEY; both are saved to .env.
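A minimal stdlib-only sketch for confirming that both keys actually landed in .env before deploying. It assumes a plain KEY=VALUE file; parse_env and missing_keys are hypothetical helpers, not part of the demo, and whatever loader the demo uses may handle quoting or escapes differently.

```python
from pathlib import Path

# Keys the credentials step writes to .env (names taken from the text above).
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Assumes a plain dotenv layout without multi-line values or escapes.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate shell-style "export KEY=VALUE" lines.
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env(path.read_text()) if path.exists() else {}
    missing = missing_keys(env)
    print("missing:", missing if missing else "none")
```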
3. Deploy Task App
4. Train
Minimal Config
Key Parameters
| Parameter | Purpose |
|---|---|
| compute.gpu_count | Number of GPUs to allocate (minimum 2 for RL) |
| model.trainer_mode | Fine-tuning mode: "lora", "qlora", or "full" |
| rollout.max_turns | Maximum steps (turns) per episode |
| training.batch_size | Training batch size |
| training.group_size | GSPO group size (rollouts per group) |
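To make training.group_size concrete: in GSPO (Group Sequence Policy Optimization), as in GRPO, several rollouts are sampled for the same prompt and each rollout's reward is normalized against the others in its group. A minimal sketch of that group-normalized advantage, assuming group_size is the number of rollouts per prompt; group_advantages is a hypothetical helper, not part of the demo, and the trainer used here may differ in details such as which standard-deviation estimator it uses.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-normalized advantages: (r_i - mean) / std within one group.

    `rewards` holds one scalar reward per rollout for the same prompt, so
    len(rewards) plays the role of training.group_size. Uses the population
    standard deviation; implementations vary on this choice.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two rollouts per group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every rollout earns the same reward,
    # in which case all advantages are 0 and the group gives no signal.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: group_size = 4, binary correctness rewards on one math prompt.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct rollouts get a positive advantage and incorrect ones a negative advantage, so a larger group gives a more stable baseline per prompt at the cost of more rollouts per training step.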
Reward Structure
The task app returns rewards via RolloutResponse.metrics: