Required TOML headers:
Algorithm header required keys:
- type = "online"
- method
  - Value must be one of:
    - "policy_gradient"
    - "ppo"
    - "gspo"
- variety
Policy header required keys:
- model_name or source
- trainer_mode
- label
Compute header required keys:
- gpu_type
- gpu_count
- rollout.env_name
- rollout.policy_name
- topology.reference_placement
Training header required keys:
- num_epochs
- iterations_per_epoch
- max_turns
- batch_size
- group_size
- learning_rate
Evaluation header required keys:
- instances
- every_n_iters
- seeds
Services header required key:
- task_url
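
As a rough illustration, the required keys above could be laid out in a single TOML file as sketched below. This is a minimal sketch under several assumptions: the table names (`[algorithm]`, `[policy]`, and so on) are assumed to be the lowercase header names, the dotted keys (`rollout.*`, `topology.*`) are assumed to nest under the Compute header, and every value except `type = "online"` and the listed `method` options is a hypothetical placeholder whose type is a guess.

```toml
# Hypothetical example config; all values other than type/method are placeholders.

[algorithm]
type = "online"
method = "policy_gradient"      # or "ppo" / "gspo"
variety = "example_variety"     # placeholder; allowed values are not listed above

[policy]
model_name = "example-org/example-model"  # alternatively, set `source` instead
trainer_mode = "example_mode"             # placeholder
label = "example-run"                     # placeholder

[compute]
gpu_type = "H100"                                     # placeholder
gpu_count = 4                                         # placeholder
rollout.env_name = "example_env"                      # dotted key, assumed to nest under [compute]
rollout.policy_name = "example_policy"                # placeholder
topology.reference_placement = "example_placement"    # placeholder; value type is a guess

[training]
num_epochs = 1
iterations_per_epoch = 10
max_turns = 5
batch_size = 8
group_size = 4
learning_rate = 5e-6

[evaluation]
instances = 10
every_n_iters = 5
seeds = [0, 1, 2]

[services]
task_url = "https://example.com/task"   # placeholder URL
```

Writing the dotted keys inline, as here, is equivalent in TOML to declaring separate `[compute.rollout]` and `[compute.topology]` tables; which form the loader expects is not stated above.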