Outcome Rewards
Outcome rewards are episode-level summaries attached to complete sessions. They provide a single quality score for the entire episode, enabling efficient filtering, ranking, and analysis.
Overview
Outcome rewards capture session-level outcomes:
- Total reward (e.g., unique achievements unlocked)
- Achievement count
- Step count (episode length)
- Custom metadata (achievements list, final state, etc.)
They are used for:
- Filtering: Select high-quality episodes for SFT training
- Ranking: Sort episodes by performance
- Analysis: Track policy improvement over time
Schema
Fields
session_id (required)
- Foreign key to session_traces.session_id
- Links this outcome to a specific session/episode
total_reward (required)
- Scalar summary of episode performance
- Common interpretations:
  - Unique achievements unlocked this episode
  - Cumulative environment reward
  - Final score from an evaluator
  - Any other episode-level metric
achievements_count (required)
- Number of achievements/milestones reached
- Useful for environments with discrete goals
total_steps (required)
- Number of steps/turns in the episode
- Used for:
  - Filtering by episode length
  - Computing efficiency metrics (reward per step)
reward_metadata (optional)
- JSON dictionary with additional context, such as the list of achievements unlocked or the final environment state
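For illustration, the fields above can be mirrored in a small Python dataclass. This is a sketch only: the class name `OutcomeReward`, its validation rules, and the example values (Crafter-style achievement names) are assumptions, not part of the documented schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# Hypothetical Python mirror of the outcome reward fields; not the library's own type.
@dataclass
class OutcomeReward:
    session_id: str          # required: FK to session_traces.session_id
    total_reward: float      # required: scalar episode-level summary
    achievements_count: int  # required: achievements/milestones reached
    total_steps: int         # required: episode length in steps/turns
    reward_metadata: Dict[str, Any] = field(default_factory=dict)  # optional extra context

    def __post_init__(self) -> None:
        # Illustrative sanity checks; the real schema may enforce different constraints.
        if not self.session_id:
            raise ValueError("session_id is required")
        if self.achievements_count < 0 or self.total_steps < 0:
            raise ValueError("achievements_count and total_steps must be non-negative")


outcome = OutcomeReward(
    session_id="session_abc",  # hypothetical session ID
    total_reward=3.0,          # e.g. three unique achievements unlocked
    achievements_count=3,
    total_steps=42,
    reward_metadata={"achievements": ["collect_wood", "place_table", "collect_sapling"]},
)
```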
Recording Outcome Rewards
From Task Apps
Record outcome rewards at the end of rollout execution, once the episode's totals are known.
Manual Recording
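A minimal sketch of recording an outcome by hand, using Python's built-in `sqlite3` module as a stand-in store. The table name `outcome_rewards`, its column layout, and the `record_outcome` helper are assumptions for illustration; the foreign key to `session_traces` is omitted so the example runs on its own.

```python
import json
import sqlite3
from typing import Any, Dict, Optional

# Assumed table layout mirroring the documented fields (not the library's actual DDL).
# The FK to session_traces.session_id is left out so this runs standalone.
SCHEMA = """
CREATE TABLE IF NOT EXISTS outcome_rewards (
    session_id TEXT NOT NULL,
    total_reward REAL NOT NULL,
    achievements_count INTEGER NOT NULL,
    total_steps INTEGER NOT NULL,
    reward_metadata TEXT
)
"""


def record_outcome(
    conn: sqlite3.Connection,
    session_id: str,
    total_reward: float,
    achievements_count: int,
    total_steps: int,
    reward_metadata: Optional[Dict[str, Any]] = None,
) -> None:
    """Hypothetical helper: write one outcome row after the episode has finished."""
    conn.execute(
        "INSERT INTO outcome_rewards VALUES (?, ?, ?, ?, ?)",
        (
            session_id,
            total_reward,
            achievements_count,
            total_steps,
            # Store the optional metadata dict as JSON text (NULL if absent).
            json.dumps(reward_metadata) if reward_metadata is not None else None,
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_outcome(conn, "session_abc", 3.0, 3, 42, {"achievements": ["collect_wood"]})
```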
Querying Outcome Rewards
Get Outcome for a Session
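Looking up a single session's outcome is a keyed query on `session_id`. The sketch below assumes the same hypothetical `outcome_rewards` table layout as a plain SQLite database and decodes the JSON metadata column; `get_outcome` is an illustrative name, not a library function.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def get_outcome(conn: sqlite3.Connection, session_id: str) -> Optional[Dict[str, Any]]:
    """Return the outcome reward for one session as a dict, or None if absent."""
    row = conn.execute(
        "SELECT session_id, total_reward, achievements_count, total_steps, reward_metadata "
        "FROM outcome_rewards WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    if row is None:
        return None
    sid, reward, achievements, steps, metadata = row
    return {
        "session_id": sid,
        "total_reward": reward,
        "achievements_count": achievements,
        "total_steps": steps,
        # Decode the JSON metadata column back into a dict (empty if NULL).
        "reward_metadata": json.loads(metadata) if metadata else {},
    }


# Demo data in an in-memory database (assumed table name and layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outcome_rewards (session_id TEXT, total_reward REAL, "
    "achievements_count INTEGER, total_steps INTEGER, reward_metadata TEXT)"
)
conn.execute(
    "INSERT INTO outcome_rewards VALUES (?, ?, ?, ?, ?)",
    ("session_abc", 3.0, 3, 42, json.dumps({"achievements": ["collect_wood"]})),
)
print(get_outcome(conn, "session_abc"))
print(get_outcome(conn, "missing"))  # None
```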
Filter High-Quality Episodes
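Filtering comes down to a threshold on `total_reward`. This sketch again assumes a hypothetical SQLite `outcome_rewards` table; the threshold value and session IDs are made up.

```python
import sqlite3
from typing import List


def high_quality_sessions(conn: sqlite3.Connection, min_reward: float) -> List[str]:
    """Return session IDs whose total_reward meets the threshold, best first."""
    rows = conn.execute(
        "SELECT session_id FROM outcome_rewards "
        "WHERE total_reward >= ? "
        "ORDER BY total_reward DESC, session_id",
        (min_reward,),
    ).fetchall()
    return [session_id for (session_id,) in rows]


# Demo data in an in-memory database (assumed table name and layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outcome_rewards (session_id TEXT, total_reward REAL, "
    "achievements_count INTEGER, total_steps INTEGER, reward_metadata TEXT)"
)
conn.executemany(
    "INSERT INTO outcome_rewards VALUES (?, ?, ?, ?, NULL)",
    [("s1", 5.0, 5, 80), ("s2", 1.0, 1, 20), ("s3", 4.0, 4, 50)],
)
print(high_quality_sessions(conn, min_reward=4.0))  # ['s1', 's3']
```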
Compute Statistics
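Aggregate statistics can be computed directly with SQL aggregates. A sketch under the same assumed SQLite table; `outcome_stats` and its returned keys are illustrative names.

```python
import sqlite3
from typing import Any, Dict


def outcome_stats(conn: sqlite3.Connection) -> Dict[str, Any]:
    """Aggregate episode-level statistics over all recorded outcomes."""
    count, mean_reward, max_reward, mean_steps, mean_achievements = conn.execute(
        "SELECT COUNT(*), AVG(total_reward), MAX(total_reward), "
        "AVG(total_steps), AVG(achievements_count) FROM outcome_rewards"
    ).fetchone()
    # AVG/MAX return NULL (None) when the table is empty.
    return {
        "episodes": count,
        "mean_reward": mean_reward,
        "max_reward": max_reward,
        "mean_steps": mean_steps,
        "mean_achievements": mean_achievements,
    }


# Demo data in an in-memory database (assumed table name and layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outcome_rewards (session_id TEXT, total_reward REAL, "
    "achievements_count INTEGER, total_steps INTEGER, reward_metadata TEXT)"
)
conn.executemany(
    "INSERT INTO outcome_rewards VALUES (?, ?, ?, ?, NULL)",
    [("s1", 5.0, 5, 80), ("s2", 1.0, 1, 20), ("s3", 3.0, 3, 50)],
)
print(outcome_stats(conn))
```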
Use Cases
1. Filter for SFT Training
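A sketch of selecting SFT episodes from already-loaded outcome records, combining a reward threshold with an optional cap on `total_steps` (the "filtering by episode length" use above). The record shape, `select_for_sft`, and both threshold values are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional


def select_for_sft(
    outcomes: Iterable[Dict],
    min_reward: float,
    max_steps: Optional[int] = None,
) -> List[str]:
    """Keep sessions that clear the reward bar and, optionally, a length cap."""
    selected = []
    for o in outcomes:
        if o["total_reward"] < min_reward:
            continue  # below the reward threshold
        if max_steps is not None and o["total_steps"] > max_steps:
            continue  # episode exceeds the length cap
        selected.append(o["session_id"])
    return selected


outcomes = [
    {"session_id": "s1", "total_reward": 5.0, "total_steps": 80},
    {"session_id": "s2", "total_reward": 1.0, "total_steps": 20},
    {"session_id": "s3", "total_reward": 4.0, "total_steps": 300},
]
print(select_for_sft(outcomes, min_reward=3.0, max_steps=200))  # ['s1']
```

The selected session IDs would then be used to pull the corresponding traces for training.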
2. Track Policy Improvement
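One simple way to track improvement is to average `total_reward` over consecutive batches of episodes. This sketch assumes the rewards are already in collection order (for example, sorted by session creation time); `batch_means` and the reward history are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def batch_means(rewards: Sequence[float], batch_size: int) -> List[float]:
    """Mean total_reward over consecutive batches of episodes (oldest first).

    A trailing partial batch is averaged over however many episodes it holds.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        mean(rewards[i : i + batch_size])
        for i in range(0, len(rewards), batch_size)
    ]


# total_reward for six episodes, in the order they were collected (made-up values).
history = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
print(batch_means(history, batch_size=2))  # [1.5, 2.5, 4.5]
```

A rising sequence of batch means suggests the policy is improving; noisy environments may need larger batches.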
3. Combine with Event Rewards
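Outcome and event rewards can be layered: filter episodes by outcome first, then keep only the rewarded steps inside them. In this sketch, event rewards are assumed to be available as `(session_id, event_id, reward)` tuples; carrying `session_id` next to `event_id` is an assumption, as are `select_steps` and all values.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes: outcome rewards keyed by session_id, event rewards as
# (session_id, event_id, reward) tuples.
outcomes: Dict[str, float] = {"s1": 5.0, "s2": 1.0}
event_rewards: List[Tuple[str, str, float]] = [
    ("s1", "e1", 0.0),
    ("s1", "e2", 1.0),
    ("s1", "e3", 1.0),
    ("s2", "e4", 1.0),
]


def select_steps(
    outcomes: Dict[str, float],
    event_rewards: List[Tuple[str, str, float]],
    min_outcome: float,
    min_event_reward: float = 0.0,
) -> List[str]:
    """Episode-level filter first (outcome), then step-level filter (event)."""
    good_sessions = {sid for sid, total in outcomes.items() if total >= min_outcome}
    return [
        event_id
        for sid, event_id, reward in event_rewards
        if sid in good_sessions and reward > min_event_reward
    ]


print(select_steps(outcomes, event_rewards, min_outcome=3.0))  # ['e2', 'e3']
```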
4. Export Top-K Episodes
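A sketch of exporting the top-K episodes by `total_reward` as JSON Lines, using `heapq.nlargest` so the full list never has to be sorted. The record shape and `top_k_jsonl` are hypothetical.

```python
import heapq
import json
from typing import Dict, Iterable


def top_k_jsonl(outcomes: Iterable[Dict], k: int) -> str:
    """Serialize the k highest-reward outcomes as JSON Lines, best first."""
    best = heapq.nlargest(k, outcomes, key=lambda o: o["total_reward"])
    return "\n".join(json.dumps(o, sort_keys=True) for o in best)


outcomes = [
    {"session_id": "s1", "total_reward": 5.0, "total_steps": 80},
    {"session_id": "s2", "total_reward": 1.0, "total_steps": 20},
    {"session_id": "s3", "total_reward": 4.0, "total_steps": 50},
]
print(top_k_jsonl(outcomes, k=2))
```

To write a file instead, pass the returned string to `pathlib.Path(...).write_text(...)`.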
Integration with Judges
Judges can populate outcome rewards automatically.
Best Practices
1. Choose Meaningful total_reward
2. Populate reward_metadata
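A sketch of building `reward_metadata` from the kinds of context the schema mentions (achievements list, final state). The helper name, the keys `achievements` and `final_state`, and the example state are assumptions; de-duplicating the list and deriving `achievements_count` from it is one way to keep the two fields consistent.

```python
import json
from typing import Any, Dict, Iterable


def build_reward_metadata(
    achievements: Iterable[str], final_state: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: bundle episode context into a JSON-serializable dict."""
    unique = sorted(set(achievements))  # de-duplicate so the list matches a unique count
    return {"achievements": unique, "final_state": final_state}


metadata = build_reward_metadata(
    ["collect_wood", "place_table", "collect_wood"],
    {"health": 7, "inventory": {"wood": 2}},
)
achievements_count = len(metadata["achievements"])  # keep the count consistent with the list
encoded = json.dumps(metadata)  # JSON text suitable for the reward_metadata field
print(achievements_count, encoded)
```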
3. Normalize for Episode Length
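Normalizing by `total_steps` gives the reward-per-step efficiency metric mentioned in the schema. In this sketch (hypothetical values), ranking by reward per step reverses the order that raw `total_reward` would give; returning 0.0 for zero-step episodes is a design choice of the sketch to avoid division by zero.

```python
from typing import Dict, List


def reward_per_step(total_reward: float, total_steps: int) -> float:
    """Efficiency metric: reward earned per step; 0.0 for empty episodes."""
    if total_steps <= 0:
        return 0.0  # avoid division by zero
    return total_reward / total_steps


outcomes: List[Dict] = [
    {"session_id": "s1", "total_reward": 5.0, "total_steps": 80},
    {"session_id": "s2", "total_reward": 4.0, "total_steps": 40},
]
ranked = sorted(
    outcomes,
    key=lambda o: reward_per_step(o["total_reward"], o["total_steps"]),
    reverse=True,
)
print([o["session_id"] for o in ranked])  # ['s2', 's1']
```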
4. Use Appropriate Thresholds
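One option, rather than a fixed number, is to derive the threshold from the observed reward distribution, for example keeping episodes at or above the 75th percentile. This sketch uses `statistics.quantiles` with inclusive interpolation; `percentile_threshold` and the rewards are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List


def percentile_threshold(rewards: List[float], pct: int) -> float:
    """Reward value at the given percentile (1-99), using inclusive interpolation."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    cut_points = quantiles(rewards, n=100, method="inclusive")  # 99 cut points
    return cut_points[pct - 1]


outcomes: Dict[str, float] = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0}
threshold = percentile_threshold(list(outcomes.values()), pct=75)
kept = sorted(sid for sid, r in outcomes.items() if r >= threshold)
print(threshold, kept)  # 4.0 ['s4', 's5']
```

A percentile cutoff adapts as the policy improves, while a fixed threshold keeps the selection criterion stable across runs.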
Relationship to Event Rewards
| Aspect | Outcome Rewards | Event Rewards |
|---|---|---|
| Scope | Entire episode | Individual steps |
| Purpose | Filtering, ranking | Credit assignment |
| When recorded | End of episode | During episode |
| FK target | session_id | event_id |
| Use in RL | Episode selection | Advantage computation |
| Use in SFT | Data filtering | Step-level filtering |
See Also
- V3 Traces - Complete trace format specification
- Event Rewards - Step-level rewards
- Filter CLI - Filter sessions by outcome rewards
- Task Apps - Build environments that emit outcome rewards