SDK Overview

The Synth SDK provides a comprehensive tracing and reward system designed for reinforcement learning (RL) and supervised fine-tuning (SFT) training. The system captures fine-grained execution details, supports multiple reward types, and enables sophisticated filtering and analysis.

Core Concepts

Sessions and Traces

A session represents a complete execution (e.g., a conversation, RL episode, or batch job). Each session is captured as a v3 trace containing the following (a creation sketch appears after the list):
  • Structured event history
  • Message exchanges between subsystems
  • Token usage and cost tracking
  • Timing and performance metrics
  • Custom metadata
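
A minimal creation sketch: db_path and session_id mirror the RL Training example below, while the metadata keyword is an assumption used only to illustrate attaching session-level custom metadata.

from synth_ai.tracing_v3 import SessionTracer

# One tracer per session; db_path and session_id mirror the RL Training
# example below. The metadata argument is an assumption for illustration.
tracer = SessionTracer(
    db_path="traces.db",
    session_id="conversation_001",
    metadata={"user_id": "u-123", "task": "support_chat"},
)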

Events

Events are intra-system facts that record something that happened (a sketch follows the list):
  • LMCAISEvent: Language model API calls with token/cost tracking
  • EnvironmentEvent: Feedback from environments (rewards, observations)
  • RuntimeEvent: System decisions and actions
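
As a sketch, an EnvironmentEvent carrying a step reward might be recorded like this; the reward field is an assumption modeled on the LMCAISEvent example under RL Training, not a documented signature.

# Hypothetical sketch: field names are modeled on the LMCAISEvent example
# below; verify them against the SDK before use.
env_event = EnvironmentEvent(
    system_instance_id="crafter_env",
    time_record=TimeRecord(event_time=time.time()),
    reward=0.5,  # step-level feedback from the environment
)
tracer.record_event(env_event)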

Messages

Messages represent information transmitted between subsystems (a sketch follows the list):
  • User → Agent (instructions)
  • Agent → Runtime (decisions)
  • Runtime → Environment (tool executions)
  • Environment → Runtime (results)
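
A hypothetical sketch of recording two of these directions; record_message and its message_type values are illustrative assumptions, not a documented API.

# record_message and message_type are assumptions for illustration only.
tracer.record_message(
    content="Find the capital of France with the search tool.",
    message_type="user_to_agent",
)
tracer.record_message(
    content={"tool": "search", "args": {"query": "capital of France"}},
    message_type="runtime_to_environment",
)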

Rewards

The system supports two types of rewards:
  1. Event Rewards: Attached to specific events within a session (step-level)
  2. Outcome Rewards: Attached to the entire session (episode-level)
Both reward types support filtering, annotations, and multiple sources (environment, evaluator, human); the RL Training example below records both kinds.

Common Use Cases

RL Training

import time

from synth_ai.tracing_v3 import SessionTracer
# LMCAISEvent and TimeRecord ship with the tracing_v3 package; the exact
# import path below is an assumption and may differ in your version.
from synth_ai.tracing_v3.abstractions import LMCAISEvent, TimeRecord

# Create tracer
tracer = SessionTracer(db_path="traces.db", session_id="episode_001")

# Record LLM decisions
lm_event = LMCAISEvent(
    system_instance_id="agent",
    time_record=TimeRecord(event_time=time.time()),
    model_name="gpt-4",
    provider="openai",
    call_records=[...],
)
event_id = tracer.record_event(lm_event)

# Record event rewards for this decision
tracer.record_event_reward(
    event_id=event_id,
    reward_value=0.85,
    reward_type="achievement_delta",
    source="environment",
)

# Record outcome at end of episode
tracer.record_outcome_reward(
    total_reward=10.5,
    achievements_count=7,
    total_steps=42,
)

Judge Evaluation

from synth_ai.judge_schemas import (
    JudgeOptions,
    JudgeScoreRequest,
    JudgeTaskApp,
    JudgeTracePayload,
)

# Prepare trace for judging
request = JudgeScoreRequest(
    policy_name="my-policy-v1",
    task_app=JudgeTaskApp(id="crafter-v1"),
    trace=JudgeTracePayload(
        event_history=[...],
        metadata={"env_name": "crafter"},
    ),
    options=JudgeOptions(
        provider="openai",
        model="gpt-4",
        event=True,
        outcome=True,
    ),
)

# judge_client is assumed to be an initialized judge API client;
# score() returns event_totals and an outcome_review
response = await judge_client.score(request)

Filtering for SFT

# Filter sessions by outcome rewards
uvx synth-ai filter \
  --min-reward 5.0 \
  --min-steps 10 \
  --output high_quality.jsonl
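
Each line of the output file is one session, so the result can be loaded directly as SFT training data:

import json

# Read the filtered sessions back for fine-tuning.
with open("high_quality.jsonl") as f:
    examples = [json.loads(line) for line in f]
print(f"{len(examples)} sessions passed the reward/step filters")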

Architecture

The tracing system follows a modular architecture:
SessionTrace (episode/conversation)
├── session_time_steps (ordered turns)
│   ├── events (LM calls, env feedback, runtime actions)
│   └── messages (inter-system communication)
├── event_history (flat chronological list)
├── markov_blanket_message_history (flat chronological list)
└── metadata (session-level context)

Reward Tables (separate persistence)
├── event_rewards (linked to event_id)
└── outcome_rewards (linked to session_id)
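
Because rewards live in their own tables, they can be queried independently of the traces. A sketch assuming a SQLite database and the table/column names shown above (total_reward as in the RL Training example):

import sqlite3

# Table and column names follow the diagram above; verify them against
# your installed schema before relying on this query.
conn = sqlite3.connect("traces.db")
rows = conn.execute(
    "SELECT session_id, total_reward FROM outcome_rewards "
    "WHERE total_reward >= ?",
    (5.0,),
).fetchall()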

Next Steps