SDK Overview

The Synth SDK provides a comprehensive tracing and reward system designed for reinforcement learning (RL) and supervised fine-tuning (SFT) training. The system captures fine-grained execution details, supports multiple reward types, and enables sophisticated filtering and analysis.

Core Concepts

Sessions and Traces

A session represents a complete execution (e.g., a conversation, RL episode, or batch job). Each session is captured as a v3 trace containing the following (a creation sketch appears after the list):
  • Structured event history
  • Message exchanges between subsystems
  • Token usage and cost tracking
  • Timing and performance metrics
  • Custom metadata
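
A minimal creation sketch: db_path and session_id mirror the RL Training example below, while the metadata keyword is an assumption used only to illustrate attaching session-level custom metadata.

from synth_ai.tracing_v3 import SessionTracer

# One tracer per session; db_path and session_id mirror the RL Training
# example below. The metadata argument is an assumption for illustration.
tracer = SessionTracer(
    db_path="traces.db",
    session_id="conversation_001",
    metadata={"user_id": "u-123", "task": "support_chat"},
)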

Events

Events are intra-system facts that record something that happened (a sketch follows the list):
  • LMCAISEvent: Language model API calls with token/cost tracking
  • EnvironmentEvent: Feedback from environments (rewards, observations)
  • RuntimeEvent: System decisions and actions
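
As a sketch, an EnvironmentEvent carrying a step reward might be recorded like this; the reward field is an assumption modeled on the LMCAISEvent example under RL Training, not a documented signature.

# Hypothetical sketch: field names are modeled on the LMCAISEvent example
# below; verify them against the SDK before use.
env_event = EnvironmentEvent(
    system_instance_id="crafter_env",
    time_record=TimeRecord(event_time=time.time()),
    reward=0.5,  # step-level feedback from the environment
)
tracer.record_event(env_event)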

Messages

Messages represent information transmitted between subsystems (a sketch follows the list):
  • User → Agent (instructions)
  • Agent → Runtime (decisions)
  • Runtime → Environment (tool executions)
  • Environment → Runtime (results)
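
A hypothetical sketch of recording two of these directions; record_message and its message_type values are illustrative assumptions, not a documented API.

# record_message and message_type are assumptions for illustration only.
tracer.record_message(
    content="Find the capital of France with the search tool.",
    message_type="user_to_agent",
)
tracer.record_message(
    content={"tool": "search", "args": {"query": "capital of France"}},
    message_type="runtime_to_environment",
)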

Rewards

The system supports two types of rewards:
  1. Event Rewards: Attached to specific events within a session (step-level)
  2. Outcome Rewards: Attached to the entire session (episode-level)
Both reward types support filtering, annotations, and multiple sources (environment, evaluator, human); the RL Training example below records both kinds.

Common Use Cases

RL Training

import time

from synth_ai.tracing_v3 import SessionTracer
# LMCAISEvent and TimeRecord ship with the tracing_v3 package; the exact
# import path below is an assumption and may differ in your version.
from synth_ai.tracing_v3.abstractions import LMCAISEvent, TimeRecord

# Create tracer
tracer = SessionTracer(db_path="traces.db", session_id="episode_001")

# Record LLM decisions
lm_event = LMCAISEvent(
    system_instance_id="agent",
    time_record=TimeRecord(event_time=time.time()),
    model_name="gpt-4",
    provider="openai",
    call_records=[...],
)
event_id = tracer.record_event(lm_event)

# Record event rewards for this decision
tracer.record_event_reward(
    event_id=event_id,
    reward_value=0.85,
    reward_type="achievement_delta",
    source="environment",
)

# Record outcome at end of episode
tracer.record_outcome_reward(
    total_reward=10.5,
    achievements_count=7,
    total_steps=42,
)

Judge Evaluation

from synth_ai.judge_schemas import (
    JudgeOptions,
    JudgeScoreRequest,
    JudgeTaskApp,
    JudgeTracePayload,
)

# Prepare trace for judging
request = JudgeScoreRequest(
    policy_name="my-policy-v1",
    task_app=JudgeTaskApp(id="crafter-v1"),
    trace=JudgeTracePayload(
        event_history=[...],
        metadata={"env_name": "crafter"},
    ),
    options=JudgeOptions(
        provider="openai",
        model="gpt-4",
        event=True,
        outcome=True,
    ),
)

# judge_client is assumed to be an initialized judge API client;
# score() returns event_totals and an outcome_review
response = await judge_client.score(request)

Filtering for SFT

# Filter sessions by outcome rewards
uvx synth-ai filter \
  --min-reward 5.0 \
  --min-steps 10 \
  --output high_quality.jsonl
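
Each line of the output file is one session, so the result can be loaded directly as SFT training data:

import json

# Read the filtered sessions back for fine-tuning.
with open("high_quality.jsonl") as f:
    examples = [json.loads(line) for line in f]
print(f"{len(examples)} sessions passed the reward/step filters")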

Architecture

The tracing system follows a modular architecture:
SessionTrace (episode/conversation)
├── session_time_steps (ordered turns)
│   ├── events (LM calls, env feedback, runtime actions)
│   └── messages (inter-system communication)
├── event_history (flat chronological list)
├── markov_blanket_message_history (flat chronological list)
└── metadata (session-level context)

Reward Tables (separate persistence)
├── event_rewards (linked to event_id)
└── outcome_rewards (linked to session_id)
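
Because rewards live in their own tables, they can be queried independently of the traces. A sketch assuming a SQLite database and the table/column names shown above (total_reward as in the RL Training example):

import sqlite3

# Table and column names follow the diagram above; verify them against
# your installed schema before relying on this query.
conn = sqlite3.connect("traces.db")
rows = conn.execute(
    "SELECT session_id, total_reward FROM outcome_rewards "
    "WHERE total_reward >= ?",
    (5.0,),
).fetchall()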

Next Steps