Synth Graphs

Graphs are Synth’s abstraction for multi-node LLM workflows. Like task apps, graphs are first-class artifacts you can train, download, and serve in production.

What is a Graph?

A Synth graph is a directed workflow of LLM calls and transformations. Each node can:
  • Call an LLM with a specific prompt template
  • Transform data between nodes
  • Branch conditionally based on intermediate results
  • Aggregate outputs from multiple paths
Unlike single prompts, graphs can express complex reasoning patterns: chain-of-thought, retrieval-augmented generation, self-consistency, and more.
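To make this concrete, here is a purely hypothetical sketch (not Synth's actual runtime or node schema) of a two-node chain-of-thought graph, with call_llm standing in for any LLM client:
def call_llm(prompt: str) -> str:
    """Stand-in for any LLM client call (hypothetical)."""
    raise NotImplementedError

def reasoning_node(question: str) -> str:
    # Node 1: elicit intermediate reasoning (chain-of-thought)
    return call_llm(f"Think step by step about: {question}")

def answer_node(question: str, reasoning: str) -> str:
    # Node 2: produce the final answer conditioned on the reasoning
    return call_llm(f"Question: {question}\nReasoning: {reasoning}\nAnswer:")

def run_graph(question: str) -> str:
    # A minimal two-node chain: reasoning feeds the answer node
    return answer_node(question, reasoning_node(question))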

Graph Types

Synth supports two fundamental graph types:
Type       Purpose                          Example
policy     Maps inputs to outputs           QA, classification, generation
verifier   Judges/scores existing results   Quality scoring, ranking, evaluation

Policy Graphs

Policy graphs solve tasks. They take an input and produce an output:
Input → [Graph] → Output
Examples:
  • Question answering: {question, context} → {answer}
  • Classification: {text} → {category, confidence}
  • Code generation: {spec} → {code}
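For example, once a QA policy graph is trained, calling it looks like the inference example later on this page (the input keys follow your dataset's schema; here they mirror the QA example above):
# Assumes `job` is a completed ADAS policy-graph job (see "Creating Graphs" below)
result = job.run_inference({
    "question": "What is the capital of France?",
    "context": "France is a country in Western Europe.",
})
print(result["output"])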

Verifier Graphs

Verifier graphs evaluate execution traces and produce structured scores. At inference time, they take:
  1. A V3 trace - The execution trace from synth-ai tracing
  2. A rubric - Evaluation criteria defining what to score
{trace, rubric} → [Verifier Graph] → {score, event_rewards, reasoning}
Key use case: Training custom judges. Instead of using expensive frontier models (GPT-4, Claude) to evaluate outputs, you can train a verifier graph that:
  1. Matches human evaluation quality
  3. Runs on cheaper models (e.g., GPT-4o-mini or Groq-hosted models)
  3. Provides consistent, calibrated scores
  4. Returns structured rewards (event-level and outcome-level)

Verifier Graph Dataset Format

Critical: Training a verifier graph requires a specific dataset format. The dataset must include V3 traces as inputs and gold scores as outputs.

Required Dataset Structure

A verifier-compliant ADAS dataset has this structure:
{
  "version": "1.0",
  "metadata": {
    "name": "my-verifier-dataset",
    "description": "Training data for custom judge"
  },
  "tasks": [
    {
      "id": "trace_001",
      "input": {
        "trace": { /* V3 SessionTrace object */ }
      }
    }
  ],
  "gold_outputs": [
    {
      "task_id": "trace_001",
      "output": {
        "score": 0.85
      }
    }
  ]
}
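Because a malformed file will fail training, it can help to sanity-check the dataset before submitting. A minimal sketch using only the standard library (the checks mirror the required fields described in the sections below):
import json

with open("verifier_dataset.json") as f:
    dataset = json.load(f)

# Every task needs an id and a V3 trace under input.trace
for task in dataset["tasks"]:
    assert "trace" in task["input"], f"task {task['id']} missing input.trace"

# Every gold output must reference a known task and carry a 0-1 score
task_ids = {t["id"] for t in dataset["tasks"]}
for gold in dataset["gold_outputs"]:
    assert gold["task_id"] in task_ids, f"unknown task_id {gold['task_id']}"
    assert 0.0 <= gold["output"]["score"] <= 1.0, "score must be in [0, 1]"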

The tasks[].input.trace Field (Required)

Each task input must contain a trace field with a V3 SessionTrace object:
{
  "id": "trace_001",
  "input": {
    "trace": {
      "session_id": "session_abc123",
      "session_time_steps": [
        {
          "step_id": "step_0",
          "step_index": 0,
          "events": [
            {
              "event_id": 1,
              "event_type": "runtime",
              "metadata": {
                "action": "search_database",
                "query": "capital of France"
              }
            },
            {
              "event_id": 2,
              "event_type": "environment",
              "metadata": {
                "result": "Paris",
                "source": "geography_db"
              }
            }
          ]
        },
        {
          "step_id": "step_1",
          "step_index": 1,
          "events": [
            {
              "event_id": 3,
              "event_type": "runtime",
              "metadata": {
                "action": "generate_response",
                "output": "The capital of France is Paris."
              }
            }
          ]
        }
      ],
      "metadata": {
        "environment": "qa_system",
        "model": "gpt-4o-mini"
      }
    }
  }
}
Event IDs are critical. Each event must have an integer event_id so the verifier can assign per-event rewards that link back to specific actions in the trace.
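A small helper for collecting the event IDs present in a trace is useful when writing gold labels against them (a sketch; the field names follow the V3 structure above):
def iter_event_ids(trace: dict):
    """Yield every integer event_id in a V3 SessionTrace dict."""
    for step in trace["session_time_steps"]:
        for event in step["events"]:
            yield event["event_id"]

# For the example trace above, this yields 1, 2, 3.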

The gold_outputs[].output.score Field (Required)

Every gold output must include a score field (float, 0-1):
{
  "task_id": "trace_001",
  "output": {
    "score": 0.85
  }
}

Optional: Event-Level Rewards

For fine-grained training, include event_rewards to teach the verifier which specific events were good or bad:
{
  "task_id": "trace_001",
  "output": {
    "score": 0.85,
    "event_rewards": [
      {
        "event_id": 1,
        "value": 1.0,
        "annotation": {"reason": "Good query formulation"}
      },
      {
        "event_id": 3,
        "value": 0.7,
        "annotation": {"reason": "Correct but could be more detailed"}
      }
    ]
  }
}
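Because each reward references an event_id, it is worth checking that every labeled reward points at an event that actually exists in the paired trace. A minimal sketch over the dataset file:
import json

with open("verifier_dataset.json") as f:
    dataset = json.load(f)

traces = {t["id"]: t["input"]["trace"] for t in dataset["tasks"]}

for gold in dataset["gold_outputs"]:
    # All event_ids present in this task's trace
    valid_ids = {
        event["event_id"]
        for step in traces[gold["task_id"]]["session_time_steps"]
        for event in step["events"]
    }
    for reward in gold["output"].get("event_rewards", []):
        assert reward["event_id"] in valid_ids, (
            f"{gold['task_id']}: dangling event_id {reward['event_id']}"
        )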

Optional: Outcome-Level Rewards

Include episode-level summary information:
{
  "task_id": "trace_001",
  "output": {
    "score": 0.85,
    "outcome": {
      "total_reward": 0.85,
      "achievements_count": 3,
      "annotation": {"summary": "Completed main objective"}
    },
    "outcome_feedback": "Agent found the correct answer efficiently"
  }
}

Optional: Default Rubric

Include a rubric in the dataset metadata to define evaluation criteria:
{
  "metadata": {
    "name": "my-verifier-dataset",
    "default_rubric": {
      "outcome": {
        "criteria": [
          {
            "name": "correctness",
            "description": "Is the final answer factually correct?",
            "weight": 2.0
          },
          {
            "name": "efficiency",
            "description": "Did the agent reach the answer efficiently?",
            "weight": 1.0
          }
        ]
      },
      "events": {
        "criteria": [
          {
            "name": "appropriate_action",
            "description": "Was each action appropriate for the context?",
            "weight": 1.0
          }
        ]
      }
    }
  }
}
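The weights suggest a weighted combination of per-criterion scores. The exact aggregation is internal to the verifier, but as a rough illustration, a weight-normalized average over hypothetical per-criterion scores behaves like this:
# Hypothetical per-criterion scores a judge might assign (0-1 each);
# the weights match the rubric above
criterion_scores = {"correctness": 1.0, "efficiency": 0.5}
weights = {"correctness": 2.0, "efficiency": 1.0}

# (2.0 * 1.0 + 1.0 * 0.5) / 3.0 ≈ 0.83
total = sum(weights.values())
score = sum(weights[n] * criterion_scores[n] for n in weights) / total
print(round(score, 2))  # 0.83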

Complete Verifier Dataset Example

Here’s a complete, production-ready verifier dataset:
{
  "version": "1.0",
  "metadata": {
    "name": "qa-judge-training",
    "description": "Training data for QA evaluation judge",
    "default_rubric": {
      "outcome": {
        "criteria": [
          {"name": "correctness", "description": "Factual accuracy", "weight": 2.0},
          {"name": "completeness", "description": "Answer addresses all parts", "weight": 1.0}
        ]
      }
    }
  },
  "tasks": [
    {
      "id": "trace_001",
      "input": {
        "trace": {
          "session_id": "qa_session_001",
          "session_time_steps": [
            {
              "step_id": "step_0",
              "step_index": 0,
              "events": [
                {"event_id": 1, "event_type": "environment", "metadata": {"question": "What is 2+2?"}},
                {"event_id": 2, "event_type": "runtime", "metadata": {"thought": "Simple arithmetic"}},
                {"event_id": 3, "event_type": "runtime", "metadata": {"answer": "4"}}
              ]
            }
          ],
          "metadata": {"task_type": "math"}
        }
      }
    },
    {
      "id": "trace_002",
      "input": {
        "trace": {
          "session_id": "qa_session_002",
          "session_time_steps": [
            {
              "step_id": "step_0",
              "step_index": 0,
              "events": [
                {"event_id": 1, "event_type": "environment", "metadata": {"question": "What is the capital of France?"}},
                {"event_id": 2, "event_type": "runtime", "metadata": {"answer": "London"}}
              ]
            }
          ],
          "metadata": {"task_type": "geography"}
        }
      }
    }
  ],
  "gold_outputs": [
    {
      "task_id": "trace_001",
      "output": {
        "score": 1.0,
        "event_rewards": [
          {"event_id": 2, "value": 1.0},
          {"event_id": 3, "value": 1.0}
        ],
        "outcome_feedback": "Correct answer with clear reasoning"
      }
    },
    {
      "task_id": "trace_002",
      "output": {
        "score": 0.0,
        "event_rewards": [
          {"event_id": 2, "value": 0.0}
        ],
        "outcome_feedback": "Incorrect - Paris is the capital of France"
      }
    }
  ],
  "judge_config": {
    "mode": "rubric",
    "model": "gpt-4o-mini",
    "provider": "openai"
  }
}
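Datasets like this are usually assembled programmatically from logged traces plus human labels. A minimal sketch of building one task/gold pair in this format (trace is a V3 SessionTrace dict, human_score is your 0-1 label):
def make_example(task_id: str, trace: dict, human_score: float) -> tuple[dict, dict]:
    """Build one (task, gold_output) pair in the verifier dataset format."""
    task = {"id": task_id, "input": {"trace": trace}}
    gold = {"task_id": task_id, "output": {"score": human_score}}
    return task, gold

# dataset["tasks"].append(task); dataset["gold_outputs"].append(gold)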

Training a Verifier Graph

With your dataset prepared, train via ADAS:
from synth_ai.sdk.api.train.adas import ADASJob

job = ADASJob.from_dataset(
    "verifier_dataset.json",
    graph_type="verifier",  # Critical: specify verifier type
    policy_model="gpt-4o-mini",
    rollout_budget=200,
)
job.submit()
result = job.stream_until_complete()
Or via the Graph GEPA config:
[graph]
graph_type = "verifier"

[model]
model_id = "gpt-4o-mini"
provider = "openai"

Verifier Graph Inference

At inference time, pass a V3 trace and rubric:
result = verifier_job.run_judge(
    session_trace={
        "session_id": "new_trace_001",
        "session_time_steps": [...],
        "metadata": {...}
    },
    context={
        "rubric": {
            "outcome": {
                "criteria": [
                    {"name": "correctness", "description": "Is the answer correct?", "weight": 1.0}
                ]
            }
        }
    }
)

print(result["score"])          # 0.92
print(result["reasoning"])       # "The answer is factually correct..."
print(result["event_rewards"])   # [{"event_id": 1, "value": 0.9}, ...]

Inference via cURL

curl -X POST $HOST/api/adas/graph/judge \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "adas_XXXX",
    "session_trace": {
      "session_id": "trace_to_evaluate",
      "session_time_steps": [...],
      "metadata": {}
    },
    "context": {
      "rubric": {
        "outcome": {"criteria": [{"name": "quality", "description": "...", "weight": 1.0}]}
      }
    }
  }'
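The same call from Python with the requests library (the endpoint and payload mirror the cURL example; the placeholders are yours to fill in):
import os
import requests

resp = requests.post(
    f"{os.environ['HOST']}/api/adas/graph/judge",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "job_id": "adas_XXXX",  # your trained verifier job ID
        "session_trace": {
            "session_id": "trace_to_evaluate",
            "session_time_steps": [],  # your V3 trace steps go here
            "metadata": {},
        },
        "context": {
            "rubric": {
                "outcome": {
                    "criteria": [
                        {"name": "quality", "description": "...", "weight": 1.0}
                    ]
                }
            }
        },
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())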

Benefits of Trained Verifier Graphs

A trained verifier graph enables:
  • 10x cost reduction vs frontier model judging
  • Consistent evaluation across runs
  • Domain-specific scoring tuned to your criteria
  • Structured rewards at event and outcome levels
  • Integration with RL training - use verifier output as rewards (sketched below)
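The verifier's structured output maps directly onto RL reward signals: the outcome score can serve as the episode reward and event_rewards as per-step rewards. A minimal sketch, assuming verifier_job is a trained verifier and rollout_trace / rubric are defined as in the inference example above:
# Judge a freshly collected rollout trace
result = verifier_job.run_judge(
    session_trace=rollout_trace,
    context={"rubric": rubric},
)

episode_reward = result["score"]  # episode-level reward

# Per-step rewards keyed by event_id, for credit assignment
step_rewards = {r["event_id"]: r["value"] for r in result["event_rewards"]}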

Graph Structures

Structure       Description
single_prompt   One LLM call, minimal complexity
dag             Multiple nodes in sequence, no branching
conditional     Full graph with conditional branching

Creating Graphs

Graphs are created through optimization. You provide:
  1. A dataset - Examples of inputs and expected outputs
  2. Configuration - Graph type, structure constraints, models to use
  3. A budget - How much optimization to run
Synth’s Graph GEPA algorithm then evolves both the graph structure and its prompts. The simplest way to create graphs is through the ADAS API:
from synth_ai.sdk.api.train.adas import ADASJob

job = ADASJob.from_dataset(
    "my_tasks.json",
    policy_model="gpt-4o-mini",
    rollout_budget=200,
)
job.submit()
result = job.stream_until_complete()
See Workflows for the full ADAS documentation.

Using Graph GEPA Directly

For more control, use the Graph Optimization client:
from synth_ai.products.graph_gepa import GraphOptimizationConfig, GraphOptimizationClient

config = GraphOptimizationConfig.from_toml("config.toml")

async with GraphOptimizationClient(backend_url, api_key) as client:
    job_id = await client.start_job(config)
    async for event in client.stream_events(job_id):
        print(event["type"])
    result = await client.get_result(job_id)
See Graph GEPA for configuration reference.

Using Graphs

Once trained, graphs can be:

1. Run in Production

Call the /graph/completions endpoint for production inference:
result = job.run_inference({"query": "What is the capital of France?"})
print(result["output"])
See Graph Inference for details.

2. Downloaded

Export the graph for local use or inspection:
graph_export = job.download_prompt()
print(graph_export)
See Downloading Graphs for details.

Graph Artifacts

When you train a graph, Synth produces:
Artifact           Description
Graph YAML         Full graph definition with nodes and prompts
Prompt snapshots   Individual prompt versions from training
Training metrics   Scores, costs, latencies per generation