Graph GEPA: Graph Evolution for Prompt Architectures

Graph GEPA extends GEPA’s evolutionary approach from single prompts to multi-node graph structures. It simultaneously optimizes:
  • Graph topology - Which nodes exist and how they connect
  • Node prompts - The prompt template in each LLM node
  • Model selection - Which models to use in each node
Graph GEPA is the optimization engine behind Workflows (ADAS). For most use cases, use ADAS directly.

When to Use

| Use Case | Recommendation |
| --- | --- |
| Simple dataset optimization | Use ADAS |
| Custom graph constraints | Use Graph GEPA directly |
| Multi-objective optimization | Use Graph GEPA with Pareto config |
| Warm-starting from an existing graph | Use Graph GEPA with initial_graph_id (see the sketch below) |
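
For the warm-start case, here is a minimal sketch, assuming initial_graph_id is accepted by the programmatic config just like its TOML counterpart (the graph ID below is a placeholder, not a real graph):

from synth_ai.products.graph_gepa import GraphOptimizationConfig

# Warm-start evolution from a previously optimized graph.
# "graph_abc123" is a placeholder ID; use one from an earlier run.
config = GraphOptimizationConfig(
    dataset_name="my_qa_dataset",
    initial_graph_id="graph_abc123",
)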

Config Reference

[graph_optimization]
algorithm = "graph_gepa"
dataset_name = "my_qa_dataset"

# Graph configuration
graph_type = "policy"           # "policy" or "verifier"
graph_structure = "dag"         # "single_prompt", "dag", or "conditional"
topology_guidance = "Use chain-of-thought reasoning before answering"

# Models the graph can use
allowed_policy_models = ["gpt-4o-mini", "gpt-4o"]

# Scoring
scoring_strategy = "rubric"     # "rubric", "mae", or "default"
judge_model = "gpt-4o-mini"

# Constraints
max_llm_calls_per_run = 3       # Limit graph complexity

[graph_optimization.evolution]
num_generations = 5
children_per_generation = 3

[graph_optimization.proposer]
model = "gpt-4.1"
temperature = 0.7
max_tokens = 4096

[graph_optimization.seeds]
train = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
validation = [100, 101, 102, 103, 104]

[graph_optimization.limits]
max_spend_usd = 10.0
timeout_seconds = 3600

# Optional: Multi-objective Pareto optimization
[graph_optimization.pareto_floors]
use_latency = true
use_cost = true
latency_s = 2.0           # Don't discriminate below 2s
cost_usd = 0.10           # Don't discriminate below $0.10/seed
max_latency_s = 10.0      # Disqualify if >10s
max_cost_usd = 1.0        # Disqualify if >$1/seed

Top-Level Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| algorithm | string | "graph_gepa" | Must be "graph_gepa" |
| dataset_name | string | required | Dataset identifier |
| graph_type | string | "policy" | "policy" (solve tasks) or "verifier" (judge results) |
| graph_structure | string | "dag" | Complexity: "single_prompt", "dag", or "conditional" |
| topology_guidance | string | null | Natural-language guidance for graph structure |
| initial_graph_id | string | null | Warm-start from an existing graph |
| allowed_policy_models | list | ["gpt-4o-mini", "gpt-4o"] | Models the graph can use |
| scoring_strategy | string | "rubric" | How to score outputs |
| judge_model | string | "gpt-4o-mini" | Model for LLM-judge scoring |
| max_llm_calls_per_run | int | null | Max LLM calls per graph execution |

Graph Types

Policy Graphs

Map inputs to outputs. Used for tasks like:
  • Question answering
  • Classification
  • Text generation
  • Code generation
graph_type = "policy"

Verifier Graphs

Judge or score existing results. Used for:
  • Quality evaluation
  • Ranking candidates
  • Filtering outputs
  • Custom LLM judges
graph_type = "verifier"
Verifier graphs require a special dataset format with V3 traces and gold scores. See the Verifier Dataset Requirements below.

Verifier Dataset Requirements

For verifier graphs, the dataset must include:
  1. Task inputs with traces - Each task must have a trace field containing a V3 SessionTrace
  2. Gold scores - Each gold output must have a score field (float, 0-1)
  3. Optional: Event rewards - Per-event reward annotations for fine-grained training
[graph_optimization]
graph_type = "verifier"
scoring_strategy = "rubric"  # Required for verifier training

[graph_optimization.dataset]
tasks = [
    { id = "trace_001", input = { trace = { session_id = "...", session_time_steps = [...] } } },
]
gold_outputs = [
    { task_id = "trace_001", output = { score = 0.85, event_rewards = [...] } },
]

Verifier Inference

Trained verifiers accept V3 traces and rubrics at inference:
result = verifier_job.run_judge(
    session_trace={"session_id": "...", "session_time_steps": [...]},
    context={"rubric": {"outcome": {"criteria": [...]}}}
)
# Returns: {"score": 0.85, "event_rewards": [...], "reasoning": "..."}
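
The returned score can then gate downstream use of the trace, for example in the filtering use case listed above. A small sketch (the 0.7 threshold is arbitrary, chosen for illustration):

# Gate on the verifier's score; 0.7 is an arbitrary illustrative cutoff.
THRESHOLD = 0.7
if result["score"] >= THRESHOLD:
    print("accepted")
else:
    print(f"rejected: {result['reasoning']}")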

Graph Structures

Single Prompt

One LLM call, minimal structure. Best for simple tasks.
graph_structure = "single_prompt"

DAG (Directed Acyclic Graph)

Multiple nodes in sequence. Enables:
  • Chain-of-thought reasoning
  • Multi-step decomposition
  • Intermediate processing
graph_structure = "dag"
topology_guidance = "First decompose the question, then answer each part, then synthesize"

Conditional

Full graph with branching. Enables:
  • Routing based on input type
  • Fallback paths
  • Ensemble approaches
graph_structure = "conditional"

Evolution Config [graph_optimization.evolution]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| num_generations | int | 5 | Evolution generations |
| children_per_generation | int | 3 | New graphs per generation |
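
Together with the number of training seeds, these two knobs set the evaluation budget. Assuming one evaluation per candidate per train seed, the defaults line up with the example result at the end of this page:

# Back-of-the-envelope evaluation budget under the example config.
num_generations = 5
children_per_generation = 3
num_train_seeds = 10  # len(train) in the example seeds config

# One evaluation per candidate graph per train seed (assumed):
total_evaluations = num_generations * children_per_generation * num_train_seeds
print(total_evaluations)  # 150, matching total_evaluations in the example result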

Proposer Config [graph_optimization.proposer]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "gpt-4.1" | Model for proposing mutations |
| temperature | float | 0.7 | Sampling temperature (0.0-2.0) |
| max_tokens | int | 4096 | Max tokens for proposals |

Seeds Config [graph_optimization.seeds]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| train | list[int] | [0..9] | Training seed indices |
| validation | list[int] | [100..104] | Validation seed indices |

Limits Config [graph_optimization.limits]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_spend_usd | float | 10.0 | Maximum budget in USD |
| timeout_seconds | int | 3600 | Job timeout in seconds |

Pareto Floors [graph_optimization.pareto_floors]

Multi-objective optimization with noise floors:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| use_latency | bool | true | Include latency in Pareto comparison |
| use_cost | bool | true | Include cost in Pareto comparison |
| latency_s | float | 2.0 | Ignore latency differences below this |
| cost_usd | float | 0.10 | Ignore cost differences below this |
| max_latency_s | float | null | Hard ceiling: disqualify if exceeded |
| max_cost_usd | float | null | Hard ceiling: disqualify if exceeded |
| min_reward | float | null | Hard floor: disqualify if below |
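
To make the semantics concrete, here is an illustrative sketch of how hard limits and noise floors could enter candidate filtering and dominance checks. This is not the library's implementation; it just mirrors the table above:

# Illustrative only: hard ceilings/floors disqualify candidates outright,
# and differences below the noise floors count as ties.
def qualifies(c, floors):
    if floors.get("max_latency_s") is not None and c["latency"] > floors["max_latency_s"]:
        return False
    if floors.get("max_cost_usd") is not None and c["cost"] > floors["max_cost_usd"]:
        return False
    if floors.get("min_reward") is not None and c["score"] < floors["min_reward"]:
        return False
    return True

def dominates(a, b, floors):
    # Gaps smaller than latency_s / cost_usd are treated as ties, so
    # measurement noise never decides a comparison.
    lat_better = floors["use_latency"] and (b["latency"] - a["latency"]) > floors["latency_s"]
    lat_worse = floors["use_latency"] and (a["latency"] - b["latency"]) > floors["latency_s"]
    cost_better = floors["use_cost"] and (b["cost"] - a["cost"]) > floors["cost_usd"]
    cost_worse = floors["use_cost"] and (a["cost"] - b["cost"]) > floors["cost_usd"]
    score_better = a["score"] > b["score"]
    score_worse = a["score"] < b["score"]
    no_worse = not (lat_worse or cost_worse or score_worse)
    return no_worse and (lat_better or cost_better or score_better)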

Inline Dataset

Instead of referencing a registered dataset, you can supply the dataset inline in the config:
[graph_optimization]
dataset_name = "my_inline_dataset"

[graph_optimization.dataset]
# ADAS format
tasks = [
    { task_id = "q1", input = { question = "What is 2+2?" } },
    { task_id = "q2", input = { question = "What is 3+3?" } },
]
gold_outputs = [
    { task_id = "q1", output = { answer = "4" }, score = 1.0 },
    { task_id = "q2", output = { answer = "6" }, score = 1.0 },
]

[graph_optimization.dataset.metadata]
name = "simple_math"
task_description = "Answer basic math questions"
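
Before submitting an inline dataset, it can help to sanity-check that every gold output references a defined task. A minimal sketch using the standard library's tomllib (Python 3.11+) against a config shaped like the one above:

import tomllib

# Load the inline dataset and check task_id alignment.
with open("config.toml", "rb") as f:
    ds = tomllib.load(f)["graph_optimization"]["dataset"]

task_ids = {t["task_id"] for t in ds["tasks"]}
missing = [g["task_id"] for g in ds["gold_outputs"] if g["task_id"] not in task_ids]
assert not missing, f"gold_outputs reference unknown tasks: {missing}"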

Python SDK

Using GraphOptimizationClient

import asyncio

from synth_ai.products.graph_gepa import (
    GraphOptimizationConfig,
    GraphOptimizationClient,
)

backend_url = "https://..."  # your backend endpoint
api_key = "sk-..."           # your API key

async def main() -> None:
    # Load config
    config = GraphOptimizationConfig.from_toml("config.toml")

    # Run job
    async with GraphOptimizationClient(backend_url, api_key) as client:
        job_id = await client.start_job(config)

        # Stream events
        async for event in client.stream_events(job_id):
            if event["type"] == "generation_complete":
                print(f"Gen {event['data']['generation']}: {event['data']['best_score']}")
            elif event["type"] == "job_complete":
                break

        # Get result
        result = await client.get_result(job_id)
        print(f"Best score: {result['best_score']}")
        print(f"Best graph:\n{result['best_yaml']}")

asyncio.run(main())

Programmatic Config

from synth_ai.products.graph_gepa import (
    GraphOptimizationConfig,
    GraphType,
    GraphStructure,
    EvolutionConfig,
    SeedsConfig,
)

config = GraphOptimizationConfig(
    dataset_name="hotpotqa",
    graph_type=GraphType.POLICY,
    graph_structure=GraphStructure.DAG,
    topology_guidance="Decompose multi-hop questions before answering",
    allowed_policy_models=["gpt-4o-mini"],
    evolution=EvolutionConfig(
        num_generations=5,
        children_per_generation=3,
    ),
    seeds=SeedsConfig(
        train=list(range(20)),
        validation=list(range(100, 110)),
    ),
    max_llm_calls_per_run=3,
)
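
A config built this way drops into the same client flow as one loaded from TOML:

# Inside the async with GraphOptimizationClient(...) block shown earlier:
job_id = await client.start_job(config)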

Event Types

When streaming, you’ll receive these events:
| Event Type | Description |
| --- | --- |
| job_started | Job has begun |
| generation_started | New evolution generation started |
| candidate_evaluated | A graph variant was scored |
| generation_complete | Generation finished with best scores |
| frontier_updated | Pareto frontier changed |
| job_complete | Optimization finished |
| job_failed | Job encountered an error |
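
A sketch of a fuller dispatch loop over these event types; it belongs inside the async with block from the SDK example above, and the handling in each branch is illustrative:

# Inside the async with GraphOptimizationClient(...) block:
async for event in client.stream_events(job_id):
    etype = event["type"]
    if etype == "candidate_evaluated":
        pass  # e.g. log per-candidate scores
    elif etype == "generation_complete":
        print(f"Gen {event['data']['generation']}: {event['data']['best_score']}")
    elif etype == "frontier_updated":
        print("Pareto frontier changed")
    elif etype in ("job_complete", "job_failed"):
        print(f"Terminal event: {etype}")
        break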

Result Structure

{
    "job_id": "graph_gepa_abc123",
    "status": "completed",
    "best_score": 0.87,
    "best_graph_snapshot_id": "snap_xyz789",
    "best_yaml": "nodes:\n  - id: main\n    ...",
    "pareto_frontier": [
        {"score": 0.87, "latency": 1.2, "cost": 0.05},
        {"score": 0.85, "latency": 0.8, "cost": 0.03},
    ],
    "generations_completed": 5,
    "total_evaluations": 150,
}
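
Each pareto_frontier entry carries score, latency, and cost, so you can pick a deployment point under your own constraints, for example the highest-scoring graph within a latency budget:

# Best-scoring frontier point within a 1.0s latency budget (illustrative).
frontier = result["pareto_frontier"]
fast_enough = [p for p in frontier if p["latency"] <= 1.0]
best = max(fast_enough, key=lambda p: p["score"]) if fast_enough else None
print(best)  # -> {"score": 0.85, "latency": 0.8, "cost": 0.03} for the example above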