Quickstart: Graph GEPA

This guide walks you through training a multi-node LLM graph with Graph GEPA. By the end, you'll have an optimized graph that typically outperforms a single-prompt baseline on the same task.
For most use cases we recommend ADAS/Workflows, which provide a simpler interface. Use Graph GEPA directly when you need fine-grained control over evolution parameters.

Prerequisites

  • Synth API key (get one here)
  • Python 3.11+
  • synth-ai package installed
pip install synth-ai
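
The SDK examples below read your key from the SYNTH_API_KEY environment variable. As a quick sanity check (a minimal sketch, not part of the SDK), you can confirm the package imports and the variable is set:
import os

import synth_ai  # confirms the package installed correctly

# The examples in this guide read the API key from this variable.
assert os.environ.get("SYNTH_API_KEY"), "Set SYNTH_API_KEY before running the examples below"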

Step 1: Prepare Your Dataset

Create a JSON file with your tasks and expected outputs:
{
  "tasks": [
    {
      "task_id": "q1",
      "input": {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe."
      }
    },
    {
      "task_id": "q2",
      "input": {
        "question": "Who wrote Romeo and Juliet?",
        "context": "Romeo and Juliet is a famous tragedy."
      }
    }
  ],
  "gold_outputs": [
    {
      "task_id": "q1",
      "output": { "answer": "Paris" },
      "score": 1.0
    },
    {
      "task_id": "q2",
      "output": { "answer": "William Shakespeare" },
      "score": 1.0
    }
  ],
  "metadata": {
    "name": "simple_qa",
    "task_description": "Answer questions using the provided context"
  }
}
Save this as dataset.json.
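
Before training, it can help to verify the dataset is well formed. This is a minimal sketch (not part of the SDK) that checks every task has a matching gold output:
import json

with open("dataset.json") as f:
    dataset = json.load(f)

task_ids = {t["task_id"] for t in dataset["tasks"]}
gold_ids = {g["task_id"] for g in dataset["gold_outputs"]}

# Every task should have exactly one gold output, and vice versa.
assert task_ids == gold_ids, f"Mismatched task_ids: {task_ids ^ gold_ids}"
print(f"Loaded {len(task_ids)} tasks for dataset '{dataset['metadata']['name']}'")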

Step 2: Create Configuration

Create a TOML configuration file:
# config.toml
[graph_optimization]
algorithm = "graph_gepa"
dataset_name = "simple_qa"

# Graph settings
graph_type = "policy"
graph_structure = "dag"
topology_guidance = "First extract relevant information from context, then formulate the answer"

# Models
allowed_policy_models = ["gpt-4o-mini"]
judge_model = "gpt-4o-mini"
scoring_strategy = "rubric"

# Evolution
[graph_optimization.evolution]
num_generations = 3
children_per_generation = 2

[graph_optimization.proposer]
model = "gpt-4.1"

# Data splits
[graph_optimization.seeds]
train = [0, 1, 2, 3, 4]
validation = [5, 6, 7]

# Budget
[graph_optimization.limits]
max_spend_usd = 5.0
timeout_seconds = 1800
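
You can confirm the file parses before launching a job. This sketch uses GraphOptimizationConfig.from_toml, the same loader used in Step 3, and assumes it raises on a malformed or incomplete file:
from synth_ai.products.graph_gepa import GraphOptimizationConfig

# Fails fast if config.toml has a syntax error or a missing section.
config = GraphOptimizationConfig.from_toml("config.toml")
print("Config loaded:", config)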

Step 3: Run Training

Option A: Python SDK

import asyncio
import json
import os
from synth_ai.products.graph_gepa import (
    GraphOptimizationConfig,
    GraphOptimizationClient,
)

async def main():
    # Load config
    config = GraphOptimizationConfig.from_toml("config.toml")

    # Load dataset and attach to config
    with open("dataset.json") as f:
        config.dataset = json.load(f)

    # Get credentials from environment
    backend_url = os.environ.get("SYNTH_BACKEND_URL", "https://api.usesynth.ai")
    api_key = os.environ["SYNTH_API_KEY"]

    async with GraphOptimizationClient(backend_url, api_key) as client:
        # Start job
        job_id = await client.start_job(config)
        print(f"Started job: {job_id}")

        # Stream progress
        async for event in client.stream_events(job_id):
            event_type = event.get("type")

            if event_type == "generation_complete":
                data = event.get("data", {})
                print(f"Generation {data.get('generation')}: best_score={data.get('best_score'):.3f}")

            elif event_type == "job_complete":
                print("Training complete!")
                break

            elif event_type == "job_failed":
                print(f"Job failed: {event.get('data', {}).get('error')}")
                return

        # Get results
        result = await client.get_result(job_id)
        print(f"\nBest score: {result['best_score']:.3f}")
        print(f"\nBest graph:\n{result['best_yaml']}")

asyncio.run(main())

Option B: Using ADAS (Simpler)

If you don’t need fine-grained control, use ADAS:
from synth_ai.sdk.api.train.adas import ADASJob

job = ADASJob.from_dataset(
    "dataset.json",
    policy_model="gpt-4o-mini",
    rollout_budget=100,
    proposer_effort="medium",
)
job.submit()
result = job.stream_until_complete()
print(f"Best score: {result.get('best_score')}")

Step 4: Use Your Graph

Production Inference

# Using ADAS job
output = job.run_inference({
    "question": "What is the largest planet?",
    "context": "Jupiter is the largest planet in our solar system."
})
print(output)  # {"answer": "Jupiter"}

Download for Local Use

# Get the optimized graph
graph_export = job.download_prompt()
print(graph_export)
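
To keep the exported graph alongside your project, write it to disk. A minimal sketch, assuming the export is YAML text (as suggested by the best_yaml field in the SDK result):
# Save the export so it can be versioned or loaded elsewhere.
with open("optimized_graph.yaml", "w") as f:
    f.write(graph_export)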

What Happens During Training

  1. Initialization: Graph GEPA creates an initial population of graph candidates
  2. Evaluation: Each candidate is run on training seeds and scored
  3. Selection: Best candidates are selected for the next generation
  4. Mutation: An LLM proposer suggests modifications to prompts and graph structure
  5. Repeat: Process continues for num_generations
  6. Validation: Top candidates are evaluated on held-out validation seeds
A typical run prints per-generation progress like this:
Generation 1: best_score=0.65, candidates=5
Generation 2: best_score=0.72, candidates=5
Generation 3: best_score=0.81, candidates=5
Validation: final_score=0.79

Tips for Better Results

1. More Training Data

More examples = better optimization:
[graph_optimization.seeds]
train = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
validation = [15, 16, 17, 18, 19]
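
If your dataset grows, you can generate the seed lists instead of typing them out. A sketch assuming seeds index into the tasks array in order, using an 80/20 split:
import json

with open("dataset.json") as f:
    n = len(json.load(f)["tasks"])

split = max(1, int(0.8 * n))
train_seeds = list(range(split))          # e.g. [0, 1, ..., split-1]
validation_seeds = list(range(split, n))  # held out for validation
print(f"train = {train_seeds}")
print(f"validation = {validation_seeds}")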

2. Topology Guidance

Help the proposer understand your task:
topology_guidance = """
For multi-hop reasoning questions:
1. First identify what information is needed
2. Extract relevant facts from context
3. Combine facts to form the answer
"""

3. Appropriate Structure

Match structure to task complexity:
Task                        Recommended Structure
Simple classification       single_prompt
Multi-step reasoning        dag
Routing/branching logic     conditional

4. Budget Allocation

Running more generations with fewer children per generation often beats running a few generations with many children:
[graph_optimization.evolution]
num_generations = 5        # More iterations
children_per_generation = 2  # Fewer variants per iteration

Troubleshooting

Low Scores

  • Add more diverse training examples
  • Increase num_generations
  • Try different topology_guidance
  • Check that gold outputs are correct

Slow Training

  • Reduce children_per_generation
  • Use a faster policy model (e.g., gpt-4o-mini)
  • Reduce training seed count

High Costs

  • Set max_spend_usd limit
  • Use max_llm_calls_per_run to limit graph complexity
  • Use cheaper models in allowed_policy_models

Next Steps