Graph GEPA: Graph Evolution for Prompt Architectures
Graph GEPA extends GEPA’s evolutionary approach from single prompts to multi-node graph structures. It simultaneously optimizes:
- Graph topology - Which nodes exist and how they connect
- Node prompts - The prompt template in each LLM node
- Model selection - Which models to use in each node
Graph GEPA is the optimization engine behind Workflows (ADAS). For most use cases, use ADAS directly.
When to Use
| Use Case | Recommendation |
|---|---|
| Simple dataset optimization | Use ADAS |
| Custom graph constraints | Use Graph GEPA directly |
| Multi-objective optimization | Use Graph GEPA with Pareto config |
| Warm-starting from existing graph | Use Graph GEPA with initial_graph_id |
Config Reference
Top-Level Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| algorithm | string | "graph_gepa" | Must be "graph_gepa" |
| dataset_name | string | required | Dataset identifier |
| graph_type | string | "policy" | "policy" (solve tasks) or "verifier" (judge results) |
| graph_structure | string | "dag" | Complexity: "single_prompt", "dag", "conditional" |
| topology_guidance | string | null | Natural language guidance for graph structure |
| initial_graph_id | string | null | Warm-start from existing graph |
| allowed_policy_models | list | ["gpt-4o-mini", "gpt-4o"] | Models the graph can use |
| scoring_strategy | string | "rubric" | How to score outputs |
| judge_model | string | "gpt-4o-mini" | Model for LLM judge scoring |
| max_llm_calls_per_run | int | null | Max LLM calls per graph execution |
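The defaults in the table above can be collected into a plain mapping and checked before a job is submitted. The sketch below is illustrative only: `build_config` and the idea of validating on the client side are assumptions rather than part of the Graph GEPA SDK, and `my_dataset` is a placeholder name. Parameter names, defaults, and allowed values are copied from the table.

```python
import copy
from typing import Any

# Defaults copied from the Top-Level Parameters table.
DEFAULTS: dict[str, Any] = {
    "algorithm": "graph_gepa",
    "graph_type": "policy",
    "graph_structure": "dag",
    "topology_guidance": None,
    "initial_graph_id": None,
    "allowed_policy_models": ["gpt-4o-mini", "gpt-4o"],
    "scoring_strategy": "rubric",
    "judge_model": "gpt-4o-mini",
    "max_llm_calls_per_run": None,
}

# Enumerated values listed in the table descriptions.
ALLOWED = {
    "graph_type": {"policy", "verifier"},
    "graph_structure": {"single_prompt", "dag", "conditional"},
}


def build_config(dataset_name: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = copy.deepcopy(DEFAULTS)
    config["dataset_name"] = dataset_name  # required, has no default
    config.update(overrides)
    if config["algorithm"] != "graph_gepa":
        raise ValueError('algorithm must be "graph_gepa"')
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return config


config = build_config("my_dataset", graph_type="verifier", graph_structure="conditional")
print(config["graph_type"], config["graph_structure"])
```

Rejecting unknown keys early is a design choice for the sketch; it catches misspelled parameter names before any budget is spent.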
Graph Types
Policy Graphs
Map inputs to outputs. Used for tasks like:
- Question answering
- Classification
- Text generation
- Code generation
Verifier Graphs
Judge or score existing results. Used for:
- Quality evaluation
- Ranking candidates
- Filtering outputs
- Custom LLM judges
Verifier Dataset Requirements
For verifier graphs, the dataset must include:
- Task inputs with traces - Each task must have a `trace` field containing a V3SessionTrace
- Gold scores - Each gold output must have a `score` field (float, 0-1)
- Optional: Event rewards - Per-event reward annotations for fine-grained training
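A quick pre-flight check can catch records that violate these requirements before a job is launched. The sketch below is a minimal example under stated assumptions: only the `trace` and `score` field names come from the list above, while the surrounding record layout (a `gold_output` mapping) and the `validate_verifier_task` helper are hypothetical.

```python
from typing import Any


def validate_verifier_task(task: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one verifier task record.

    Assumed (hypothetical) layout:
        {"trace": {...}, "gold_output": {"score": <float in [0, 1]>}}
    """
    problems: list[str] = []
    if "trace" not in task:
        problems.append("missing trace")
    gold = task.get("gold_output", {})
    score = gold.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("gold score must be a number")
    elif not 0.0 <= float(score) <= 1.0:
        problems.append("gold score must be in [0, 1]")
    return problems


example_task = {
    "trace": {"events": []},  # stand-in for a serialized V3SessionTrace
    "gold_output": {"score": 0.8},
}
print(validate_verifier_task(example_task))  # []
```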
Verifier Inference
Trained verifiers accept V3 traces and rubrics at inference.
Graph Structures
Single Prompt
One LLM call, minimal structure. Best for simple tasks.
DAG (Directed Acyclic Graph)
Multiple nodes in sequence. Enables:
- Chain-of-thought reasoning
- Multi-step decomposition
- Intermediate processing
Conditional
Full graph with branching. Enables:
- Routing based on input type
- Fallback paths
- Ensemble approaches
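To make the three structure levels concrete, the sketch below models each one as a tiny dependency mapping and derives a valid node ordering with Python's standard `graphlib` module. The node names and the mapping representation are illustrative assumptions; this page does not specify how Graph GEPA stores graphs internally.

```python
from graphlib import TopologicalSorter

# Each mapping is node -> set of predecessor nodes (the graphlib convention).
single_prompt = {"answer": set()}

# A chain of nodes: multi-step decomposition feeding a final answer.
dag = {
    "decompose": set(),
    "reason": {"decompose"},
    "answer": {"reason"},
}

# Branching: a router feeds two alternative paths that rejoin at "merge".
# At run time a router would typically follow only one branch; the ordering
# below only shows which nodes must come before which.
conditional = {
    "router": set(),
    "math_path": {"router"},
    "text_path": {"router"},
    "merge": {"math_path", "text_path"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid node ordering; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())


for name, graph in [("single_prompt", single_prompt), ("dag", dag), ("conditional", conditional)]:
    print(name, execution_order(graph))
```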
Evolution Config [graph_optimization.evolution]
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_generations | int | 5 | Evolution generations |
| children_per_generation | int | 3 | New graphs per generation |
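As a rough mental model, these two parameters bound how many new graphs are proposed: `num_generations × children_per_generation`, which is 15 with the defaults. The loop below is a toy sketch of that budget, not Graph GEPA's actual algorithm: `propose_child` and `evaluate` are hypothetical stand-ins for the proposer model and dataset scoring, a single float stands in for a graph, and keeping only the single best candidate is a simplifying assumption.

```python
import random


def propose_child(parent: float, rng: random.Random) -> float:
    """Hypothetical mutation: perturb a scalar standing in for a graph."""
    return parent + rng.uniform(-0.1, 0.1)


def evaluate(candidate: float) -> float:
    """Hypothetical score: candidates closer to 1.0 score higher."""
    return -abs(1.0 - candidate)


def evolve(num_generations: int = 5, children_per_generation: int = 3, seed: int = 0):
    rng = random.Random(seed)
    best = 0.5  # stand-in for the initial graph
    evaluated = 0
    for _ in range(num_generations):
        children = [propose_child(best, rng) for _ in range(children_per_generation)]
        evaluated += len(children)
        # Keep the parent unless a child scores higher.
        best = max([best, *children], key=evaluate)
    return best, evaluated


best, evaluated = evolve()
print(evaluated)  # 15 candidates with the default settings
```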
Proposer Config [graph_optimization.proposer]
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | "gpt-4.1" | Model for proposing mutations |
| temperature | float | 0.7 | Sampling temperature (0.0-2.0) |
| max_tokens | int | 4096 | Max tokens for proposals |
Seeds Config [graph_optimization.seeds]
| Parameter | Type | Default | Description |
|---|---|---|---|
| train | list[int] | [0..9] | Training seed indices |
| validation | list[int] | [100..104] | Validation seed indices |
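The `[0..9]` notation in the defaults is shorthand for a contiguous run of seed indices. Assuming both ends are inclusive (the table does not say so explicitly), the defaults expand as shown below; `expand` is a hypothetical helper used only for illustration.

```python
def expand(first: int, last: int) -> list[int]:
    """Expand an inclusive seed range such as [0..9] into explicit indices."""
    return list(range(first, last + 1))


seeds = {
    "train": expand(0, 9),
    "validation": expand(100, 104),
}
print(len(seeds["train"]), len(seeds["validation"]))  # 10 5

# Keeping the two lists disjoint avoids validating on training examples.
overlap = set(seeds["train"]) & set(seeds["validation"])
print(overlap)  # set()
```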
Limits Config [graph_optimization.limits]
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_spend_usd | float | 10.0 | Maximum budget in USD |
| timeout_seconds | int | 3600 | Job timeout |
Pareto Floors [graph_optimization.pareto_floors]
Multi-objective optimization with noise floors:
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_latency | bool | true | Include latency in Pareto comparison |
| use_cost | bool | true | Include cost in Pareto comparison |
| latency_s | float | 2.0 | Ignore latency differences below this |
| cost_usd | float | 0.10 | Ignore cost differences below this |
| max_latency_s | float | null | Hard ceiling - disqualify if exceeded |
| max_cost_usd | float | null | Hard ceiling - disqualify if exceeded |
| min_reward | float | null | Hard floor - disqualify if below |
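One way to read these parameters is as a Pareto dominance test in which small latency and cost differences count as ties, plus a separate eligibility filter for the hard limits. The sketch below follows that reading; it is an assumption about the semantics, not the actual Graph GEPA comparison code. The `Candidate` class and both functions are hypothetical, and reward is compared without a noise floor because the table defines none for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    reward: float     # higher is better
    latency_s: float  # lower is better
    cost_usd: float   # lower is better


@dataclass(frozen=True)
class ParetoFloors:
    # Field names and defaults mirror the table above.
    use_latency: bool = True
    use_cost: bool = True
    latency_s: float = 2.0
    cost_usd: float = 0.10
    max_latency_s: Optional[float] = None
    max_cost_usd: Optional[float] = None
    min_reward: Optional[float] = None


def qualifies(c: Candidate, f: ParetoFloors) -> bool:
    """Apply the hard ceilings and the hard reward floor."""
    if f.max_latency_s is not None and c.latency_s > f.max_latency_s:
        return False
    if f.max_cost_usd is not None and c.cost_usd > f.max_cost_usd:
        return False
    if f.min_reward is not None and c.reward < f.min_reward:
        return False
    return True


def dominates(a: Candidate, b: Candidate, f: ParetoFloors) -> bool:
    """True if a is at least as good as b everywhere and better somewhere.

    Latency and cost differences below the noise floor are treated as ties
    (assumed rule).
    """
    # Each entry is a's advantage over b; sub-floor differences are zeroed.
    gains = [a.reward - b.reward]
    if f.use_latency:
        d = b.latency_s - a.latency_s
        gains.append(d if abs(d) >= f.latency_s else 0.0)
    if f.use_cost:
        d = b.cost_usd - a.cost_usd
        gains.append(d if abs(d) >= f.cost_usd else 0.0)
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


floors = ParetoFloors()
fast = Candidate(reward=0.80, latency_s=1.0, cost_usd=0.05)
slow = Candidate(reward=0.80, latency_s=2.5, cost_usd=0.06)
better = Candidate(reward=0.85, latency_s=2.5, cost_usd=0.06)

# The 1.5 s and $0.01 gaps are below the floors, so neither dominates.
print(dominates(fast, slow, floors), dominates(slow, fast, floors))  # False False
# Higher reward wins because the latency and cost gaps are treated as noise.
print(dominates(better, fast, floors))  # True
```

Under this reading, the noise floors keep the frontier from filling up with candidates that differ only by measurement jitter, while the hard limits remove candidates outright regardless of how well they score elsewhere.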
Inline Dataset
Instead of referencing a registered dataset, you can upload the dataset inline.
Python SDK
Using GraphOptimizationClient
Programmatic Config
Event Types
When streaming, you’ll receive these events:
| Event Type | Description |
|---|---|
| job_started | Job has begun |
| generation_started | New evolution generation |
| candidate_evaluated | A graph variant was scored |
| generation_complete | Generation finished with best scores |
| frontier_updated | Pareto frontier changed |
| job_complete | Optimization finished |
| job_failed | Job encountered an error |
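A consumer of the stream usually needs to dispatch on the event type and stop once a terminal event arrives. The sketch below shows that pattern over an in-memory list. The event type names come from the table above, but the payload shape (a dict with a `type` key) and the `summarize` helper are assumptions; the real SDK may deliver events differently.

```python
from collections import Counter
from typing import Any, Iterable

# Events after which no further events are expected.
TERMINAL_EVENTS = {"job_complete", "job_failed"}


def summarize(events: Iterable[dict[str, Any]]) -> Counter:
    """Count events by type, stopping at the first terminal event."""
    counts: Counter = Counter()
    for event in events:
        kind = event["type"]  # assumed key name
        counts[kind] += 1
        if kind in TERMINAL_EVENTS:
            break
    return counts


sample_stream = [
    {"type": "job_started"},
    {"type": "generation_started"},
    {"type": "candidate_evaluated"},
    {"type": "candidate_evaluated"},
    {"type": "generation_complete"},
    {"type": "job_complete"},
    {"type": "frontier_updated"},  # ignored: the job has already finished
]
counts = summarize(sample_stream)
print(counts["candidate_evaluated"])  # 2
```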
Result Structure
Related
- ADAS / Workflows - High-level API (uses Graph GEPA)
- GEPA - Single-prompt optimization
- Graphs Overview - Graph concepts
- Graph Inference - Production serving