Overview
Synth supports two state-of-the-art prompt optimization algorithms:

- GEPA (Genetic Evolution of Prompt Architectures) - Population-based evolutionary search
- MIPRO (Meta-Instruction PROposer) - Meta-learning with Bayesian optimization
References
- GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457
- MIPRO: Opsahl-Ong et al. (2024). “Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs.” arXiv:2406.11695
GEPA (Genetic Evolution of Prompt Architectures)
Reference: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457

GEPA outperforms GRPO by 10% on average and by up to 20%, while using up to 35x fewer rollouts. It also outperforms MIPROv2 by over 10% across two LLMs.

How It Works
GEPA uses evolutionary principles inspired by genetic algorithms (a minimal code sketch follows the steps below):

1. Population Initialization
   - Starts with the baseline prompt plus random mutations
   - Creates an initial population of 20-30 prompt variants
2. Evaluation
   - Evaluates each prompt variant on training seeds
   - Tracks multiple objectives: accuracy, token count, tool call rate
3. Selection (Pareto Front)
   - Maintains non-dominated solutions
   - Balances performance vs. prompt length
   - Keeps the top-K solutions in a Pareto archive
4. Variation
   - Mutation: LLM-guided or regex-based prompt modifications
   - Crossover: Combines two parent prompts to create offspring
5. Evolution Loop
   - Repeats for 10-15 generations
   - Population evolves toward better solutions
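The loop above can be summarized in a few lines. This is an illustrative sketch only, assuming hypothetical helpers `evaluate` (accuracy on the training seeds), `mutate_prompt` (LLM-guided or regex-based mutation), and `crossover`; it is not the Synth API.

```python
import random

def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on at least one.
    Objectives here: maximize accuracy, minimize prompt length."""
    return (a["accuracy"] >= b["accuracy"] and a["length"] <= b["length"]
            and (a["accuracy"] > b["accuracy"] or a["length"] < b["length"]))

def pareto_front(candidates):
    """Keep only non-dominated candidates (the Pareto archive)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

def evolve(baseline, evaluate, mutate_prompt, crossover,
           population_size=24, generations=10):
    # 1. Population initialization: baseline plus random mutations.
    population = [baseline] + [mutate_prompt(baseline) for _ in range(population_size - 1)]
    archive = []
    for _ in range(generations):
        # 2. Evaluation: score each variant on the training seeds.
        scored = [{"prompt": p, "accuracy": evaluate(p), "length": len(p)}
                  for p in population]
        # 3. Selection: keep the non-dominated accuracy-vs-length trade-offs.
        archive = pareto_front(archive + scored)
        parents = [c["prompt"] for c in archive]
        # 4. Variation: mutation and crossover produce the next generation.
        population = []
        while len(population) < population_size:
            if len(parents) >= 2 and random.random() < 0.5:
                a, b = random.sample(parents, 2)
                population.append(crossover(a, b))
            else:
                population.append(mutate_prompt(random.choice(parents)))
    return archive
```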
Key Features
- Pareto Optimization: Maintains diverse solutions balancing multiple objectives
- LLM-Guided Mutations: Uses mutation models (e.g., `gpt-oss-120b`) for intelligent modifications
- Pattern Mode: Supports transformation-based mutations for systematic changes
- Multi-Stage Support: Module-aware evolution for pipeline optimization
- Reflective Feedback: Analyzes execution traces to guide mutations
Configuration Example
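A hypothetical GEPA configuration sketch. Field names and values are illustrative, not Synth's actual schema; see the Configuration Reference for the real parameters.

```python
# Illustrative GEPA settings (hypothetical field names, not Synth's schema).
gepa_config = {
    "algorithm": "gepa",
    "policy_model": "gpt-4o-mini",            # model whose prompt is optimized
    "mutation_model": "openai/gpt-oss-120b",  # LLM that proposes mutations
    "population_size": 24,                    # 20-30 variants is typical
    "generations": 10,                        # 10-15 is typical
    "training_seeds": 30,                     # seeds used per full evaluation
    "objectives": ["accuracy", "token_count", "tool_call_rate"],
    "pareto_archive_size": 8,                 # top-K non-dominated solutions
    "mutation_mode": "llm",                   # "llm" or "pattern" (regex-based)
}
```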
Typical Results
- Baseline: 60-75% accuracy
- After 5 generations: 75-80% accuracy
- After 10 generations: 80-85% accuracy
- After 15 generations: 85-90%+ accuracy
Best For
- Classification tasks (Banking77, intent classification)
- Multi-hop QA (HotpotQA)
- Tasks requiring diverse prompt variants
- Large evaluation budgets (1000+ rollouts)
MIPRO (Meta-Instruction PROposer)
Reference: Opsahl-Ong et al. (2024). “Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs.” arXiv:2406.11695

MIPRO outperforms baseline optimizers on five of seven diverse multi-stage LM programs using Llama-3-8B, by up to 13% accuracy.

How It Works
MIPRO uses meta-learning to propose better instructions (a minimal code sketch follows the steps below):

1. Bootstrap Phase
   - Evaluates the baseline prompt on bootstrap seeds
   - Collects high-scoring examples (score >= threshold)
   - Generates few-shot demonstrations
   - Initializes the meta-model with task-specific context
2. Instruction Generation
   - Meta-LLM (e.g., GPT-4o-mini) proposes instruction variants
   - Uses few-shot examples, a reference corpus (up to 50k tokens), and system specs
   - Generates additive guidance (not rewrites)
3. TPE-Guided Search
   - A Tree-structured Parzen Estimator suggests candidates
   - Evaluates proposals on a mini-batch of seeds
   - Updates the TPE distribution based on results
4. Optimization Loop
   - Repeats for 10-20 iterations
   - Each iteration evaluates 4-6 prompt variants
   - TPE guides the search toward promising regions
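The sketch below illustrates the same loop, using Optuna's TPE sampler purely to stand in for the Tree-structured Parzen Estimator. `propose_instructions` and `evaluate_on_minibatch` are hypothetical helpers, not the Synth API.

```python
# Illustrative MIPRO-style loop: a meta-LLM proposes instruction variants and
# TPE decides which proposal to evaluate next on a mini-batch of seeds.
import optuna

def optimize(baseline_prompt, bootstrap_examples, propose_instructions,
             evaluate_on_minibatch, iterations=16, proposals_per_iteration=5):
    # Bootstrap phase: high-scoring examples seed the meta-LLM's context,
    # which then proposes candidate instructions (additive guidance).
    candidates = propose_instructions(baseline_prompt, bootstrap_examples,
                                      n=iterations * proposals_per_iteration)

    def objective(trial):
        # TPE suggests which candidate instruction to try next ...
        instruction = trial.suggest_categorical("instruction", candidates)
        # ... and the proposal is scored on a small mini-batch of seeds.
        return evaluate_on_minibatch(baseline_prompt + "\n" + instruction)

    study = optuna.create_study(direction="maximize",
                                sampler=optuna.samplers.TPESampler())
    study.optimize(objective, n_trials=iterations * proposals_per_iteration)
    return baseline_prompt + "\n" + study.best_params["instruction"]
```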
Key Features
- Bootstrap Phase: Starts with task-specific examples (not cold-start)
- Meta-LLM Proposals: Uses GPT-4o-mini or similar for instruction generation
- Reference Corpus: Injects up to 50k tokens of dataset examples
- System Spec Integration: Uses JSON specifications for constraint-aware optimization
- Multi-Stage Support: Per-stage instruction proposals with LCS detection
- Token Budget Tracking: Monitors and enforces token limits
Configuration Example
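A hypothetical MIPRO configuration sketch. Field names and values are illustrative, not Synth's actual schema; see the Configuration Reference for the real parameters.

```python
# Illustrative MIPRO settings (hypothetical field names, not Synth's schema).
mipro_config = {
    "algorithm": "mipro",
    "policy_model": "gpt-4o-mini",          # model whose prompt is optimized
    "meta_model": "gpt-4o-mini",            # meta-LLM that proposes instructions
    "bootstrap_score_threshold": 0.8,       # keep examples scoring >= threshold
    "iterations": 16,                       # 10-20 is typical
    "proposals_per_iteration": 5,           # 4-6 variants per iteration
    "minibatch_size": 5,                    # seeds per mini-batch evaluation
    "reference_corpus_max_tokens": 50_000,  # dataset examples injected as context
}
```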
Typical Results
- Bootstrap Phase: Collects 3-5 high-scoring examples
- After 8 iterations: ~80-85% accuracy
- After 16 iterations: ~85-90% accuracy (similar to GEPA)
- Advantage: Achieves similar results with ~96 rollouts vs. ~1000 for GEPA
Best For
- Tasks with clear structure (can bootstrap with examples)
- Efficient optimization (fewer evaluations needed)
- Token budget constraints
- Task-specific improvements
Detailed Comparison
| Aspect | GEPA | MIPRO |
|---|---|---|
| Search Method | Genetic evolution (mutation + crossover) | Meta-LLM proposals + TPE |
| Initialization | Random population (20-30 variants) | Bootstrap phase (few-shot examples) |
| Exploration | Broad, diverse variants | Focused, efficient search |
| Guidance | Pareto optimization | Bayesian optimization (TPE) |
| Mutations | LLM-guided or regex-based | Meta-model proposals |
| Evaluation | Full evaluation on 30 seeds | Mini-batch on 5 seeds per iteration |
| Cost per Candidate | Lower (mutation calls only) | Higher (meta-model proposal calls) |
| Convergence | 10-15 generations | 10-20 iterations |
| Total Evaluations | ~1000 rollouts | ~96 rollouts |
| Best For | Broad exploration | Task-specific optimization |
| Pareto Front | ✅ Yes (diverse solutions) | ❌ No (single best solution) |
| Multi-Stage | ✅ Yes (module-aware) | ✅ Yes (per-stage proposals) |
Architecture: Inference Interception
Both algorithms use the same interceptor pattern (a minimal sketch follows this list):

- Task apps remain unchanged during optimization
- Prompt optimization logic stays in backend
- Secure, correct prompt substitution
- No prompt leakage to task apps
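A minimal sketch of the interception idea, assuming a generic chat-style inference interface. `InferenceClient` and `PromptInterceptor` are hypothetical names, not the Synth API.

```python
from typing import Protocol

class InferenceClient(Protocol):
    def chat(self, messages: list[dict]) -> str: ...

class PromptInterceptor:
    """Wraps the backend inference client; the task app is unaware of it."""

    def __init__(self, client: InferenceClient, candidate_system_prompt: str):
        self._client = client
        self._candidate = candidate_system_prompt

    def chat(self, messages: list[dict]) -> str:
        # Substitute the system message with the candidate under evaluation;
        # the task app never sees or stores the optimized prompt.
        patched = [
            {"role": "system", "content": self._candidate}
            if m["role"] == "system" else m
            for m in messages
        ]
        return self._client.chat(patched)
```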
Model Requirements
Policy Models (Both Algorithms)
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`
- Groq: `gpt-oss-20b`, `gpt-oss-120b`, `llama-3.3-70b-versatile`, `qwen-32b`, `qwen3-32b`
- Google: `gemini-2.5-pro`, `gemini-2.5-pro-gt200k`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
Mutation Models (GEPA Only)
- Common: `openai/gpt-oss-120b`, `llama-3.3-70b-versatile`
- Nano models are rejected (too small for generation)
Meta Models (MIPRO Only)
- Common: `gpt-4o-mini` (most common default), `gpt-4.1-mini`
- Nano models are rejected (too small for generation)

`gpt-5-pro` is explicitly rejected for all model types (too expensive).
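The rules above amount to a small check; the sketch below is illustrative only (the actual validation lives in the Synth backend).

```python
def validate_model(model: str, role: str) -> None:
    """role is one of 'policy', 'mutation', or 'meta'."""
    if model == "gpt-5-pro":
        raise ValueError("gpt-5-pro is rejected for all model types (too expensive)")
    if role in ("mutation", "meta") and "nano" in model:
        raise ValueError(f"{model} is rejected: nano models are too small for {role} generation")
```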
Multi-Stage Pipeline Support
Both algorithms support optimizing prompts for multi-stage pipelines; a short sketch follows each list below.

GEPA Multi-Stage
- Module-aware evolution: Each pipeline module gets its own gene
- Module selection: Mutations target specific modules
- Uniform crossover: Combines parent genes per module
- Aggregated scoring: Sum of module lengths for Pareto optimization
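A minimal sketch of module-aware uniform crossover, assuming each genome is a mapping from module name to that module's prompt text. Names are illustrative, not the Synth API.

```python
import random

def uniform_crossover(parent_a: dict[str, str], parent_b: dict[str, str]) -> dict[str, str]:
    """Each child gene (per-module prompt) is taken from one parent at random."""
    return {module: random.choice((parent_a[module], parent_b[module]))
            for module in parent_a}

def aggregate_length(genome: dict[str, str]) -> int:
    """Sum of per-module prompt lengths, used as the length objective in Pareto selection."""
    return sum(len(prompt) for prompt in genome.values())
```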
MIPRO Multi-Stage
- Per-stage proposals: Meta-LLM generates instructions for each stage
- LCS detection: Automatically identifies which stage is being called
- Stage-specific meta-prompts: Includes pipeline overview, stage role, baseline
- Unified evaluation: Tracks end-to-end performance
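A minimal sketch of the stage-detection idea: the intercepted prompt is compared against each stage's baseline and attributed to the closest match. `difflib.SequenceMatcher` is used here purely as an illustration of subsequence-style matching; the exact matching Synth uses may differ.

```python
from difflib import SequenceMatcher

def detect_stage(observed_prompt: str, stage_baselines: dict[str, str]) -> str:
    """Return the name of the stage whose baseline prompt best matches the observed prompt."""
    def similarity(baseline: str) -> float:
        return SequenceMatcher(None, observed_prompt, baseline).ratio()
    return max(stage_baselines, key=lambda stage: similarity(stage_baselines[stage]))
```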
Choosing the Right Algorithm
Use GEPA if:

- ✅ You want diverse prompt variants (Pareto front)
- ✅ You have a large evaluation budget (1000+ rollouts)
- ✅ You need broad exploration of the prompt space
- ✅ You’re optimizing classification or multi-hop QA tasks

Use MIPRO if:

- ✅ You want faster convergence with fewer evaluations
- ✅ You have clear task structure (can bootstrap with examples)
- ✅ You need efficient optimization (mini-batch evaluation)
- ✅ You have token budget constraints
- ✅ You want task-specific improvements
Next Steps
- Configuration Reference – Complete parameter documentation
- Training Guide – Step-by-step instructions
- Banking77 Example – Complete walkthrough