# GEPA: Genetic Evolution of Prompt Architectures
GEPA uses genetic algorithms with LLM-guided mutations to evolve prompt structures. It can outperform RL-based methods on prompt optimization tasks.

## When to Use
- Complex prompt structures
- Multi-component prompts (system + few-shot + chain-of-thought)
- Exploring diverse prompt mutations
- When you want interpretable prompt improvements
## Config Reference

### Top-Level Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| algorithm | string | - | Must be "gepa" |
| task_app_url | string | - | URL of your task app (tunnel URL) |
| task_app_api_key | string | - | Environment API key for auth |
| task_app_id | string | - | Identifier for your task |
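A minimal sketch of the top-level block, assuming these keys live under a `[prompt_learning]` table (all values illustrative, not real endpoints or keys):

```toml
[prompt_learning]
algorithm = "gepa"                                 # must be "gepa" for this optimizer
task_app_url = "https://example-tunnel.example.com"  # your task app tunnel URL
task_app_api_key = "YOUR_ENV_API_KEY"              # environment API key for auth
task_app_id = "banking77"                          # identifier for your task
```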
### GEPA Section [prompt_learning.gepa]
| Parameter | Type | Default | Description |
|---|---|---|---|
| env_name | string | "banking77" | Environment/task name |
| proposer_type | string | "dspy" | Proposer type: "dspy" or "spec" |
| proposer_effort | string | "LOW" | Proposer effort level: "LOW_CONTEXT", "LOW", "MEDIUM", "HIGH" |
| proposer_output_tokens | string | "FAST" | Token limit: "RAPID" (3k), "FAST" (10k), "SLOW" (25k) |
| metaprompt | string | null | Custom metaprompt for mutations |
| rng_seed | int | null | Random seed for reproducibility |
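For example, a GEPA section using the defaults from the table above, with a pinned seed (values illustrative):

```toml
[prompt_learning.gepa]
env_name = "banking77"
proposer_type = "dspy"
proposer_effort = "MEDIUM"
proposer_output_tokens = "FAST"   # ~10k-token output limit
rng_seed = 42                     # pin for reproducible runs
```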
### Rollout Config [prompt_learning.gepa.rollout]
| Parameter | Type | Default | Description |
|---|---|---|---|
| budget | int | - | Total prompt evaluations allowed |
| max_concurrent | int | 20 | Max concurrent rollouts |
| minibatch_size | int | 8 | Batch size for evaluation |
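A rollout block sketch; the budget value is illustrative and should be sized to your task:

```toml
[prompt_learning.gepa.rollout]
budget = 400          # total prompt evaluations across the whole run
max_concurrent = 20
minibatch_size = 8
```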
### Evaluation Config [prompt_learning.gepa.evaluation]
| Parameter | Type | Default | Description |
|---|---|---|---|
| seeds | list[int] | - | Training seeds (dataset indices) |
| validation_seeds | list[int] | - | Held-out validation seeds |
| test_pool | list[int] | null | Final test pool seeds |
| validation_pool | string | null | Pool name (e.g., "validation") |
| validation_top_k | int | null | Top-K prompts to validate |
### Mutation Config [prompt_learning.gepa.mutation]
| Parameter | Type | Default | Description |
|---|---|---|---|
| rate | float | 0.3 | Probability of mutation per component |
| llm_model | string | null | Model for generating mutations |
| llm_provider | string | "groq" | Provider for the mutation LLM |
| llm_inference_url | string | null | Custom inference URL |
| prompt | string | null | Custom mutation prompt |
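A mutation block sketch; the model name is an illustrative Groq-hosted model, not a requirement:

```toml
[prompt_learning.gepa.mutation]
rate = 0.3                              # per-component mutation probability
llm_provider = "groq"
llm_model = "llama-3.3-70b-versatile"   # illustrative model choice
```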
### Population Config [prompt_learning.gepa.population]
| Parameter | Type | Default | Description |
|---|---|---|---|
| initial_size | int | 20 | Initial population size |
| num_generations | int | 10 | Number of evolution generations |
| children_per_generation | int | 5 | Children generated per generation |
| crossover_rate | float | 0.5 | Probability of crossover |
| selection_pressure | float | 1.0 | Pareto selection pressure |
| patience_generations | int | 3 | Generations without improvement before early stopping |
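A population block sketch using the defaults from the table above:

```toml
[prompt_learning.gepa.population]
initial_size = 20
num_generations = 10
children_per_generation = 5
crossover_rate = 0.5
patience_generations = 3   # stop if no improvement for 3 generations
```

Note that budget interacts with population settings: roughly `initial_size + num_generations * children_per_generation` candidates are evaluated, each over a minibatch, so size `rollout.budget` accordingly.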
### Archive Config [prompt_learning.gepa.archive]
| Parameter | Type | Default | Description |
|---|---|---|---|
| size | int | 64 | Maximum number of prompts kept in the archive |
| pareto_set_size | int | 64 | Maximum size of the Pareto set |
| pareto_eps | float | 1e-6 | Epsilon tolerance for Pareto comparisons |
| feedback_fraction | float | 0.5 | Fraction of the archive used for feedback |
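An archive block sketch; shrinking the archive on smaller tasks (an assumption about typical usage, not a documented guideline) keeps feedback focused:

```toml
[prompt_learning.gepa.archive]
size = 32                 # smaller than the default 64
pareto_set_size = 32
feedback_fraction = 0.5   # half the archive informs the proposer
```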
### Token Config [prompt_learning.gepa.token]
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_limit | int | null | Maximum tokens in prompt |
| counting_model | string | "gpt-4" | Model for token counting |
| enforce_pattern_limit | bool | true | Enforce token limits |
| max_spend_usd | float | null | Maximum spend in USD |
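A token block sketch capping both prompt length and total spend (both caps illustrative):

```toml
[prompt_learning.gepa.token]
max_limit = 4000        # reject candidate prompts longer than 4k tokens
counting_model = "gpt-4"
max_spend_usd = 25.0    # hard ceiling on run cost
```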
### Policy Config [prompt_learning.policy]
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | - | Model name (e.g., "gpt-4o-mini") |
| provider | string | - | Provider: "openai", "groq", "synth" |
| temperature | float | 0.0 | Sampling temperature |
| max_completion_tokens | int | 512 | Max output tokens |
| inference_mode | string | "synth_hosted" | Inference mode |
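A policy block sketch for the model being optimized, using the example values from the table above:

```toml
[prompt_learning.policy]
model = "gpt-4o-mini"
provider = "openai"
temperature = 0.0          # deterministic scoring of candidate prompts
max_completion_tokens = 512
```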
## Returns

### Results Structure

### Download via CLI

## Related
- MIPRO — Alternative prompt optimization method
- Artifacts CLI — Download prompts
- Prompt Optimization SDK — Job events reference