GEPA: Genetic Evolution of Prompt Architectures

GEPA evolves prompt structures with a genetic algorithm whose mutations are proposed by an LLM. On prompt-optimization tasks, it can outperform RL-based methods.
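At a high level, the optimizer maintains a population of candidate prompts, scores each one on task rollouts, keeps the strongest as parents, and asks an LLM to propose rewritten children. A minimal sketch of that loop, where `score` and `llm_mutate` are hypothetical stand-ins (not the SDK API):

```python
import random

def evolve(initial_prompts, score, llm_mutate, generations=3, children=2, seed=0):
    """Toy genetic loop: evaluate, select the top half, mutate, repeat."""
    rng = random.Random(seed)
    population = list(initial_prompts)
    for _ in range(generations):
        # Evaluate every candidate and keep the top half as parents.
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(1, len(ranked) // 2)]
        # LLM-guided mutation: ask the proposer model to rewrite a parent.
        offspring = [llm_mutate(rng.choice(parents)) for _ in range(children)]
        population = parents + offspring
    return max(population, key=score)
```

The real optimizer adds minibatch evaluation, crossover, and a Pareto archive on top of this skeleton, but the select-mutate-reinsert cycle is the same.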

When to Use

  • Complex prompt structures
  • Multi-component prompts (system + few-shot + chain-of-thought)
  • Exploring diverse prompt mutations
  • When you want interpretable prompt improvements

Config Reference

```toml
[prompt_learning]
algorithm = "gepa"
task_app_url = "https://your-tunnel.trycloudflare.com"
task_app_api_key = "$ENVIRONMENT_API_KEY"
task_app_id = "your-task"

[prompt_learning.initial_prompt]
id = "my_prompt"
name = "My Classification Prompt"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are a classifier. {instructions}"
order = 0

[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "{query}"
order = 1

[prompt_learning.policy]
model = "gpt-4o-mini"
provider = "openai"
temperature = 0.0
max_completion_tokens = 512

[prompt_learning.gepa]
env_name = "my-task"
proposer_type = "dspy"
proposer_effort = "LOW"
proposer_output_tokens = "FAST"

[prompt_learning.gepa.rollout]
budget = 100
max_concurrent = 20
minibatch_size = 8

[prompt_learning.gepa.evaluation]
seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
validation_seeds = [10, 11, 12, 13, 14]
validation_pool = "validation"
validation_top_k = 3

[prompt_learning.gepa.mutation]
rate = 0.3
llm_model = "gpt-oss-120b"
llm_provider = "groq"

[prompt_learning.gepa.population]
initial_size = 20
num_generations = 10
children_per_generation = 5
crossover_rate = 0.5
selection_pressure = 1.0
patience_generations = 3

[prompt_learning.gepa.archive]
size = 64
pareto_set_size = 64
pareto_eps = 1e-6
feedback_fraction = 0.5

[prompt_learning.gepa.token]
counting_model = "gpt-4"
enforce_pattern_limit = true
```

Top-Level Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `algorithm` | string | - | Must be `"gepa"` |
| `task_app_url` | string | - | URL of your task app (tunnel URL) |
| `task_app_api_key` | string | - | Environment API key for auth |
| `task_app_id` | string | - | Identifier for your task |

GEPA Section [prompt_learning.gepa]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `env_name` | string | `"banking77"` | Environment/task name |
| `proposer_type` | string | `"dspy"` | Proposer type: `"dspy"` or `"spec"` |
| `proposer_effort` | string | `"LOW"` | Model quality: `"LOW_CONTEXT"`, `"LOW"`, `"MEDIUM"`, `"HIGH"` |
| `proposer_output_tokens` | string | `"FAST"` | Token limit: `"RAPID"` (3k), `"FAST"` (10k), `"SLOW"` (25k) |
| `metaprompt` | string | null | Custom metaprompt for mutations |
| `rng_seed` | int | null | Random seed for reproducibility |
Rollout Config [prompt_learning.gepa.rollout]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `budget` | int | - | Total prompt evaluations allowed |
| `max_concurrent` | int | 20 | Max concurrent rollouts |
| `minibatch_size` | int | 8 | Batch size for evaluation |

Evaluation Config [prompt_learning.gepa.evaluation]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `seeds` | list[int] | - | Training seeds (dataset indices) |
| `validation_seeds` | list[int] | - | Held-out validation seeds |
| `test_pool` | list[int] | null | Final test pool seeds |
| `validation_pool` | string | null | Pool name (e.g., `"validation"`) |
| `validation_top_k` | int | null | Top-K prompts to validate |

Mutation Config [prompt_learning.gepa.mutation]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `rate` | float | 0.3 | Probability of mutation per component |
| `llm_model` | string | null | Model for generating mutations |
| `llm_provider` | string | `"groq"` | Provider for mutation LLM |
| `llm_inference_url` | string | null | Custom inference URL |
| `prompt` | string | null | Custom mutation prompt |
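The mutation `rate` applies per prompt component: each message pattern is independently selected for rewriting with the given probability, and only the selected components are sent to the mutation LLM. A toy illustration of the selection step (the rewrite itself would be performed by `llm_model`; `select_for_mutation` is a hypothetical helper, not the SDK API):

```python
import random

def select_for_mutation(components, rate=0.3, seed=0):
    """Return the indices of prompt components chosen for LLM rewriting."""
    rng = random.Random(seed)
    return [i for i in range(len(components)) if rng.random() < rate]
```

With `rate = 0.3` and a two-message prompt, most generations mutate at most one of the system or user patterns, keeping changes between parent and child easy to diff.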

Population Config [prompt_learning.gepa.population]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `initial_size` | int | 20 | Initial population size |
| `num_generations` | int | 10 | Number of evolution generations |
| `children_per_generation` | int | 5 | Children generated per generation |
| `crossover_rate` | float | 0.5 | Probability of crossover |
| `selection_pressure` | float | 1.0 | Pareto selection pressure |
| `patience_generations` | int | 3 | Early stopping patience |
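The population settings determine how quickly the rollout `budget` is consumed: each child is scored on a minibatch, so roughly `children_per_generation × minibatch_size` evaluations are spent per generation. An illustrative calculation under the example config above (the optimizer's exact accounting, including the initial population's evaluation cost, may differ):

```python
def generations_within_budget(budget, children_per_generation, minibatch_size):
    """How many full generations fit in the rollout budget, ignoring the
    cost of evaluating the initial population."""
    per_generation = children_per_generation * minibatch_size
    return budget // per_generation

# budget = 100, 5 children/generation, minibatch of 8:
# 40 evaluations per generation -> 2 full generations.
```

If `num_generations` is larger than what the budget allows, the run stops when the budget is exhausted, so size `budget` to cover the generations you actually want.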

Archive Config [prompt_learning.gepa.archive]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `size` | int | 64 | Archive size |
| `pareto_set_size` | int | 64 | Pareto set size |
| `pareto_eps` | float | 1e-6 | Pareto epsilon |
| `feedback_fraction` | float | 0.5 | Fraction of archive for feedback |
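The archive keeps a Pareto set of candidates scored per seed, and `pareto_eps` sets the tolerance used when comparing per-seed scores. A minimal epsilon-dominance check of the kind such archives rely on (a sketch, not the internal implementation):

```python
def dominates(a, b, eps=1e-6):
    """True if per-seed scores `a` epsilon-dominate `b`: at least as good
    on every seed (within eps) and strictly better on at least one."""
    at_least_as_good = all(x >= y - eps for x, y in zip(a, b))
    strictly_better = any(x > y + eps for x, y in zip(a, b))
    return at_least_as_good and strictly_better
```

A candidate enters the Pareto set only if no archived candidate dominates it, so prompts that excel on different seeds can coexist rather than collapsing to a single average-score winner.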

Token Config [prompt_learning.gepa.token]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_limit` | int | null | Maximum tokens in prompt |
| `counting_model` | string | `"gpt-4"` | Model for token counting |
| `enforce_pattern_limit` | bool | true | Enforce token limits |
| `max_spend_usd` | float | null | Maximum spend in USD |
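When `enforce_pattern_limit` is true, candidate patterns whose token count exceeds `max_limit` are rejected before evaluation. A sketch of that check with a pluggable counter; the real check tokenizes with `counting_model`'s tokenizer, and the whitespace split here is only a stand-in:

```python
def within_token_limit(pattern, max_limit=None, count=lambda s: len(s.split())):
    """Accept a pattern unless its token count exceeds max_limit
    (max_limit=None means no limit is enforced)."""
    if max_limit is None:
        return True
    return count(pattern) <= max_limit
```

Rejecting oversized patterns early keeps mutations from inflating prompts (and inference cost) unchecked across generations.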

Policy Config [prompt_learning.policy]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | - | Model name (e.g., `"gpt-4o-mini"`) |
| `provider` | string | - | Provider: `"openai"`, `"groq"`, `"synth"` |
| `temperature` | float | 0.0 | Sampling temperature |
| `max_completion_tokens` | int | 512 | Max output tokens |
| `inference_mode` | string | `"synth_hosted"` | Inference mode |

Returns

```python
from synth_ai.sdk.api.train.prompt_learning import PromptLearningJob

job = PromptLearningJob.from_config("gepa.toml")
job.submit()
result = job.poll_until_complete()

# Get results
results = job.get_results()
print(f"Best Score: {results['best_score']}")

# Get best prompt text
best_prompt = job.get_best_prompt_text(rank=1)
print(best_prompt)
```

Results Structure

```python
{
    "best_prompt": {...},     # Full prompt with sections
    "best_score": 0.85,       # Accuracy on validation
    "top_prompts": [...],     # Top K prompts by score
    "optimized_candidates": [...],  # All evolved candidates
    "attempted_candidates": [...],  # All evaluated candidates
    "validation_results": {...},    # Per-seed validation scores
}
```
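The per-seed validation scores can be aggregated directly from the results payload. A sketch assuming `validation_results` maps seed to score; the exact shape may differ by SDK version:

```python
def mean_validation_score(results):
    """Average the per-seed validation scores in a results payload."""
    scores = results["validation_results"].values()
    return sum(scores) / len(scores)

# Hypothetical payload with scores for validation seeds 10-12.
results = {"best_score": 0.85, "validation_results": {10: 0.9, 11: 0.8, 12: 0.85}}
```

Comparing this mean against `best_score` is a quick check for overfitting to the training seeds.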

Download via CLI

```bash
# Download best prompt as JSON
uvx synth-ai artifacts download pl_71c12c4c7c474c34

# Download as YAML
uvx synth-ai artifacts download pl_71c12c4c7c474c34 --format yaml

# Save to file
uvx synth-ai artifacts download pl_71c12c4c7c474c34 --output prompt.json
```