Prompt optimization uses evolutionary algorithms to automatically improve prompts for classification, reasoning, and instruction-following tasks. It works with any language – build your Container in Rust, Go, TypeScript, Zig, Python, or any language that can serve HTTP. See Polyglot Container for examples and the OpenAPI contract. Synth AI uses GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457

1. Build a prompt evaluation Container

Use the ContainerConfig interface to describe dataset splits, rubrics, and rollout handlers. Build it in any language by implementing the OpenAPI contract. → Create a prompt evaluation Container | Polyglot examples
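At its core, a Container is just an HTTP service that scores one rollout per request. The sketch below uses only the Python standard library; the /rollout payload fields shown (seed, policy_config, reward, metrics) are illustrative assumptions, not the exact OpenAPI contract.

```python
# Minimal sketch of a prompt-evaluation Container (stdlib only).
# Field names in the payload are assumptions; see the OpenAPI contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_rollout(payload: dict) -> dict:
    """Score one seed. A real Container would call the LLM via
    payload["policy_config"]["inference_url"] and grade its answer."""
    seed = payload.get("seed", 0)
    # Placeholder reward: a real handler compares model output
    # against the label for this dataset seed.
    return {"seed": seed, "reward": 1.0, "metrics": {"correct": True}}


class ContainerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/rollout":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_rollout(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    HTTPServer(("0.0.0.0", port), ContainerHandler).serve_forever()

# To run the Container locally: serve()
```

The same shape ports directly to Rust, Go, TypeScript, or Zig: one POST route, JSON in, JSON out.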

2. Author the prompt optimization config

Capture the GEPA algorithm choice, initial prompt template, training/validation seeds, and optimization parameters in TOML. → Read: Prompt optimization configs
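A config of this shape might look like the fragment below. The table and key names here are placeholders to show the structure, not the documented schema; consult the Prompt optimization configs page for the real keys.

```toml
# Illustrative only – key names are assumptions, not the real schema.
[prompt_learning]
algorithm = "gepa"
initial_prompt = "You are a banking-intent classifier. Answer with one label."

[prompt_learning.seeds]
train = [0, 1, 2, 3, 4, 5, 6, 7]
validation = [100, 101, 102, 103]

[prompt_learning.gepa]
generations = 15
population_size = 8
```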

3. Query and evaluate results

Use the Python API or REST endpoints to retrieve optimized prompts and evaluate them on held-out validation sets.
→ Read: Querying results
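Over REST, retrieving results reduces to fetching the job payload and picking the best candidate. The sketch below uses the stdlib; the base URL, auth header, and response fields are assumptions – see the Querying results page for the real API.

```python
# Sketch of fetching a finished prompt-optimization job over REST.
# Endpoint path and response fields are assumptions, not the real API.
import json
import urllib.request

API_BASE = "https://api.example.com/prompt-learning/jobs"  # placeholder URL


def best_candidate(job: dict) -> dict:
    """Pick the highest-reward candidate from a job result payload."""
    return max(job.get("candidates", []),
               key=lambda c: c.get("reward", float("-inf")))


def fetch_job(job_id: str, api_key: str) -> dict:
    req = urllib.request.Request(
        f"{API_BASE}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (needs a real job id and API key):
# job = fetch_job("job_123", api_key)
# print(best_candidate(job)["prompt"])
```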

Algorithm Overview

GEPA (Genetic-Pareto)

Best for: Broad exploration, diverse prompt variants, classification tasks
Reference: Agrawal et al. (2025)
GEPA uses evolutionary principles to explore the prompt space:
  • Population-based search with multiple prompt variants
  • LLM-guided mutations for intelligent prompt modifications
  • Pareto optimization balancing performance and prompt length
  • Multi-stage support for pipeline optimization
Typical results: improves accuracy from a 60–75% baseline to 85–90%+ over 15 generations.
Key features:
  • Maintains a Pareto front of non-dominated solutions
  • Supports both template mode and pattern-based transformations
  • Module-aware evolution for multi-stage pipelines
  • Reflective feedback from execution traces
  • Hosted verifier integration for quality-aware optimization
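The Pareto idea from the feature list can be shown in a few lines: a candidate stays on the front if no other candidate is at least as accurate and at least as short, and strictly better on one axis. This is a toy illustration of the concept, not GEPA's actual internals.

```python
# Toy Pareto front over (accuracy: maximize, prompt length: minimize).
def dominates(a: dict, b: dict) -> bool:
    """a dominates b: no worse on both axes, strictly better on one."""
    no_worse = a["accuracy"] >= b["accuracy"] and a["length"] <= b["length"]
    better = a["accuracy"] > b["accuracy"] or a["length"] < b["length"]
    return no_worse and better


def pareto_front(candidates: list[dict]) -> list[dict]:
    """Keep only non-dominated candidates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]


cands = [
    {"name": "A", "accuracy": 0.88, "length": 120},
    {"name": "B", "accuracy": 0.85, "length": 300},  # dominated by A
    {"name": "C", "accuracy": 0.91, "length": 450},
    {"name": "D", "accuracy": 0.80, "length": 90},
]
front = pareto_front(cands)  # A, C, and D survive; B is dominated
```

Keeping a front rather than a single best prompt preserves diverse high performers (short-but-decent, long-but-accurate) as parents for the next generation.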

Architecture: Inference Interception

GEPA does call your container’s /rollout endpoint — but optimized prompts never appear in the rollout payload. Instead, the backend registers each candidate with an inference interceptor and passes your container a policy_config.inference_url. When your container makes LLM calls through that URL, the interceptor substitutes the candidate prompt before forwarding to the model.
GEPA evaluation flow:

Backend ──proposes candidate──▶ Interceptor (registers prompt)
Backend ──/rollout──▶ Container
Container ──LLM call via inference_url──▶ Interceptor ──substitutes prompt──▶ LLM
Container ◀──response──────────────────── LLM
Backend  ◀──metrics/reward────────────── Container
This separation ensures:
  • No prompt leakage: your container never sees the optimized prompt text
  • Containers remain unchanged: just route LLM calls through policy_config.inference_url
  • Traces captured: the interceptor records execution traces for reflective feedback
  • Stored artifacts: traces and artifacts can be reused for reflection across generations
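From the container's side, the only requirement is to aim LLM calls at `policy_config.inference_url`. The sketch below assumes an OpenAI-style chat-completions endpoint behind the interceptor – that shape is an assumption; the point is that the URL comes from the payload, so the interceptor can inject the candidate prompt before forwarding.

```python
# Container-side sketch: route the LLM call through the interceptor URL.
# The /v1/chat/completions shape is an assumption, not a documented fact.
import json
import urllib.request


def build_request(policy_config: dict, user_text: str) -> urllib.request.Request:
    url = policy_config["inference_url"].rstrip("/") + "/v1/chat/completions"
    body = {
        "model": policy_config.get("model", "gpt-4o-mini"),
        # The interceptor injects the optimized system prompt;
        # the container only supplies the task input.
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

# req = build_request(payload["policy_config"], "Classify: 'my card is blocked'")
# with urllib.request.urlopen(req) as resp: ...
```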

Production-Ready: Works with Your Code

GEPA works with your production code via HTTP-based serverless endpoints. Build your Container in any language (Rust, Go, TypeScript, Zig, Python, or any language that can serve HTTP). See Polyglot Container for examples and the OpenAPI contract.

Supported Models

See Supported Models for Prompt Optimization for the full list of policy models.

Multi-Stage Pipeline Support

GEPA supports optimizing prompts for multi-stage pipelines (e.g., Banking77 classifier → calibrator):
  • LCS-based stage detection automatically identifies which stage is being called
  • Per-stage optimization evolves separate instructions for each pipeline module
  • Unified evaluation tracks end-to-end performance across all stages
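The stage-detection idea can be sketched as matching each incoming prompt against the known template of every stage and picking the closest. This toy version uses `difflib.SequenceMatcher` (which approximates longest-common-subsequence similarity); the hypothetical two-stage Banking77 templates are made up for illustration.

```python
# Toy LCS-style stage detection: match the incoming prompt against each
# stage's template and pick the best. Templates here are invented.
from difflib import SequenceMatcher

STAGE_TEMPLATES = {
    "classifier": "Classify the banking query into one of 77 intents:",
    "calibrator": "Given the predicted intent and its score, calibrate confidence:",
}


def detect_stage(prompt: str) -> str:
    """Return the stage whose template is most similar to the prompt."""
    return max(
        STAGE_TEMPLATES,
        key=lambda s: SequenceMatcher(None, STAGE_TEMPLATES[s], prompt).ratio(),
    )


stage = detect_stage(
    "Classify the banking query into one of 77 intents: 'my card is blocked'"
)
```

Because detection is automatic, the backend can evolve a separate instruction per stage without the container declaring which module each call belongs to.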

Next Steps