GEPA In-Process Task App

How can you use GEPA in-process for prod applications?

GEPA In-Process allows you to run prompt optimization entirely from a single Python script. You provide your task app and dataset, and the optimizer handles everything:

Automatic task app startup: Your FastAPI task app runs in a background thread
Cloudflare tunnel: Automatically exposes your local task app to the optimizer backend
Dataset-driven optimization: GEPA tests candidate prompts against your dataset via message passing
Clean shutdown: Everything cleans up automatically when done and you’re left with optimized prompts and their eval scores

This allows you to run GEPA programmatically over arbitrarily many datasets and tasks in production.

In-Process Task App Architecture

The in-process approach eliminates manual process management by running everything in a single Python script.

Components

Task App: Heart Disease classification using buio/heart-disease dataset

Binary classification: 0 (no disease) or 1 (heart disease)
Tool-based: Model calls heart_disease_classify function
Patient features provided as text input

Runner Script: run_fully_in_process.py

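import asyncio

from synth_ai.task import InProcessTaskApp
# PromptLearningJob must be imported too (import path not shown here).
# task_app_path, task_app_api_key, and config_path are assumed to be defined earlier in the script.


async def main():
    async with InProcessTaskApp(
        task_app_path=task_app_path,
        port=8114,
        api_key=task_app_api_key,
    ) as task_app:
        # task_app.url contains the Cloudflare tunnel URL; pass it to the GEPA job
        job = PromptLearningJob.from_config(
            config_path=config_path,
            task_app_url=task_app.url,
        )
        results = await job.poll_until_complete()
    # Everything is cleaned up automatically once the block exits
    return results


asyncio.run(main())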

What Happens:

Task app starts in background thread (uvicorn)
Cloudflare tunnel opens automatically
Backend receives public tunnel URL
GEPA job runs rollouts against tunnel
Cleanup happens automatically on exit
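
The start-in-a-background-thread and clean-up-on-exit steps above follow a general pattern that can be sketched with the standard library alone. This is not how InProcessTaskApp is implemented: it uses http.server in place of uvicorn and skips the tunnel entirely, and the names HealthHandler, background_server, and fetch_status are hypothetical.

```python
import contextlib
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a task app: answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


@contextlib.contextmanager
def background_server(host="127.0.0.1", port=0):
    """Serve in a daemon thread and shut down when the with-block exits."""
    # port=0 asks the OS for any free port
    server = ThreadingHTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://{host}:{server.server_address[1]}"
    finally:
        server.shutdown()      # stop serve_forever()
        server.server_close()  # release the listening socket
        thread.join(timeout=5)


def fetch_status(url):
    """GET the URL directly (no proxy) and return (status, body)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.read()
```

Inside `with background_server() as url:` the server is reachable at `url`; once the block exits, the socket is closed and the thread has stopped, which mirrors the automatic cleanup described above.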

Seed Pools

GEPA uses different seed pools for different phases:

[prompt_learning.gepa.evaluation]
train_seeds = [0, 1, 2, ..., 29]      # 30 seeds for training
val_seeds = [30, 31, 32, ..., 79]     # 50 seeds for validation
validation_pool = "train"
validation_top_k = 2

Train seeds: Used during evolutionary process to evaluate fitness
Val seeds: Held-out validation set for final top-K selection
validation_pool: Which pool to use for validation (“train” or “val”)
validation_top_k: Number of top candidates to validate
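
The seed lists in the config are abbreviated with "...". As a small sketch, assuming the pools are the contiguous ranges the comments describe (0 to 29 and 30 to 79), they can be generated and checked for overlap like this; pools_are_disjoint is a hypothetical helper, not part of synth_ai.

```python
# Assumed from the config comments: 30 training seeds, 50 validation seeds.
train_seeds = list(range(0, 30))   # 0..29
val_seeds = list(range(30, 80))    # 30..79


def pools_are_disjoint(a, b):
    """Return True when the two seed pools share no seed."""
    return set(a).isdisjoint(b)
```

Keeping the pools disjoint is what makes the val seeds a held-out set.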

Rollout Configuration

[prompt_learning.gepa.rollout]
budget = 300              # Total rollouts across all generations
max_concurrent = 5        # Parallel rollout limit

Budget is distributed across generations:

Initial population: initial_size × len(train_seeds) rollouts
Each generation: children_per_generation × len(train_seeds) rollouts
Archive candidates re-evaluated periodically
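
To see how quickly the budget is consumed, the two formulas above can be turned into a rough estimate. This sketch ignores archive re-evaluation, and the values used for initial_size and children_per_generation in the usage note are illustrative assumptions, not defaults from the source.

```python
def rollouts_used(initial_size, children_per_generation, num_train_seeds, generations):
    """Rollouts for the initial population plus the given number of generations."""
    initial = initial_size * num_train_seeds
    per_generation = children_per_generation * num_train_seeds
    return initial + generations * per_generation


def max_generations(budget, initial_size, children_per_generation, num_train_seeds):
    """Whole generations that fit in the budget after the initial population."""
    initial = initial_size * num_train_seeds
    per_generation = children_per_generation * num_train_seeds
    if per_generation <= 0:
        raise ValueError("children_per_generation and num_train_seeds must be positive")
    if budget < initial:
        return 0
    return (budget - initial) // per_generation
```

For example, with budget = 300, 30 train seeds, an assumed initial_size of 4, and an assumed children_per_generation of 2, the initial population costs 120 rollouts and each generation costs 60, so `max_generations(300, 4, 2, 30)` gives 3.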

Meta-Model vs Policy Model

Policy Model: The model being optimized (e.g., llama-3.1-8b-instant)
- Runs your actual task (heart disease classification)
- Needs to be fast and cost-effective
- Defined in [prompt_learning.policy]
Meta-Model: The mutation generator (e.g., llama-3.3-70b-versatile)
- Analyzes successful/failing prompts and proposes mutations
- Should be more capable than the policy model
- Defined in [prompt_learning.gepa.mutation]

Termination Conditions

[prompt_learning.termination_config]
max_cost_usd = 3.0       # Budget limit
max_trials = 600         # Maximum rollouts

GEPA stops when either condition is met:

Cost exceeds max_cost_usd
Total rollouts exceed max_trials
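
The stopping rule can be sketched as a simple either-or check. TerminationConfig and should_stop are hypothetical names for illustration; only the two field names and their values come from the config above.

```python
from dataclasses import dataclass


@dataclass
class TerminationConfig:
    max_cost_usd: float = 3.0  # budget limit from the config above
    max_trials: int = 600      # maximum rollouts from the config above


def should_stop(cfg, cost_usd, rollouts):
    """Stop as soon as either limit is exceeded."""
    return cost_usd > cfg.max_cost_usd or rollouts > cfg.max_trials
```

Because the conditions are joined with "or", a run that is cheap per rollout still ends at the rollout cap, and an expensive run can end well before it.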

Example: Heart Disease Classification

The in-process demo uses medical classification:

Task: Predict heart disease from patient features
Dataset: buio/heart-disease (270 samples, train split only)
Metric: Binary classification accuracy
Budget: 50 rollouts (reduced from 300 for faster demo)

Seed Prompt (baseline):

You are a medical classification assistant. Based on the patient's
features, classify whether they have heart disease. Respond with
'1' for heart disease or '0' for no heart disease.

GEPA’s Optimized Prompt (GPT-4.1 Mini):

You are a medical classification assistant. Your task is to analyze patient features and determine the presence of heart disease.

Input: You will receive patient features including age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise, slope of peak exercise ST segment, number of major vessels colored by fluoroscopy, and thalassemia type.

Classification Process:
• Carefully examine all provided patient features. Pay particular attention to combinations of risk factors rather than isolated values.
• Consider the relationships between features: high cholesterol combined with high blood pressure and chest pain indicates higher risk than any single factor alone.
• Exercise-related features (maximum heart rate, exercise-induced angina, ST depression) are strong indicators when present alongside other cardiovascular risk factors.
• Age and sex are baseline factors that modify risk interpretation but should not be the sole basis for classification.

Key Risk Indicators:
• Chest pain types associated with cardiovascular issues (especially when combined with other symptoms)
• Elevated resting blood pressure (>140 mmHg) or serum cholesterol (>240 mg/dL)
• Abnormal resting ECG results
• Exercise-induced angina or significant ST depression during exercise
• Multiple major vessels affected (visible via fluoroscopy)
• Thalassemia types associated with cardiovascular complications

Output Format:
• Analyze the feature combination holistically
• Respond with exactly '1' if heart disease is present based on the feature analysis
• Respond with exactly '0' if heart disease is not present
• Base your decision on the overall pattern of risk factors, not individual feature values in isolation
