This walkthrough shows how to run GEPA optimization for Banking77 entirely in-process, with no separate terminals or manual process management. Everything runs from a single Python script.

Prerequisites

  • GROQ_API_KEY in .env (for policy model inference)
  • SYNTH_API_KEY in .env (for backend authentication)
  • ENVIRONMENT_API_KEY in .env (optional - will be auto-generated if not set)
  • uv installed (for running Python commands)
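
Before launching, it can help to confirm the keys are visible to Python. The following is a minimal sketch, not part of run.py: the helper name missing_required_keys is hypothetical, and it assumes the variables are already exported or have been loaded from .env by some other means.

```python
import os

# Key names come from the prerequisites list above.
REQUIRED_KEYS = ("GROQ_API_KEY", "SYNTH_API_KEY")
OPTIONAL_KEYS = ("ENVIRONMENT_API_KEY",)  # auto-generated by the script if unset


def missing_required_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_required_keys()
if missing:
    print("Missing required keys:", ", ".join(missing))
else:
    print("All required keys are set.")
```

An empty string counts as missing here, which catches a .env line such as `GROQ_API_KEY=` with no value.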

Quick Start

Run the script from anywhere:
uv run python /path/to/synth-ai/walkthroughs/gepa/in_process/run.py
View the script: run.py

That’s it. The script handles everything automatically.

What Happens Automatically

The script performs these steps without any manual intervention:
  1. Auto-generates ENVIRONMENT_API_KEY if not set (and registers it with the backend)
  2. Starts the Banking77 task app in-process (no separate terminal needed)
  3. Creates a Cloudflare tunnel if the backend is remote (or uses localhost if the backend is local)
  4. Submits a GEPA optimization job to the backend
  5. Polls for completion and displays results
  6. Cleans up everything automatically when done

Step-by-Step Execution

Initialization

What you’ll see:
================================================================================
In-Process GEPA Optimization: Banking77
================================================================================

ℹ️  ENVIRONMENT_API_KEY not set, generating and registering with backend...
✅ ENVIRONMENT_API_KEY generated and registered: sk_env_30c78a78...

ℹ️  Configuration: tunnel/tunnel
   Backend: https://agent-learning.onrender.com
   Task App: will create its own tunnel

Configuration:
  Config: banking77_gepa.toml
  Backend: https://agent-learning.onrender.com
  Task App: Starting in-process...
What happens: The script checks for required environment variables, generates an API key if needed, and determines whether to use tunnels or localhost based on the backend URL.
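
The walkthrough does not show the exact rule the script uses to choose between a tunnel and localhost. One plausible rule, sketched here under that assumption, is to inspect the hostname of the backend URL. The names LOCAL_HOSTS, backend_is_local, and choose_mode are hypothetical.

```python
from urllib.parse import urlparse

# Hostnames treated as "local" in this sketch (an assumption, not the script's list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def backend_is_local(backend_url: str) -> bool:
    """Return True when the backend URL points at the local machine."""
    host = urlparse(backend_url).hostname
    return host in LOCAL_HOSTS


def choose_mode(backend_url: str) -> str:
    """Pick 'localhost' for a local backend, otherwise 'tunnel'."""
    return "localhost" if backend_is_local(backend_url) else "tunnel"


print(choose_mode("https://agent-learning.onrender.com"))
print(choose_mode("http://localhost:8000"))
```

A remote backend cannot reach a task app bound to localhost, which is why the remote case needs a publicly reachable tunnel URL.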

Task App Startup

What you’ll see:
✅ Task app running at: https://criteria-chains-incomplete-others.trycloudflare.com
✅ Cloudflare tunnel active

================================================================================
Running GEPA Optimization
================================================================================

📊 Running 5 generations with 4 children per generation
   (Total: 20 prompt candidates)

📊 Rollout budget: 200

Task app URL: https://criteria-chains-incomplete-others.trycloudflare.com
Backend URL: https://agent-learning.onrender.com

Writing config to temp file...
✅ Config written to: /tmp/tmpXXXXXX.toml
What happens: The task app starts in a background thread, a Cloudflare tunnel is created automatically, and the config is prepared with the tunnel URL.
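
How the script injects the tunnel URL into the config is not shown in this walkthrough. As a rough sketch of the idea, the snippet below renders a template and writes it to a temporary .toml file. The `{{TASK_APP_URL}}` placeholder, the `task_app_url` key, and the helper name write_temp_config are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_temp_config(template: str, task_app_url: str) -> Path:
    """Fill in the task app URL and write the result to a temp .toml file."""
    rendered = template.replace("{{TASK_APP_URL}}", task_app_url)
    # delete=False keeps the file on disk after the handle closes, so a
    # later step can read it; the caller is responsible for removing it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".toml", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(rendered)
    return Path(handle.name)


# Illustrative template; the real banking77_gepa.toml layout may differ.
TEMPLATE = 'task_app_url = "{{TASK_APP_URL}}"\n'

path = write_temp_config(TEMPLATE, "https://example.trycloudflare.com")
print(f"Config written to: {path}")
path.unlink()  # remove the temporary file once it is no longer needed
```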

Job Submission

What you’ll see:
Creating job from config...

✅ Job created successfully

Submitting job...

✅ Job submitted: pl_845a0e4a3628485b
What happens: The GEPA job is created from the config and submitted to the backend. You get a job ID for tracking.

Training Progress

What you’ll see - Task app processing rollouts:
[TASK_APP] INBOUND_ROLLOUT: run_id=prompt-learning-74-5bec8a6f seed=74 env=banking77
[TASK_APP] PROXY ROUTING with API key: sk_env_30c78...f263 (len=39)
[TASK_APP] OUTBOUND: model=llama-3.1-8b-instant temp=0.0 max=512 tools=1
[TASK_APP] RESPONSE_STATUS: 200
[TASK_APP] PREDICTION: expected=card_arrival predicted=card_delivery_estimate correct=False
[BANKING77_ROLLOUT] run_id=prompt-learning-74-5bec8a6f reward=0.0
INFO:     74.220.49.253:0 - "POST /rollout HTTP/1.1" 200 OK
What happens: The backend sends rollout requests to your task app. Each request contains a customer query, and the task app returns a prediction. You’ll see both correct and incorrect predictions as the optimization progresses.

Progress streaming:
================================================================================
Streaming Results
================================================================================

[18:35:37]    0.0s  Status: running
[18:35:42]    5.2s  Status: running | Best: 0.500
[18:35:48]   11.4s  Status: running | Best: 0.625
[18:35:54]   17.6s  Status: running | Best: 0.750
[18:36:00]   23.8s  Status: running | Best: 0.875
[18:38:37]  162.5s  Status: running | Best: 0.875
[18:38:43]  168.7s  Status: running | Best: 0.875
[18:38:50]  175.9s  Status: succeeded | Best: 0.875
What happens: The script polls the backend every 5 seconds and displays:
  • Timestamp
  • Elapsed time
  • Current job status
  • Best score achieved so far
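
The polling behaviour described above can be sketched as a small loop. This is an illustration, not the script's implementation: fetch_status is a hypothetical callable standing in for the backend request, the `best_score` field name is assumed, and the "failed" and "cancelled" statuses are assumptions (only "running" and "succeeded" appear in the output above).

```python
import time

# "succeeded" appears in the sample output; the other two are assumed.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_job(fetch_status, interval=5.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` every `interval` seconds until the job ends.

    `fetch_status` must return a dict with a "status" key and, optionally,
    a "best_score" key. Returns the final snapshot, or raises TimeoutError
    if the next wait would exceed `timeout` seconds.
    """
    start = clock()
    while True:
        snapshot = fetch_status()
        elapsed = clock() - start
        status = snapshot.get("status", "unknown")
        best = snapshot.get("best_score")

        line = f"[{time.strftime('%H:%M:%S')}] {elapsed:7.1f}s  Status: {status}"
        if best is not None:
            line += f" | Best: {best:.3f}"
        print(line)

        if status in TERMINAL_STATUSES:
            return snapshot
        if elapsed + interval > timeout:
            raise TimeoutError(f"job still {status} after {elapsed:.1f}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable with a fake status source, without real waiting.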

Results Display

What you’ll see:
✅ GEPA optimization complete in 175.9s

================================================================================
Results
================================================================================

Best score: 87.50%

Total candidates: 21
  Accuracy range: 0.00% - 87.50% (avg: 20.83%)
What happens: After completion, the script fetches the final job status and displays:
  • Best score achieved
  • Total number of candidates evaluated
  • Accuracy statistics (min, max, average)
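
The summary statistics are simple aggregates over the per-candidate scores. The sketch below shows one way to compute and format them. The function names are hypothetical and the score list is illustrative data, not the scores from the run above.

```python
def summarize(scores):
    """Return count, min, best (max), and mean of candidate accuracy scores."""
    if not scores:
        raise ValueError("no candidate scores to summarize")
    return {
        "count": len(scores),
        "min": min(scores),
        "best": max(scores),
        "avg": sum(scores) / len(scores),
    }


def format_summary(scores) -> str:
    """Render the summary in the same layout as the results display."""
    s = summarize(scores)
    return (
        f"Best score: {s['best']:.2%}\n\n"
        f"Total candidates: {s['count']}\n"
        f"  Accuracy range: {s['min']:.2%} - {s['best']:.2%} (avg: {s['avg']:.2%})"
    )


# Illustrative scores only.
print(format_summary([0.0, 0.5, 0.875, 0.125]))
```

A low average alongside a high best score, as in the sample run, simply means most candidates scored well below the winner.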

Cleanup

What you’ll see:
INFO:synth_ai.task.in_process:Stopping in-process task app...
INFO:synth_ai.task.in_process:Tunnel stopped

================================================================================
✅ In-process GEPA optimization complete!
================================================================================
What happens: The script automatically:
  • Stops the task app
  • Closes the Cloudflare tunnel
  • Cleans up temporary files
  • Exits cleanly
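
Guaranteed cleanup of this kind is commonly built on `contextlib.ExitStack`, which runs registered callbacks even when the job raises. The sketch below illustrates that pattern under stated assumptions and is not the script's actual code: start_task_app and start_tunnel are hypothetical callables that each return a "stop" function.

```python
from contextlib import ExitStack


def run_with_cleanup(start_task_app, start_tunnel, run_job):
    """Start the task app and tunnel, run the job, and always clean up.

    Each `start_*` callable returns a zero-argument function that stops
    the resource it started. ExitStack invokes those stop functions in
    reverse registration order, whether `run_job` returns or raises.
    """
    with ExitStack() as stack:
        stop_app = start_task_app()
        stack.callback(stop_app)

        stop_tunnel = start_tunnel()
        stack.callback(stop_tunnel)

        return run_job()


# Usage with stand-in resources that just record what happened.
events = []


def start_app():
    events.append("app started")
    return lambda: events.append("app stopped")


def start_tunnel():
    events.append("tunnel started")
    return lambda: events.append("tunnel stopped")


result = run_with_cleanup(start_app, start_tunnel, lambda: "succeeded")
print(result, events)
```

Registering each stop function immediately after its resource starts means a failure while opening the tunnel still stops the task app.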

Configuration

The script uses banking77_gepa.toml from walkthroughs/gepa/. You can modify:
  • Rollout budget: prompt_learning.gepa.rollout.budget (default: 200)
  • Number of generations: prompt_learning.gepa.population.num_generations (default: 5)
  • Children per generation: prompt_learning.gepa.population.children_per_generation (default: 4)
View the config: banking77_gepa.toml

Example Output Summary

Here’s what a successful run looks like:
✅ Job submitted: pl_845a0e4a3628485b
✅ GEPA optimization complete in 175.9s
Best score: 87.50%
Total candidates: 21
  Accuracy range: 0.00% - 87.50% (avg: 20.83%)
Key metrics:
  • Best score: 87.50% accuracy (significant improvement over baseline)
  • Total candidates: 21 prompt variations evaluated
  • Time: ~3 minutes for complete optimization
  • Cost: Typically $0.10-$0.20, depending on rollout budget

Troubleshooting

  • “GROQ_API_KEY required”: Make sure a .env file exists at the repo root with GROQ_API_KEY set
  • “SYNTH_API_KEY required”: Make sure a .env file exists at the repo root with SYNTH_API_KEY set
  • Task app not found: Ensure walkthroughs/gepa/task_app/banking77_task_app.py exists
  • Config file not found: Ensure walkthroughs/gepa/banking77_gepa.toml exists
  • ENVIRONMENT_API_KEY registration failed: The script will continue with the generated key, but the backend may not be able to authenticate task app requests. Check that SYNTH_API_KEY is valid.
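
The file-related items above can be checked in one pass. This is a small sketch, assuming the paths are relative to the repo root as listed. The helper name missing_files is hypothetical and not part of the walkthrough's script.

```python
from pathlib import Path

# Paths taken from the troubleshooting list, relative to the repo root.
EXPECTED_FILES = (
    ".env",
    "walkthroughs/gepa/banking77_gepa.toml",
    "walkthroughs/gepa/task_app/banking77_task_app.py",
)


def missing_files(repo_root):
    """Return the expected files that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Check the current directory; run this from the repo root.
for rel in missing_files("."):
    print(f"Missing: {rel}")
```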

Advantages Over Deployed Approach

  1. Single command: Everything runs from one script
  2. No manual process management: Task app and tunnel are managed automatically
  3. Automatic cleanup: Everything stops cleanly when done
  4. Better for automation: Well suited to CI/CD or batch processing
  5. Easier debugging: All logs in one place

Next Steps

  • Review the optimized prompts in the job results
  • Adjust configuration parameters for different optimization runs
  • Try the deployed walkthrough for more manual control
  • Integrate into your own scripts using the InProcessTaskApp class