Prerequisites
- SYNTH_API_KEY in .env (for backend authentication)
- GROQ_API_KEY in .env (for policy model inference)
- uv installed (for running Python commands)
- cloudflared binary (will be auto-installed if missing)
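Before running the script, it can help to confirm these prerequisites are in place. The following is a minimal sketch, not part of the walkthrough script: it assumes the .env file uses plain KEY=VALUE lines (optionally prefixed with export), and the helper names read_env_keys and missing_prerequisites are hypothetical.

```python
import shutil
from pathlib import Path

REQUIRED_KEYS = ("SYNTH_API_KEY", "GROQ_API_KEY")
REQUIRED_TOOLS = ("uv", "cloudflared")


def read_env_keys(env_path):
    """Return the variable names that have a non-empty value in a .env file."""
    keys = set()
    path = Path(env_path)
    if not path.is_file():
        return keys
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        if value.strip().strip("'\""):
            keys.add(name.strip())
    return keys


def missing_prerequisites(env_path=".env"):
    """List human-readable problems; an empty list means everything was found."""
    present = read_env_keys(env_path)
    problems = [f"{key} not set in {env_path}" for key in REQUIRED_KEYS if key not in present]
    problems += [f"{tool} not found on PATH" for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

A missing cloudflared entry is informational only, since the script installs the binary when it is absent.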
Quick Start
Run the interactive script from the synth-ai repository:
Step-by-Step Walkthrough
Step 1: Generate ENVIRONMENT_API_KEY
What happens: The script generates a new API key for authenticating with the task app and registers it with the backend.
Command executed:
Step 2: Deploy Cloudflare Tunnel
What happens: The script kills any existing processes on port 8102, then starts the Banking77 task app locally and creates a Cloudflare tunnel to expose it publicly.
Commands executed:
The deploy command writes the tunnel URL to /tmp/gepa_walkthrough/cli_env.txt. You’ll see Cloudflare tunnel logs indicating the tunnel is ready.
Task app logs (example):
Step 3: Extract Tunnel URL
What happens: The script reads the TASK_APP_URL that was written by the deploy command.
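As a rough illustration of this step, the sketch below pulls TASK_APP_URL out of the environment file. It assumes cli_env.txt holds KEY=VALUE lines, which the source does not confirm, and read_task_app_url is a hypothetical helper rather than the command the script runs.

```python
from pathlib import Path


def read_task_app_url(env_file="/tmp/gepa_walkthrough/cli_env.txt"):
    """Return the TASK_APP_URL value from a KEY=VALUE env file, or None if absent."""
    for raw in Path(env_file).read_text().splitlines():
        # Split on the first "=" only, so URLs containing "=" stay intact.
        name, sep, value = raw.strip().partition("=")
        if sep and name.removeprefix("export ").strip() == "TASK_APP_URL":
            return value.strip().strip("'\"")
    return None
```

Returning None instead of raising lets the caller decide whether a missing URL means the tunnel is still starting or has failed.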
Command executed:
Step 4: Create GEPA Config
What happens: The script updates the base TOML config to use the tunnel URL and sets the rollout budget to 2000 (sufficient for prompt improvement).
Command executed:
Step 5: Run GEPA Training
What happens: The script submits the GEPA optimization job to the production backend and polls for completion.
Command executed:
What you’ll see:
- Real-time rollout processing logs showing individual predictions (correct/incorrect)
- Progress updates showing completion percentage
- Validation summaries with candidate scores (baseline vs optimized)
- Final job status with best score (in this example: 81.25% accuracy, up from a 56.67% baseline, a gain of 24.58 percentage points)
- Cost breakdown: $0.14 total cost for 458 rollouts
- Throughput statistics: 85.6 rollouts/minute
- Total time: ~4.3 minutes
- Location of saved results files
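The submit-then-poll behavior described for this step follows a common pattern, sketched below in generic form. This is not the synth-ai client API: fetch_status is a hypothetical callable standing in for whatever returns the job's current status, and the terminal status names are assumptions.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=1800.0,
                    terminal=("succeeded", "failed", "cancelled"), sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        # Wait before asking again so the backend is not hammered.
        sleep(interval_s)
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock changes during a long run.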
Results from this example run:
- Baseline accuracy: 56.67%
- Best optimized prompt: 81.25% accuracy
- Improvement: +24.58 percentage points (+43% relative improvement)
- Top candidate: 76.67% accuracy (+20 percentage points vs baseline)
- Cost: $0.14 for complete optimization
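The figures above mix absolute gains (percentage points) with relative gains (percent of the baseline). The short sketch below reproduces the arithmetic from the reported numbers; the function name improvement is illustrative only.

```python
def improvement(baseline_pct, optimized_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = optimized_pct - baseline_pct
    relative = absolute / baseline_pct * 100.0
    return absolute, relative


baseline, best = 56.67, 81.25      # accuracies reported above
total_cost, rollouts = 0.14, 458   # cost figures reported above

abs_gain, rel_gain = improvement(baseline, best)
print(f"absolute gain: {abs_gain:.2f} percentage points")   # 24.58
print(f"relative gain: {rel_gain:.1f}%")                    # 43.4%
print(f"cost per rollout: ${total_cost / rollouts:.5f}")    # $0.00031
```

The same function applied to the 76.67% top candidate gives a gain of 20 percentage points over the baseline.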
Retrieving Optimized Prompts
After training completes, retrieve the optimized prompts using the job ID:
Related Files
- commands.sh - Interactive deployment script
- banking77_task_app.py - Banking77 task app implementation
- banking77_gepa.toml - Base GEPA configuration file
- README.md - Additional documentation
Files Created During Execution
- /tmp/gepa_walkthrough/cli_env.txt - Environment file with API key and tunnel URL
- /tmp/gepa_walkthrough/banking77_gepa_prod.toml - GEPA config with tunnel URL (generated from base config)
- /tmp/gepa_walkthrough/results/ - Results directory with logs and outputs
Troubleshooting
- Port 8102 in use: The script automatically kills existing processes, but if issues persist, manually kill them:
  lsof -ti :8102 | xargs kill -9
- Tunnel fails: Check that cloudflared is installed and network connectivity is working. The script waits 25 seconds for tunnel establishment.
- API key errors: Ensure SYNTH_API_KEY is set in your .env file.
- Job fails with trace registration error: This is a known backend issue. The script completes successfully, but the job may fail during execution. Check backend logs for details.
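To check whether something is still listening on port 8102 before or after killing processes, a small probe like the one below may help. It is a sketch using only the Python standard library and is not part of the walkthrough script; port_in_use is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout_s=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8102 in use:", port_in_use(8102))
```

A True result after running the kill command suggests another process has grabbed the port or the original one has respawned.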
Next Steps
- Review the optimized prompts in the results file
- Compare different candidates’ performance
- Adjust the rollout budget or number of generations in the config for different optimization runs
- Try the in-process walkthrough for a fully automated approach