In-Process Task App Walkthrough

This walkthrough shows how to run GEPA optimization for Banking77 entirely in-process, with no separate terminals or manual process management. Everything runs from a single Python script.

Prerequisites

GROQ_API_KEY in .env (for policy model inference)
SYNTH_API_KEY in .env (for backend authentication)
ENVIRONMENT_API_KEY in .env (optional - will be auto-generated if not set)
uv installed (for running Python commands)
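
Before launching the script, it can help to confirm that the required keys are actually visible. The following is a minimal sketch, not part of run.py: the helper names parse_dotenv and missing_required are hypothetical, and the parser only handles plain KEY=VALUE lines rather than the full .env syntax.

```python
import os
from typing import Mapping

# Keys named in the prerequisites above; ENVIRONMENT_API_KEY is optional.
REQUIRED_KEYS = ("GROQ_API_KEY", "SYNTH_API_KEY")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            # Real environment variables take precedence over .env entries.
            env = {**parse_dotenv(fh.read()), **env}
    print("Missing required keys:", missing_required(env) or "none")
```

An empty result means both required keys are set; ENVIRONMENT_API_KEY is left out of the check because the script generates it when absent.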

Quick Start

Run the script from anywhere:

uv run python /path/to/synth-ai/walkthroughs/gepa/in_process/run.py

View the script: run.py

That’s it! The script handles everything automatically.

What Happens Automatically

The script performs these steps without any manual intervention:

Auto-generates ENVIRONMENT_API_KEY if not set (and registers it with the backend)
Starts the Banking77 task app in-process (no separate terminal needed)
Creates a Cloudflare tunnel if the backend is remote (or uses localhost if the backend is local)
Submits a GEPA optimization job to the backend
Polls for completion and displays results
Cleans up everything automatically when done

Step-by-Step Execution

Initialization

What you’ll see:

================================================================================
In-Process GEPA Optimization: Banking77
================================================================================

ℹ️  ENVIRONMENT_API_KEY not set, generating and registering with backend...
✅ ENVIRONMENT_API_KEY generated and registered: sk_env_30c78a78...

ℹ️  Configuration: tunnel/tunnel
   Backend: https://agent-learning.onrender.com
   Task App: will create its own tunnel

Configuration:
  Config: banking77_gepa.toml
  Backend: https://agent-learning.onrender.com
  Task App: Starting in-process...

What happens: The script checks for required environment variables, generates an API key if needed, and determines whether to use tunnels or localhost based on the backend URL.
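
The tunnel-or-localhost decision can be illustrated with a small check on the backend URL. This is a sketch under the assumption that "local" means a loopback hostname; the function name needs_tunnel and the LOCAL_HOSTS set are hypothetical, and the real script may use a different rule.

```python
from urllib.parse import urlparse

# Hostnames treated as "local" in this sketch (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def needs_tunnel(backend_url: str) -> bool:
    """Return True when the backend is remote and so cannot reach localhost."""
    host = urlparse(backend_url).hostname or ""
    return host.lower() not in LOCAL_HOSTS


if __name__ == "__main__":
    for url in ("https://agent-learning.onrender.com", "http://localhost:8000"):
        mode = "tunnel" if needs_tunnel(url) else "localhost"
        print(f"{url} -> {mode}")
```

A remote backend has no route to a task app bound to your machine's loopback interface, which is why a public tunnel URL is needed in that case.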

Task App Startup

What you’ll see:

✅ Task app running at: https://criteria-chains-incomplete-others.trycloudflare.com
✅ Cloudflare tunnel active

================================================================================
Running GEPA Optimization
================================================================================

📊 Running 5 generations with 4 children per generation
   (Total: 20 prompt candidates)

📊 Rollout budget: 200

Task app URL: https://criteria-chains-incomplete-others.trycloudflare.com
Backend URL: https://agent-learning.onrender.com

Writing config to temp file...
✅ Config written to: /tmp/tmpXXXXXX.toml

What happens: The task app starts in a background thread, a Cloudflare tunnel is created automatically, and the config is prepared with the tunnel URL.
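
To illustrate what "starts in a background thread" means, here is a self-contained sketch using only the standard library. It is a stand-in, not the task app itself: HealthHandler, start_in_background, stop, and check_once are hypothetical names, and the real task app serves a /rollout endpoint through its own web server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Tiny stand-in app that answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_in_background(port: int = 0):
    """Serve on a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop(server, thread):
    server.shutdown()      # ask serve_forever() to return
    server.server_close()  # release the listening socket
    thread.join(timeout=5)


def check_once() -> str:
    """Start the server, call it from the main thread, then shut it down."""
    server, thread = start_in_background()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        # Bypass any configured HTTP proxy for this loopback request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return resp.read().decode()
    finally:
        stop(server, thread)


if __name__ == "__main__":
    print(check_once())
```

Because the server runs on its own thread, the main thread stays free to submit the job and poll for results in the same process.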

Job Submission

What you’ll see:

Creating job from config...

✅ Job created successfully

Submitting job...

✅ Job submitted: pl_845a0e4a3628485b

What happens: The GEPA job is created from the config and submitted to the backend. You get a job ID for tracking.

Training Progress

What you’ll see - Task app processing rollouts:

[TASK_APP] INBOUND_ROLLOUT: run_id=prompt-learning-74-5bec8a6f seed=74 env=banking77
[TASK_APP] PROXY ROUTING with API key: sk_env_30c78...f263 (len=39)
[TASK_APP] OUTBOUND: model=llama-3.1-8b-instant temp=0.0 max=512 tools=1
[TASK_APP] RESPONSE_STATUS: 200
[TASK_APP] PREDICTION: expected=card_arrival predicted=card_delivery_estimate correct=False
[BANKING77_ROLLOUT] run_id=prompt-learning-74-5bec8a6f reward=0.0
INFO:     74.220.49.253:0 - "POST /rollout HTTP/1.1" 200 OK
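
The PREDICTION and reward lines in the log above are consistent with a simple exact-match scoring rule. The sketch below assumes that rule (1.0 for a matching intent label, 0.0 otherwise); the function names are hypothetical and the task app's actual scoring code may differ.

```python
def score_prediction(expected: str, predicted: str) -> float:
    """Return 1.0 for an exact label match, else 0.0 (assumed rule)."""
    return 1.0 if predicted == expected else 0.0


def accuracy(rewards: list[float]) -> float:
    """Mean reward over a batch of rollouts; 0.0 for an empty batch."""
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    # Labels taken from the sample log line above.
    print(score_prediction("card_arrival", "card_delivery_estimate"))
```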

What happens: The backend sends rollout requests to your task app. Each request contains a customer query, and the task app returns a prediction. You’ll see both correct and incorrect predictions as the optimization progresses.

Progress streaming:

================================================================================
Streaming Results
================================================================================

[18:35:37]    0.0s  Status: running
[18:35:42]    5.2s  Status: running | Best: 0.500
[18:35:48]   11.4s  Status: running | Best: 0.625
[18:35:54]   17.6s  Status: running | Best: 0.750
[18:36:00]   23.8s  Status: running | Best: 0.875
[18:38:37]  162.5s  Status: running | Best: 0.875
[18:38:43]  168.7s  Status: running | Best: 0.875
[18:38:50]  175.9s  Status: succeeded | Best: 0.875

What happens: The script polls the backend roughly every 5 seconds and prints one line per poll showing:

Timestamp
Elapsed time
Current job status
Best score achieved so far
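
A polling loop of this shape can be sketched as follows. Everything here is illustrative: fetch_status is a hypothetical callable that returns a dict with a "status" key and an optional "best_score" key, and the terminal status names other than "succeeded" are assumptions rather than the backend's documented values.

```python
import time
from typing import Callable, Optional

# "succeeded" appears in the sample output; the other names are assumptions.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def format_line(elapsed: float, status: str, best: Optional[float]) -> str:
    """Format one progress line: timestamp, elapsed time, status, best score."""
    stamp = time.strftime("%H:%M:%S")
    line = f"[{stamp}]{elapsed:7.1f}s  Status: {status}"
    if best is not None:
        line += f" | Best: {best:.3f}"
    return line


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reaches a terminal status."""
    start = time.monotonic()
    while True:
        job = fetch_status()
        elapsed = time.monotonic() - start
        print(format_line(elapsed, job["status"], job.get("best_score")))
        if job["status"] in TERMINAL_STATUSES:
            return job
        if elapsed + interval > timeout:
            raise TimeoutError(f"job still {job['status']} after {elapsed:.1f}s")
        sleep(interval)


if __name__ == "__main__":
    fake = iter([
        {"status": "running"},
        {"status": "running", "best_score": 0.5},
        {"status": "succeeded", "best_score": 0.875},
    ])
    poll_until_done(lambda: next(fake), interval=0.0)
```

Passing the sleep function as a parameter keeps the loop easy to test without waiting for real time to pass.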

Results Display

What you’ll see:

✅ GEPA optimization complete in 175.9s

================================================================================
Results
================================================================================

Best score: 87.50%

Total candidates: 21
  Accuracy range: 0.00% - 87.50% (avg: 20.83%)

What happens: After completion, the script fetches the final job status and displays:

Best score achieved
Total number of candidates evaluated
Accuracy statistics (min, max, average)
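
The summary statistics are plain min, max, and mean over the per-candidate accuracies. A minimal sketch, assuming the accuracies are available as a list of floats between 0 and 1 (the summarize helper is hypothetical, and the job result format is not shown in this walkthrough):

```python
from statistics import mean


def summarize(accuracies: list[float]) -> str:
    """Build a results summary in the same layout as the sample output."""
    if not accuracies:
        return "Total candidates: 0"
    best, worst, avg = max(accuracies), min(accuracies), mean(accuracies)
    return (
        f"Best score: {best:.2%}\n"
        f"\n"
        f"Total candidates: {len(accuracies)}\n"
        f"  Accuracy range: {worst:.2%} - {best:.2%} (avg: {avg:.2%})"
    )


if __name__ == "__main__":
    # Made-up accuracies for illustration only.
    print(summarize([0.0, 0.5, 0.875]))
```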

Cleanup

What you’ll see:

INFO:synth_ai.task.in_process:Stopping in-process task app...
INFO:synth_ai.task.in_process:Tunnel stopped

================================================================================
✅ In-process GEPA optimization complete!
================================================================================

What happens: The script automatically:

Stops the task app
Closes the Cloudflare tunnel
Cleans up temporary files
Exits cleanly
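
One common way to guarantee this kind of cleanup in Python is contextlib.ExitStack, which runs registered callbacks even if the job raises. The sketch below uses placeholder events instead of a real task app or tunnel; run_with_cleanup is a hypothetical name, and ExitStack unwinds in reverse registration order, which may differ from the order the actual script uses.

```python
import os
import tempfile
from contextlib import ExitStack


def run_with_cleanup(events: list[str]) -> str:
    """Acquire three placeholder resources and release them on exit."""
    with ExitStack() as stack:
        # Temporary config file, removed last.
        fd, path = tempfile.mkstemp(suffix=".toml")
        os.close(fd)
        stack.callback(os.remove, path)

        # Placeholder for starting and stopping the task app.
        events.append("task app started")
        stack.callback(events.append, "task app stopped")

        # Placeholder for opening and closing the tunnel.
        events.append("tunnel opened")
        stack.callback(events.append, "tunnel closed")

        events.append("job finished")
    # Callbacks have run here, in reverse order of registration.
    return path


if __name__ == "__main__":
    log: list[str] = []
    run_with_cleanup(log)
    print(log)
```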

Configuration

The script uses banking77_gepa.toml from walkthroughs/gepa/. You can modify:

Rollout budget: prompt_learning.gepa.rollout.budget (default: 200)
Number of generations: prompt_learning.gepa.population.num_generations (default: 5)
Children per generation: prompt_learning.gepa.population.children_per_generation (default: 4)

View the config: banking77_gepa.toml

Example Output Summary

Here’s what a successful run looks like:

✅ Job submitted: pl_845a0e4a3628485b
✅ GEPA optimization complete in 175.9s
Best score: 87.50%
Total candidates: 21
  Accuracy range: 0.00% - 87.50% (avg: 20.83%)

Key metrics:

Best score: 87.50% accuracy (significant improvement over baseline)
Total candidates: 21 prompt variations evaluated
Time: ~3 minutes for complete optimization
Cost: Typically $0.10-$0.20, depending on rollout budget

Troubleshooting

“GROQ_API_KEY required”: Make sure a .env file exists at the repo root with GROQ_API_KEY set
“SYNTH_API_KEY required”: Make sure a .env file exists at the repo root with SYNTH_API_KEY set
Task app not found: Ensure walkthroughs/gepa/task_app/banking77_task_app.py exists
Config file not found: Ensure walkthroughs/gepa/banking77_gepa.toml exists
ENVIRONMENT_API_KEY registration failed: The script will continue with the generated key, but the backend may not be able to authenticate task app requests. Check that SYNTH_API_KEY is valid.
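
The file-related checks above can be run in one pass before starting a job. A minimal sketch, assuming the paths are relative to the repo root as listed; missing_files is a hypothetical helper, not part of the walkthrough scripts.

```python
from pathlib import Path

# Paths named in the troubleshooting list, relative to the repo root.
REQUIRED_FILES = (
    "walkthroughs/gepa/task_app/banking77_task_app.py",
    "walkthroughs/gepa/banking77_gepa.toml",
    ".env",
)


def missing_files(repo_root) -> list[str]:
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the current directory is the repo root.
    print("Missing files:", missing_files(".") or "none")
```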

Advantages Over Deployed Approach

Single command: Everything runs from one script
No manual process management: Task app and tunnel are managed automatically
Automatic cleanup: Everything stops cleanly when done
Better for automation: Perfect for CI/CD or batch processing
Easier debugging: All logs in one place

Related Files

run.py - In-process GEPA optimization script
banking77_task_app.py - Banking77 task app implementation
banking77_gepa.toml - GEPA configuration file
README.md - Additional documentation

Next Steps

Review the optimized prompts in the job results
Adjust configuration parameters for different optimization runs
Try the deployed walkthrough for more manual control
Integrate into your own scripts using the InProcessTaskApp class
