After a prompt optimization job completes, use the Python SDK helpers (the only supported interface) to fetch optimized prompts, inspect Pareto fronts, and re-run validation seeds against your Modal task app.

Querying Results (Python SDK)

import os
from synth_ai.learning import get_prompts, get_prompt_text, get_scoring_summary

BASE_URL = os.environ.get("BACKEND_BASE_URL", "https://agent-learning.onrender.com/api").rstrip("/")
API_KEY = os.environ["SYNTH_API_KEY"]
JOB_ID = "pl_abc123"  # Replace with the job id printed by `uvx synth-ai train`

results = get_prompts(job_id=JOB_ID, base_url=BASE_URL, api_key=API_KEY)

print(f"Best Score: {results['best_score']:.3f}")
for prompt in results["top_prompts"][:5]:
    print(f"Rank {prompt['rank']} Train={prompt['train_accuracy']:.3f}")

best_prompt = get_prompt_text(job_id=JOB_ID, base_url=BASE_URL, api_key=API_KEY, rank=1)
summary = get_scoring_summary(job_id=JOB_ID, base_url=BASE_URL, api_key=API_KEY)

print(best_prompt)
print(
    f"Train={summary['best_train_accuracy']:.3f} "
    f"Validation={summary.get('best_validation_accuracy', 0.0):.3f}"
)
print(f"Candidates Tried={summary['num_candidates_tried']}")

Understanding Results

Score Types

Prompt learning jobs track two types of scores:
  • prompt_best_train_score: Best accuracy on training seeds (used during optimization)
  • prompt_best_validation_score: Best accuracy on validation seeds (held-out evaluation)
The validation score provides an unbiased estimate of generalization performance.
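A common check is to compare the two scores: a large gap between train and validation accuracy suggests the prompt overfit the training seeds. The sketch below is illustrative and assumes the dict shape returned by `get_scoring_summary` above; the `generalization_gap` helper is not part of the SDK.

```python
def generalization_gap(summary: dict) -> float:
    """Difference between best train and best validation accuracy."""
    train = summary["best_train_accuracy"]
    val = summary.get("best_validation_accuracy", 0.0)
    return train - val

# Example with a hand-written summary dict:
summary = {"best_train_accuracy": 0.87, "best_validation_accuracy": 0.82}
print(f"Gap: {generalization_gap(summary):.3f}")  # → Gap: 0.050
```

A gap near zero indicates the optimized prompt generalizes well to held-out seeds.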

Pareto Front

GEPA maintains a Pareto front of prompt variants balancing:
  • Accuracy (primary objective) – Task performance
  • Token count (efficiency objective) – Prompt length
  • Tool call rate (task-specific objective) – Function calling frequency
Query multiple ranks to explore the trade-offs:
# Get the top 5 prompts from the Pareto front
for rank in range(1, 6):
    prompt = get_prompt_text(job_id=JOB_ID, base_url=BASE_URL, api_key=API_KEY, rank=rank)
    print(f"Rank {rank}: {len(prompt)} characters")
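One way to act on the trade-off is to pick the shortest prompt within a small accuracy margin of the best. The candidate dicts below are illustrative stand-ins; in practice you would combine the `get_prompts()` entries with the text fetched per rank via `get_prompt_text()`.

```python
# Hypothetical candidates: rank, train accuracy, and prompt length in characters.
candidates = [
    {"rank": 1, "train_accuracy": 0.88, "length": 1400},
    {"rank": 2, "train_accuracy": 0.87, "length": 900},
    {"rank": 3, "train_accuracy": 0.84, "length": 600},
]

best = max(c["train_accuracy"] for c in candidates)
margin = 0.02  # accept candidates up to 2 points below the best score

# Keep near-best candidates, then prefer the shortest one.
eligible = [c for c in candidates if best - c["train_accuracy"] <= margin]
choice = min(eligible, key=lambda c: c["length"])
print(f"Chose rank {choice['rank']} ({choice['length']} chars)")
```

Widening `margin` trades accuracy for shorter (cheaper) prompts.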

Validation Evaluation

After optimization, run held-out seeds against your Modal task app to confirm the validation scores. You can either:
  • Call uvx synth-ai eval <app_id> --url "$TASK_APP_URL" --seeds ... (CLI wraps the task app for you), or
  • Send rollouts directly to "$TASK_APP_URL/rollout" using the optimized prompt text you retrieved.
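For the direct option, a minimal sketch of a rollout request is below. The payload fields (`seed`, `prompt`) are assumptions about the task app's /rollout schema, not a documented contract; check your task app for the exact shape. The request is only sent when `TASK_APP_URL` is set in the environment.

```python
import os

def build_rollout_payload(seed: int, prompt_text: str) -> dict:
    # Assumed payload shape; adjust to match your task app's /rollout schema.
    return {"seed": seed, "prompt": prompt_text}

payload = build_rollout_payload(seed=42, prompt_text="You are a helpful assistant.")

task_app_url = os.environ.get("TASK_APP_URL")
if task_app_url:
    import requests  # third-party dependency, used only when a URL is configured
    resp = requests.post(f"{task_app_url}/rollout", json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json())
```

Running the same held-out seeds used during validation lets you reproduce the reported validation accuracy.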

Expected Performance

GEPA typically improves accuracy over generations:
Generation   | Typical Accuracy | Notes
------------ | ---------------- | ---------------------------------
1 (baseline) | 60-75%           | Initial random/baseline prompts
5            | 75-80%           | Early optimization gains
10           | 80-85%           | Convergence begins
15 (final)   | 85-90%+          | Optimized prompts on Pareto front

Next Steps