Polyglot Task App: Banking77
Overview
- End-to-end Banking77 intent classification Task App (Python + polyglot ports) wired for Synth prompt optimization.
- Clear Task App contract, dataset/reward, and the exact label-explicit prompt we optimized.
- Repro steps: run locally, tunnel, launch GEPA, and inspect the real results/logs.
What you’ll learn
- How Synth talks to a Task App (`/health`, `/task_info`, `/rollout`) and what `env.seed` / `policy.config` mean.
- How to constrain outputs to the Banking77 label set (langprobe-style prompt) and compute rewards deterministically.
- How to run locally, expose via Cloudflare Tunnel, launch GEPA, and read the returned best prompt.
Who this is for
Engineers and solution owners who need a production-style reference for prompt optimization against a concrete task, with reproducible commands and outputs.
How to use this guide
- Skim Sections 1–4 (concept, contract, dataset, prompt).
- Pick Python (reference) or another language (Section 5) and run locally (Section 6).
- Expose via tunnel and launch GEPA (Sections 7–8).
- Compare your run to the recorded results (Section 9) and adapt (Section 11).
1. Concept: how Synth uses your Task App
- Seeds: `env.seed` selects the dataset row. Optimizers sweep seeds to score a prompt.
- Policy config: `policy.config` carries the prompt and `inference_url` (the model endpoint).
- Your Task App: for each `/rollout`, load the sample → call the LLM → compute the reward → return `metrics.mean_return` in `[0, 1]`.
- Optimizer (MIPRO/GEPA): iterates on prompts based on the rewards you emit.
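The per-rollout loop above can be sketched as follows. This is a hedged sketch, not the actual `app.py` implementation: `run_rollout`, `call_llm`, and the simplified request shape are illustrative names, though the `env.seed` / `policy.config` keys and the `metrics.mean_return` return field mirror the contract described in this guide.

```python
def run_rollout(request: dict, dataset: list[dict], call_llm) -> dict:
    """One rollout: seed -> sample -> LLM call -> reward -> metrics."""
    seed = request["env"]["seed"]          # selects the dataset row
    cfg = request["policy"]["config"]      # carries the prompt (and inference_url)
    sample = dataset[seed % len(dataset)]

    prediction = call_llm(cfg["prompt"], sample["query"])

    # Case-insensitive exact match against the target label.
    reward = 1.0 if prediction.strip().lower() == sample["label"].lower() else 0.0
    return {"metrics": {"mean_return": reward}}
```

The optimizer only ever sees the reward you emit here, so keeping this function deterministic is what makes seed sweeps comparable across prompts.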
2. HTTP contract (inbound)
Auth: `X-API-Key` must equal `ENVIRONMENT_API_KEY` for `/task_info` and `/rollout` (no auth on `/health`).
Endpoints:
| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | /health | none | Liveness |
| GET | /task_info | X-API-Key | Describe task/dataset |
| POST | /rollout | X-API-Key | Run one rollout and return reward |
(The `prompt_template` in the `/rollout` request mirrors the langprobe baseline and the GEPA run.)
/rollout response (what you return):
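The only field the contract in this guide strictly requires is `metrics.mean_return` in `[0, 1]`; a minimal response body therefore looks like the sketch below (any additional fields your Task App returns are beyond this contract and not shown):

```json
{
  "metrics": {
    "mean_return": 1.0
  }
}
```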
3. Dataset & reward
- Data: `code/prompt-learning/polyglot/data/banking77.json` (a labels array plus a 99-sample slice).
- Reward: case-insensitive exact match; the rollout scores 1.0 if the model outputs a known label that equals the target, else 0.0. Pseudocode:
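A Python sketch of that rule (the actual helper lives in the Task App; this only mirrors the behavior described above):

```python
def compute_reward(prediction: str, target: str, labels: list[str]) -> float:
    """Case-insensitive exact match against the known Banking77 label set."""
    pred = prediction.strip().lower()
    known = {label.lower() for label in labels}
    # Score 1.0 only when the model emitted a known label equal to the target.
    if pred in known and pred == target.strip().lower():
        return 1.0
    return 0.0
```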
4. Prompt we optimized (langprobe-style)
- System: “You are an expert banking assistant… respond with exactly one intent label from the provided list…”
- User: Includes the customer query and the full Banking77 label list, then instructs: “Classify this query into one of the above banking intents… return only the label.”
- Deterministic: `temperature: 0`.
- Model: `gpt-4.1-nano` (no Groq in this pipeline).
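The prompt shape above can be sketched as a message builder. The exact wording lives in the Task App and the results file; `build_messages` and the string layout here are illustrative only.

```python
def build_messages(query: str, labels: list[str]) -> list[dict]:
    """Build the label-explicit system/user pair described above (sketch)."""
    system = (
        "You are an expert banking assistant. Respond with exactly one "
        "intent label from the provided list."
    )
    user = (
        f"Customer query: {query}\n\n"
        "Banking77 intents:\n"
        + "\n".join(f"- {label}" for label in labels)
        + "\n\nClassify this query into one of the above banking intents "
        "and return only the label."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Enumerating the full label list in the user message is what makes the exact-match reward well-posed: the model is constrained to copy one of the listed strings.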
5. Repo layout (polyglot)
6. Run locally (Python reference)
7. Tunnel + env key (automated)
Script: `cookbooks/dev/tunnel_gepa_banking77/run_gepa_with_tunnel.sh`
- Generates `ENVIRONMENT_API_KEY` if absent and writes `.env.tunnel` (used by the backend).
- Deploys the local Task App via Cloudflare Tunnel (quick mode).
- Saves the tunnel URL in `.env.tunnel` as `TASK_APP_URL`.
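The key-generation step can be sketched like this. This is not the real `run_gepa_with_tunnel.sh`; the `openssl` invocation and file layout are assumptions that only illustrate the behavior listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail
ENV_FILE=".env.tunnel"

# Generate ENVIRONMENT_API_KEY only if the caller did not already supply one.
if [ -z "${ENVIRONMENT_API_KEY:-}" ]; then
  ENVIRONMENT_API_KEY="$(openssl rand -hex 16)"
fi
printf 'ENVIRONMENT_API_KEY=%s\n' "$ENVIRONMENT_API_KEY" > "$ENV_FILE"

# The real script then starts cloudflared in quick mode and appends the
# resulting public URL to the same file as TASK_APP_URL=...
```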
8. Launch GEPA (prod backend)
- Backend: `https://agent-learning.onrender.com`
- Tunnel: `https://united-appointments-scholar-incl.trycloudflare.com`
- Job ID: `pl_6d3e035b37f04b67`
- Budget: 200 rollouts, 40 transformations, ~89 s, ~$0.05.
- Validation: baseline 0.6333 → best 0.7000 (Δ +0.0667).
- Results/logs: `cookbooks/dev/tunnel_gepa_banking77/results/` (e.g., `gepa_results_pl_6d3e035b37f04b67_20251124_164457.txt`).
9. Recorded outcomes (what we actually saw)
- Local seeds 0–9 with the label-constrained prompt: rewards `[(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 0.0), (5, 1.0), (6, 0.0), (7, 1.0), (8, 1.0), (9, 1.0)]` → `mean_return` 0.80.
- GEPA run: baseline val 0.6333 → best val 0.7000. The best prompt text is in the results file above (label-only, tool-style instructions).
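The 0.80 figure follows directly from those per-seed rewards (8 hits out of 10 seeds):

```python
# Per-seed (seed, reward) pairs from the local run above.
rewards = [(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 0.0),
           (5, 1.0), (6, 0.0), (7, 1.0), (8, 1.0), (9, 1.0)]
mean_return = sum(r for _, r in rewards) / len(rewards)  # 8 / 10 = 0.8
```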
10. Job payload (copy/paste)
Use your tunnel URL and the label-explicit prompt. (Replace `<full Banking77 label list...>` with the labels array from `data/banking77.json`.)
11. Adapt this to your task
- Swap `data/banking77.json` for your dataset; adjust the loader per language.
- Change `compute_reward` to your metric (similarity, numeric score, etc.).
- Keep the contract stable (`/health`, `/task_info`, `/rollout`, `metrics.mean_return ∈ [0, 1]`, `X-API-Key`).
- Tune `inference_url` for model/temperature/max_tokens as needed.
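For instance, the exact-match rule could be swapped for a graded similarity reward. A sketch using the Python stdlib's `difflib`; this replacement `compute_reward` is hypothetical, not part of the shipped Task App:

```python
from difflib import SequenceMatcher

def compute_reward(prediction: str, target: str) -> float:
    """Graded reward: string similarity in [0, 1] instead of exact match."""
    pred = prediction.strip().lower()
    tgt = target.strip().lower()
    return SequenceMatcher(None, pred, tgt).ratio()
```

Any replacement must still land in `[0, 1]`, since that is what the optimizer reads from `metrics.mean_return`.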
12. Production notes
- Logging: request ID, seed, model, latency, success/failure; never log secrets.
- Metrics: request/error rate, rollout latency (p50/p95/p99), mean reward trend.
- Security: rotate `ENVIRONMENT_API_KEY`; keep LLM keys in secret storage; rate-limit at the tunnel edge.
- Reliability: set timeouts on LLM calls; return reward 0.0 on hard failures; distinguish 4xx from 5xx.
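The "reward = 0 on hard failures" rule can be sketched as a small wrapper. The names here (`safe_reward`, `call_llm`, `score`) are illustrative; the point is that transport errors and timeouts degrade to a zero reward instead of surfacing as a 5xx from `/rollout`.

```python
def safe_reward(call_llm, prompt: str, query: str, score) -> float:
    """Return 0.0 on any hard failure rather than propagating the exception."""
    try:
        # call_llm should enforce its own request timeout.
        prediction = call_llm(prompt, query)
    except Exception:
        return 0.0
    return score(prediction)
```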
13. Files to check
- Cookbook page: `monorepo/docs/prompt-learning/polyglot.mdx` (this file).
- Python Task App: `cookbooks/code/prompt-learning/polyglot/python/app.py`.
- GEPA script/results: `cookbooks/dev/tunnel_gepa_banking77/`.