About MIPRO

MIPRO optimizes prompts through:
  • Instruction Generation: Using a meta-model to propose new prompt instructions based on successes/failures
  • Demo Selection: Selecting optimal few-shot examples using Tree-structured Parzen Estimator (TPE)
  • Iterative Improvement: Combining the best instructions and demos across multiple iterations
Read more at Opsahl-Ong et al. (2024).
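At a high level, one optimization pass ties these pieces together roughly as follows. This is an illustrative, self-contained Python toy, not the backend implementation; every helper below is a stand-in:

import random

# Toy sketch of the MIPRO loop: propose instructions, pick demo sets,
# score the combinations, and keep the best. Everything here is a stand-in.

def propose_instructions(history, n=3):
    # Stand-in for the meta-model call that drafts new instruction variants.
    return [f"Instruction variant {len(history) + i}" for i in range(n)]

def select_demo_sets(demo_pool, sizes=(0, 2, 4)):
    # Stand-in for TPE-based demo selection.
    return [random.sample(demo_pool, k) for k in sizes if k <= len(demo_pool)]

def evaluate(instruction, demos):
    # Stand-in for running rollouts with the policy model and averaging reward.
    return random.random()

history, demo_pool = [], ["example A", "example B", "example C", "example D"]
best_prompt, best_score = None, float("-inf")
for iteration in range(3):
    for instruction in propose_instructions(history):
        for demos in select_demo_sets(demo_pool):
            score = evaluate(instruction, demos)
            history.append((instruction, demos, score))
            if score > best_score:
                best_prompt, best_score = (instruction, demos), score
print("Best prompt:", best_prompt, "score:", best_score)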

Start training

1. Install the Demo

uvx synth-ai demo mipro
This creates a demo directory with:
  • main.py - In-process MIPRO runner
  • task_app.py - Banking77 intent classification task
  • train_cfg.toml - MIPRO training configuration
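The generated directory (named demo_mipro, matching the run command below) should look roughly like this; exact contents may vary by version:

demo_mipro/
├── main.py          # in-process MIPRO runner
├── task_app.py      # Banking77 intent classification task
└── train_cfg.toml   # MIPRO training configuration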

2. Set Up Environment

uvx synth-ai setup
synth-ai setup does the following:
  • Fetches your SYNTH_API_KEY and ENVIRONMENT_API_KEY from https://usesynth.ai via your web browser
  • Saves your SYNTH_API_KEY and ENVIRONMENT_API_KEY to .env in the current working directory and to ~/.synth-ai/config.json
  • Loads your SYNTH_API_KEY and ENVIRONMENT_API_KEY into the process environment
This step is optional if you prefer to load SYNTH_API_KEY and ENVIRONMENT_API_KEY manually. You will also need a GROQ_API_KEY; either save it to your .env or load it into the process environment alongside the other two keys.
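A complete .env for this demo might look like the following; the values are placeholders, not real keys:

SYNTH_API_KEY=<your-synth-api-key>
ENVIRONMENT_API_KEY=<your-environment-api-key>
GROQ_API_KEY=<your-groq-api-key>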

3. Run MIPRO Optimization

# If .env in CWD or if keys loaded to process environment
uv run demo_mipro/main.py

# Or specify a .env file
uv run demo_mipro/main.py --env [/path/to/.env]
The script will:
  1. Start the Banking77 task app in-process on port 8114
  2. Submit a MIPRO job to the backend
  3. Stream progress in real-time
  4. Save results when complete

Configuration

Basic Configuration

The train_cfg.toml file configures your MIPRO run:
[prompt_learning]
algorithm = "mipro"
task_app_url = "http://127.0.0.1:8102"  # Auto-configured by main.py
task_app_id = "banking77"

# Your initial prompt template
[prompt_learning.initial_prompt]
id = "banking77_pattern"
name = "Banking77 Classification Pattern"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are an expert banking assistant..."
order = 0

# Policy model configuration
[prompt_learning.policy]
inference_mode = "synth_hosted"
model = "openai/gpt-oss-20b"
provider = "groq"
temperature = 0.0
max_completion_tokens = 128

MIPRO Parameters

[prompt_learning.mipro]
env_name = "banking77"
num_iterations = 5              # Number of optimization iterations
num_evaluations_per_iteration = 2  # Proposals to evaluate per iteration
batch_size = 6                  # Parallel evaluation batch size
max_concurrent = 16             # Max concurrent rollouts

# Meta-model for instruction generation
meta_model = "llama-3.3-70b-versatile"
meta_model_provider = "groq"

# Seed pools for different phases
bootstrap_train_seeds = [0, 1, 2, ..., 14]  # Initial training seeds
online_pool = [15, 16, ..., 39]              # Online evaluation seeds
test_pool = [40, 41, ..., 49]                # Held-out test seeds
val_seeds = [50, 51, ..., 59]                # Validation seeds

Advanced Configuration

TPE Hyperparameters

The Tree-structured Parzen Estimator (TPE) controls demo selection:
[prompt_learning.mipro.tpe]
gamma = 0.25              # Quantile for good/bad split
n_candidates = 32         # Candidates to evaluate
n_startup_trials = 10     # Random trials before TPE
epsilon = 0.1             # Exploration probability
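Conceptually, gamma splits past trials into a small "good" quantile and a larger "bad" remainder, and epsilon occasionally forces a purely random pick. The Python snippet below is a very rough, self-contained illustration of that split; real TPE fits density models over the two groups rather than simply favoring past winners:

import random

# Rough illustration of the gamma/epsilon behavior, not the backend's TPE.
def pick_candidate(trials, candidates, gamma=0.25, epsilon=0.1):
    # trials: list of (candidate, score) pairs from earlier evaluations.
    if random.random() < epsilon or not trials:
        return random.choice(candidates)                  # exploration
    ranked = sorted(trials, key=lambda t: t[1], reverse=True)
    cutoff = max(1, int(gamma * len(ranked)))             # top-gamma quantile
    good = {candidate for candidate, _ in ranked[:cutoff]}
    preferred = [c for c in candidates if c in good]
    return random.choice(preferred or candidates)

trials = [("demo_set_A", 0.8), ("demo_set_B", 0.4), ("demo_set_C", 0.6)]
print(pick_candidate(trials, ["demo_set_A", "demo_set_B", "demo_set_C"]))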

Demo Selection

[prompt_learning.mipro.demo]
max_few_shot_examples = 5  # Max examples in prompt
sets_per_size = 6          # Demo sets per size
include_empty = true       # Include zero-shot
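One plausible reading of how these parameters interact, shown as a hypothetical Python sketch (the backend's actual enumeration may differ): for each set size up to max_few_shot_examples, draw sets_per_size candidate sets, optionally including the empty zero-shot set.

import random

# Hypothetical sketch of demo-set enumeration driven by the settings above.
def enumerate_demo_sets(pool, max_few_shot_examples=5, sets_per_size=6, include_empty=True):
    candidates = [[]] if include_empty else []
    for size in range(1, max_few_shot_examples + 1):
        for _ in range(sets_per_size):
            candidates.append(random.sample(pool, min(size, len(pool))))
    return candidates

print(len(enumerate_demo_sets(pool=list(range(20)))))  # 1 empty set + 5 sizes * 6 sets = 31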

Instruction Proposals

[prompt_learning.mipro.grounding]
n = 10                     # Number of proposals per iteration
temperature = 0.7          # Generation temperature
max_tokens = 600           # Max tokens per proposal

Meta-Updates

Meta-updates periodically regenerate instructions based on the latest results:
[prompt_learning.mipro.meta_update]
enabled = true
every_iterations = 3       # Regenerate every N iterations
topk_success = 5           # Top successes to analyze
topk_failure = 5           # Top failures to analyze
keep_k = 12                # Max instruction variants

Understanding Results

After completion, MIPRO saves results to your configured results_folder:

Results File

mipro_results_<job_id>_<timestamp>.txt
Contains:
  • Best Score: Final optimized accuracy
  • Baseline Score: Initial prompt performance
  • Improvement: Relative and absolute gains
  • Best Prompt: The optimized prompt with instructions and demos
  • Top-K Candidates: Best performing prompt combinations
  • Proposed Instructions: All generated instructions

Verbose Log

mipro_log_<job_id>_<timestamp>.log
Contains:
  • Detailed event stream
  • All instruction proposals
  • TPE selection decisions
  • Per-seed evaluation results

Example: Banking77 Intent Classification

The demo uses the Banking77 dataset with 77 banking intent categories:
# Task app defines the classification task (simplified from task_app.py)
import json

class Banking77TaskApp:
    def rollout(self, seed: int, run_id: str):
        # Get the test example for this seed
        example = self.dataset[seed]

        # Call the policy model with the current prompt, forcing the tool call
        response = self.policy_client.chat.completions.create(
            model=policy.model,
            messages=prompt_messages,
            tools=[banking77_tool],
            tool_choice={
                "type": "function",
                "function": {"name": "banking77_classify"},
            },
        )

        # Parse the predicted label out of the tool call arguments
        # (the argument name "intent" is illustrative)
        tool_call = response.choices[0].message.tool_calls[0]
        predicted_intent = json.loads(tool_call.function.arguments)["intent"]

        # Score the rollout: reward 1.0 if the classification is correct
        correct = predicted_intent == example["intent"]
        return {"reward": 1.0 if correct else 0.0}
MIPRO will:
  1. Evaluate the baseline prompt on bootstrap seeds
  2. Propose new instructions using the meta-model
  3. Select optimal demo combinations using TPE
  4. Evaluate candidate prompts on online pool
  5. Return the best performing prompt combination
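For reference, the banking77_tool used in the rollout above is an OpenAI-style function tool. A minimal sketch of what its schema might look like follows; the actual definition in task_app.py may name the argument differently or enumerate all 77 labels:

# Hypothetical sketch of the classification tool schema.
banking77_tool = {
    "type": "function",
    "function": {
        "name": "banking77_classify",
        "description": "Classify a customer message into one Banking77 intent.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {
                    "type": "string",
                    "description": "The predicted intent label.",
                },
            },
            "required": ["intent"],
        },
    },
}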

Key Concepts

Seed Pools

MIPRO uses different seed pools for different phases:
  • Bootstrap Train Seeds: Initial evaluation to establish baseline
  • Online Pool: Used during optimization iterations
  • Test Pool: Held-out seeds for final evaluation
  • Val Seeds: Validation set for top-K selection
  • Reference Pool: Examples shown to meta-model for context

Instruction vs Demo Optimization

MIPRO optimizes two aspects:
  1. Instructions: The system/user message templates
    • Generated by meta-model analyzing successes/failures
    • Grounded in actual task performance
  2. Demos: The few-shot examples shown in the prompt
    • Selected using TPE based on historical performance
    • Optimized for each instruction variant

Meta-Model vs Policy Model

  • Policy Model: The model being optimized (e.g., openai/gpt-oss-20b)
    • Runs your actual task
    • Needs to be fast and cost-effective
  • Meta-Model: The instruction generator (e.g., llama-3.3-70b-versatile)
    • Analyzes performance and proposes improvements
    • Should be more capable than policy model

Troubleshooting

"OPENAI_API_KEY required"

If your policy or meta-model uses OpenAI as the provider, you need:
OPENAI_API_KEY=sk-proj-...
Add it to your .env file or load it into the process environment.

"Task app health check failed"

The in-process task app failed to start. Check:
  1. Port 8114 is free (lsof -ti:8114 should print nothing)
  2. Your .env file is loaded correctly
  3. ENVIRONMENT_API_KEY is set

"Tool choice is required, but model did not call a tool"

Some models have poor tool-calling reliability. Consider:
  1. Using a different policy model
  2. Adjusting temperature (try 0.0 for deterministic output)
  3. Increasing max_completion_tokens (see the config snippet below)
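For the second and third options, adjust the policy block in train_cfg.toml; the keys below are the same ones shown in the Basic Configuration section, with an illustrative token limit:

[prompt_learning.policy]
inference_mode = "synth_hosted"
model = "openai/gpt-oss-20b"
provider = "groq"
temperature = 0.0              # deterministic decoding
max_completion_tokens = 256    # more room for the tool call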

Next Steps

  • Customize the Task: Modify task_app.py for your use case
  • Tune Hyperparameters: Adjust train_cfg.toml for better results
  • Try Different Models: Experiment with policy and meta-models
  • Scale Up: Increase seed pools and iterations for production