About MIPRO
MIPRO optimizes prompts through:
- Instruction Generation: Using a meta-model to propose new prompt instructions based on successes/failures
- Demo Selection: Selecting optimal few-shot examples using Tree-structured Parzen Estimator (TPE)
- Iterative Improvement: Combining the best instructions and demos across multiple iterations
Start training
1. Install the Demo
The demo includes three files:
- `main.py` - In-process MIPRO runner
- `task_app.py` - Banking77 intent classification task
- `train_cfg.toml` - MIPRO training configuration
2. Set Up Environment
`synth-ai setup` does the following:
- Fetches your SYNTH_API_KEY and ENVIRONMENT_API_KEY from https://usesynth.ai via your web browser
- Saves your SYNTH_API_KEY and ENVIRONMENT_API_KEY to .env in the current working directory and to ~/.synth-ai/config.json
- Loads your SYNTH_API_KEY and ENVIRONMENT_API_KEY into the process environment
Alternatively, you can set SYNTH_API_KEY and ENVIRONMENT_API_KEY manually.
You will also need a GROQ_API_KEY. Either save it to your .env or load it into your process environment, alongside your SYNTH_API_KEY and ENVIRONMENT_API_KEY.
3. Run MIPRO Optimization
Running `main.py` will:
- Start the Banking77 task app in-process on port 8114
- Submit a MIPRO job to the backend
- Stream progress in real-time
- Save results when complete
Configuration
Basic Configuration
The `train_cfg.toml` file configures your MIPRO run:
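A hedged sketch of what the file could look like follows; every section and field name below is an illustrative assumption rather than the documented schema, so check the `train_cfg.toml` shipped with the demo for the real keys.

```toml
# Hypothetical sketch of train_cfg.toml -- field names are illustrative assumptions.
results_folder = "results/"        # where MIPRO writes its output

[task_app]
url = "http://localhost:8114"      # the in-process Banking77 task app

[policy]
model = "openai/gpt-oss-20b"       # the policy model being optimized
```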
MIPRO Parameters
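The core MIPRO parameters control how many optimization rounds run and how many candidate prompts are tried per round. A hedged sketch, with key names that are assumptions rather than the actual schema:

```toml
# Hypothetical MIPRO parameters -- every key here is an illustrative assumption.
[mipro]
iterations = 10                  # number of optimization rounds
candidates_per_iteration = 4     # prompt candidates proposed and evaluated per round
max_demos = 4                    # few-shot demos allowed in each candidate prompt
```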
Advanced Configuration
TPE Hyperparameters
The Tree-structured Parzen Estimator controls demo selection.
Demo Selection
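A hedged sketch of how the TPE and demo-selection knobs might be expressed; the key names below are assumptions, not the documented schema:

```toml
# Hypothetical TPE / demo-selection settings -- illustrative only.
[mipro.tpe]
startup_trials = 10     # random trials before TPE modeling takes over
gamma = 0.25            # fraction of trials treated as "good" when fitting TPE

[mipro.demos]
min_demos = 0           # smallest demo set TPE may select
max_demos = 4           # largest demo set TPE may select
```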
Instruction Proposals
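Instruction proposals come from the meta-model. A hedged sketch of how this might be configured (section and key names are assumptions):

```toml
# Hypothetical instruction-proposal settings -- illustrative only.
[mipro.proposer]
model = "llama-3.3-70b-versatile"   # meta-model that writes new instructions
num_proposals = 5                   # instructions proposed per round
```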
Meta-Updates
Meta-updates periodically regenerate instructions based on the latest evaluation results.
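A hedged sketch of how the regeneration cadence might be expressed (key names are assumptions):

```toml
# Hypothetical meta-update cadence -- illustrative only.
[mipro.meta_updates]
every_n_iterations = 3   # regenerate instructions every third round
```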
Understanding Results
After completion, MIPRO saves results to your configured `results_folder`:
Results File
- Best Score: Final optimized accuracy
- Baseline Score: Initial prompt performance
- Improvement: Relative and absolute gains
- Best Prompt: The optimized prompt with instructions and demos
- Top-K Candidates: Best performing prompt combinations
- Proposed Instructions: All generated instructions
Verbose Log
- Detailed event stream
- All instruction proposals
- TPE selection decisions
- Per-seed evaluation results
Example: Banking77 Intent Classification
The demo uses the Banking77 dataset with 77 banking intent categories. The optimization run will:
- Evaluate the baseline prompt on bootstrap seeds
- Propose new instructions using the meta-model
- Select optimal demo combinations using TPE
- Evaluate candidate prompts on online pool
- Return the best performing prompt combination
Key Concepts
Seed Pools
MIPRO uses different seed pools for different phases (a configuration sketch follows this list):
- Bootstrap Train Seeds: Initial evaluation to establish a baseline
- Online Pool: Used during optimization iterations
- Test Pool: Held-out seeds for final evaluation
- Val Seeds: Validation set for top-K selection
- Reference Pool: Examples shown to meta-model for context
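These pools are typically declared in `train_cfg.toml`. A hedged sketch with illustrative field names and seed ranges (neither is taken from the documented schema):

```toml
# Hypothetical seed-pool configuration -- names and ranges are illustrative only.
[seeds]
bootstrap_train = [0, 1, 2, 3, 4]       # baseline evaluation
online_pool     = [5, 6, 7, 8, 9]       # optimization iterations
test_pool       = [10, 11, 12, 13, 14]  # held-out final evaluation
val_seeds       = [15, 16, 17]          # top-K selection
reference_pool  = [18, 19, 20]          # context examples for the meta-model
```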
Instruction vs Demo Optimization
MIPRO optimizes two aspects:
- Instructions: The system/user message templates
  - Generated by the meta-model analyzing successes/failures
  - Grounded in actual task performance
- Demos: The few-shot examples shown in the prompt
  - Selected using TPE based on historical performance
  - Optimized for each instruction variant
Meta-Model vs Policy Model
- Policy Model: The model being optimized (e.g., `openai/gpt-oss-20b`)
  - Runs your actual task
  - Needs to be fast and cost-effective
- Meta-Model: The instruction generator (e.g., `llama-3.3-70b-versatile`)
  - Analyzes performance and proposes improvements
  - Should be more capable than the policy model
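Both models are usually set in `train_cfg.toml`. A hedged sketch using the example models above (section and key names are assumptions):

```toml
# Hypothetical model configuration -- section/key names are illustrative only.
[policy]
model = "openai/gpt-oss-20b"          # runs the actual Banking77 task

[meta]
model = "llama-3.3-70b-versatile"     # analyzes results and proposes instructions
```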
Troubleshooting
"OPENAI_API_KEY required"
If your policy or meta-model uses OpenAI as the provider, you need an OPENAI_API_KEY. Save it to your .env file or load it into your process environment.
"Task app health check failed"
The in-process task app failed to start. Check that:
- Port 8114 is available: `lsof -ti:8114`
- Your `.env` file is loaded correctly
- ENVIRONMENT_API_KEY is set
"Tool choice is required, but model did not call a tool"
Some models have poor tool-calling reliability. Consider:
- Using a different policy model
- Adjusting `temperature` (try 0.0 for deterministic output; see the sketch after this list)
- Increasing `max_completion_tokens`
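Those two knobs would normally be set on the policy model in `train_cfg.toml`. A hedged sketch (key names are assumptions):

```toml
# Hypothetical policy sampling settings -- illustrative only.
[policy]
temperature = 0.0             # deterministic sampling for more reliable tool calls
max_completion_tokens = 512   # room for the model to emit the full tool call
```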
Next Steps
- Customize the Task: Modify `task_app.py` for your use case
- Tune Hyperparameters: Adjust `train_cfg.toml` for better results
- Try Different Models: Experiment with policy and meta-models
- Scale Up: Increase seed pools and iterations for production