2025-11-07 – Multi-Stage Optimizers & Expanded Model Support
🚀 New Features
- Multi-Stage MIPRO & GEPA: Both prompt learning algorithms now support multi-stage pipeline optimization for complex workflows with multiple processing stages.
  - MIPRO Multi-Stage: Generates per-stage instruction proposals with automatic stage detection via LCS (Longest Common Subsequence) matching. Each stage gets stage-specific meta-prompts including the pipeline overview, stage role, and baseline performance. Supports per-module configuration with `max_instruction_slots` and `max_demo_slots` for fine-grained control.
  - GEPA Multi-Stage: Uses module-aware evolution where each pipeline module gets its own gene. Mutations target specific modules, uniform crossover combines parent genes per module, and aggregated scoring sums module lengths for Pareto optimization. Supports per-module `max_instruction_slots`, `max_tokens`, and `allowed_tools` configuration.
  - Configuration: Both algorithms support `pipeline_modules` metadata in initial prompts and module-specific settings in their respective config sections (`prompt_learning.gepa.modules` and `prompt_learning.mipro.modules`); see the TOML sketch after this list.
- Gemini Model Support: Added comprehensive support for Google Gemini models as policy models for both GEPA and MIPRO.
  - Supported Models: `gemini-2.5-pro` (≤200k tokens), `gemini-2.5-pro-gt200k` (>200k tokens), `gemini-2.5-flash`, and `gemini-2.5-flash-lite`.
  - Provider Integration: Full SDK validation and backend support for `provider = "google"` with automatic pricing calculation and token tracking.
  - Example Configs: Added example configurations demonstrating Gemini usage, including `banking77_pipeline_mipro_gemini_flash_lite_local.toml` for cost-effective multi-stage optimization.
- OpenAI Model Support: Expanded OpenAI model support for prompt learning, covering the latest model families.
  - Supported Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-5`, `gpt-5-mini`, and `gpt-5-nano`.
  - Model Validation: SDK-side validation with clear error messages for unsupported models. `gpt-5-pro` is explicitly rejected due to its high cost ($15 input / $120 output per 1M tokens).
  - Provider Prefix Support: Models can be specified with or without a provider prefix (e.g., `"gpt-4o"` or `"openai/gpt-4o"`).
- SDK Validation Enhancements: Improved config validation with comprehensive error checking before configs are sent to the backend.
  - Multi-Stage Validation: Validates that `pipeline_modules` match the module configs, checks for missing or extra modules, and ensures proper module ID matching.
  - Model Validation: Provider-aware model validation with detailed error messages listing the supported models for each provider.
  - Nano Model Restrictions: Nano models (`gpt-4.1-nano`, `gpt-5-nano`) are allowed as policy models but rejected as mutation/meta models, since they are too small for generation tasks.
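For orientation, here is a minimal TOML sketch combining the options above. Key names such as `pipeline_modules`, `max_instruction_slots`, `max_demo_slots`, and `provider` come from this release, but the exact nesting and the module IDs (`classify`, `refine`) are illustrative assumptions; see the shipped example configs for the real schema.

```toml
# Hypothetical multi-stage config sketch; nesting and module IDs are
# illustrative. See banking77_pipeline_mipro_gemini_flash_lite_local.toml
# for a shipped example.
[prompt_learning]
algorithm = "mipro"                        # or "gepa"

[prompt_learning.policy]
provider = "google"                        # provider-aware validation applies
model = "gemini-2.5-flash-lite"            # "openai/gpt-4o" also accepted

# The initial prompt declares its pipeline stages; the SDK checks that
# these IDs match the per-module sections below (no missing/extra modules).
[prompt_learning.initial_prompt]
pipeline_modules = ["classify", "refine"]  # module IDs are illustrative

[prompt_learning.mipro.modules.classify]
max_instruction_slots = 2
max_demo_slots = 4

[prompt_learning.mipro.modules.refine]
max_instruction_slots = 1
max_demo_slots = 2

# GEPA analog: [prompt_learning.gepa.modules.<id>] with
# max_instruction_slots, max_tokens, and allowed_tools.
```

Any mismatch between `pipeline_modules` and the module sections fails SDK validation before the job is sent to the backend.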
🔧 Technical Improvements
- Config Parsing: Enhanced TOML parsing for multi-stage configurations with support for nested module and stage definitions.
- Integration Tests: Added comprehensive integration tests for multi-stage GEPA and MIPRO workflows, including Gemini model validation tests.
- Error Messages: Improved validation error messages with actionable suggestions and links to example configurations.
📚 Documentation
- Multi-Stage Pipeline Guide: Updated documentation with examples and configuration details for optimizing multi-stage pipelines with both GEPA and MIPRO algorithms.
- Model Support Reference: Complete documentation of supported models for each provider (OpenAI, Groq, Google) with usage examples.
- Example Configurations: Added example configs demonstrating multi-stage optimization with different model providers, including `multi_stage_gepa_example.toml` and `banking77_pipeline_mipro_gemini_flash_lite_local.toml`.
2025-11-04 – GEPA: Genetic Evolution for Prompt Optimization
🚀 New Features
- GEPA Algorithm: Genetic Evolution for Prompt Optimization (GEPA) is now available for prompt learning jobs. GEPA uses evolutionary operators (mutation, crossover, selection) to optimize prompts across multiple generations, achieving significant accuracy improvements on classification and reasoning tasks; a minimal config sketch follows this list.
- Prompt ID-Based URLs: Prompt transformations now use versioned URLs (`/v1/{prompt_version_id}/chat/completions`) for better traceability, concurrency, and debugging. Each transformation gets a unique version ID based on content hashing.
- Multi-Objective Optimization: GEPA maintains a Pareto front balancing accuracy, token count, and task-specific metrics (e.g., tool call rate).
- Validation Scoring: Job results now distinguish between `prompt_best_train_score` and `prompt_best_validation_score` for clearer evaluation metrics.
- Integration Testing: Added comprehensive integration tests for GEPA training workflows with Banking77 task app.
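A minimal sketch of enabling GEPA in a TOML job config. Only the `prompt_learning.gepa` section name appears in these notes; every individual key below is a hypothetical illustration of the mutation/crossover/selection and Pareto settings described above, not the shipped schema.

```toml
# Hypothetical GEPA sketch; all keys below are assumptions.
[prompt_learning]
algorithm = "gepa"

[prompt_learning.gepa]
generations = 10        # evolutionary rounds of mutation, crossover, selection
population_size = 16    # candidate prompts per generation
mutation_rate = 0.3     # fraction of candidates mutated each round

# The Pareto front balances accuracy against token count and
# task-specific metrics such as tool call rate.
objectives = ["accuracy", "token_count", "tool_call_rate"]
```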
📚 Documentation
- GEPA Guide: Complete documentation with quick start, configuration examples, and troubleshooting for Banking77, HotpotQA, IFBench, HoVer, and PUPA tasks.
- Integration Examples: Step-by-step guides for deploying task apps and running GEPA optimization locally and on Modal.
2025-10-28 – Terminal Training Logs
🚀 New Features
- Full terminal streaming logs: `uvx synth-ai train` now streams comprehensive real-time training logs directly in the terminal for both SFT and RL. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics logging (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training run.
2025-10-27 – Rubrics, Hosted Judges & Qwen-VL RL
🚀 New Features
- Hosted Synth judges (configurable): Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behavior.
- Rubric-aware filtering: SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export. An illustrative sketch of these knobs follows this list.
- Qwen-VL support across SFT & RL: Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
- Instruct-model RL guidance: Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.
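An illustrative sketch of the judge and rubric overrides named above; every key name here is an assumption made for illustration, not the shipped schema.

```toml
# Hypothetical judge/rubric overrides; all key names are assumptions.
[judge]
rubric = "helpfulness_v1"   # per-job rubric selection
max_concurrency = 8         # concurrency cap on hosted judge calls
fallback = "skip"           # behavior when the judge is unavailable

# Structured rubric criteria for SFT filtering: traces are scored
# and trimmed against these before export.
[[filter.rubric.criteria]]
name = "follows_instructions"
weight = 1.0
min_score = 0.7
```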
2025-10-17 – Qwen Coder, Turso, H200 Topologies & RL Throughput
🚀 New Features
- Qwen Coder models supported: Qwen Coder variants are now available across SFT and inference workflows.
- SDK migrated to Turso for concurrency: Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
- More training topologies on H200s: Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
- Full LoRA support for Policy Gradient: LoRA integrated end-to-end into Policy Gradient training flows.
- Pipelined RL async rollouts: Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
2025-10-09 – LoRA, MoE & Large Model Support
🚀 New Features
- Expanded Qwen catalog: Simple Training now ships SFT and inference presets for every Qwen release outside the existing `qwen3-{0.6B–32B}` range, giving full coverage of the remaining Qwen 1.x/2.x/2.5 checkpoints.
- Large-model inference & training topologies: Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
- Turnkey rollout: API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
- LoRA-first SFT: Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.
🚀 New Features
- Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking.
- B200 & H200 GPU Support: Added support for NVIDIA’s latest flagship GPUs (B200, H200) for both training and inference workloads.
- Faster Inference: Optimized inference pipeline with improved throughput and reduced latency across all model sizes.
- GSPO Support: Integrated the Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training.
2025-09-17 – Online RL (customer‑visible features)
- Organization‑scoped environment credentials
  - Upload your environment API key once (sealed‑box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
- First‑party Task App integration
  - Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
- Single‑node, multi‑GPU Online RL
  - Out‑of‑the‑box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100); see the topology sketch at the end of this section. Multi-node training is complete in dev; reach out if interested.
  - Supports a reference model (for KL) stacked on the inference GPUs or on its own GPU, and configurable tensor parallelism for inference.
- Production run flow
  - Start an Online RL job against your deployed Task App, monitor progress/events, and run inference with the produced checkpoint when training completes.
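A sketch of the single-node GPU split described above, in TOML. All key names here (`gpus_inference`, `gpus_training`, `tensor_parallel`, `placement`) are hypothetical illustrations of the knobs mentioned, not the actual config schema.

```toml
# Hypothetical single-node Online RL topology (e.g., 8x H100);
# all keys are illustrative assumptions.
[topology]
gpus_inference = 6          # vLLM rollout GPUs
gpus_training = 2           # trainer GPUs
tensor_parallel = 2         # tensor parallelism degree for inference

[reference_model]           # used for the KL penalty
placement = "inference"     # stack on inference GPUs, or "dedicated" for its own GPU
```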
0.2.2.dev2 — Aug 8, 2025
- Fine-tuning (SFT) endpoints available and documented end-to-end
- Interactive demo launcher (`uvx synth-ai demo`) with a finetuning flow for Qwen 4B
- Live polling output during training with real-time status updates
- CLI Reference for `uvx synth-ai serve`, `uvx synth-ai traces`, and the demo launcher
0.2.2.dev1 — Aug 7, 2025
- New backend balance APIs and CLI for account visibility
- CLI utilities: `balance`, `traces`, and `man` commands
- Traces inventory view with per-DB counts and storage footprint
- Standardized one-off usage: `uvx synth-ai <command>` (removed interactive watch)
- Improved `.env` loading and API key resolution
0.2.2.dev0 — Jul 30, 2025
- Environment Registration API for custom environments
- Turso/sqld daemon support with local-first replicas
- Environment Service Daemon via `uvx synth-ai serve`
0.2.1.dev1 — Jul 29, 2025
- Initial development release
Feb 3, 2025
- Cuvier Error Search (deprecated)
Jan 2025
- LangSmith integration for Enterprise partners
- Python SDK v0.3 (simplified API, Anthropic support)