2025-11-17 — SDK Release 0.2.25.dev1
📦 Package Updates
Version Bump : Updated synth-ai package to 0.2.25.dev1
Modal Deployment : Updated default SYNTH_AI_VERSION in Modal deployments to 0.2.25.dev1
2025-11-17 — Vendored Prompt Learning: Production-Ready Examples
🚀 New Features
Production Prompt Optimization Examples (vendored_prompt_learning)
Complete Pipeline Examples : New production-ready examples demonstrating on-the-fly prompt optimization in production environments
GEPA Pipeline : run_gepa_example.py - Complete GEPA optimization workflow from baseline evaluation to final prompt deployment
MIPRO Pipeline : run_mipro_example.py - Complete MIPRO optimization workflow with programmatic polling and progress tracking
In-Process Task Apps : Automatic task app management with Cloudflare tunnel support for production deployments
Self-Contained Scripts : Everything in one script - no external dependencies or manual setup required
Production Integration Features
In-Process Task App Management : InProcessTaskApp utility automatically manages FastAPI servers and Cloudflare tunnels
Automatic Tunnel Creation : Opens Cloudflare tunnels automatically for production use
Background Server Management : Runs task apps in background threads with graceful shutdown
Port Conflict Resolution : Automatically finds an available port if the requested port is busy
Programmatic Polling : Built-in job status polling with progress callbacks and timeout handling
Prompt Retrieval : Easy extraction of optimized prompts from completed jobs
Baseline & Final Evaluation : Complete evaluation pipeline comparing initial vs optimized prompts
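Roughly, the pipeline shape these scripts implement looks like the sketch below; the client object and every method name on it are hypothetical stand-ins, not the SDK's actual API (see run_gepa_example.py / run_mipro_example.py for the real calls).

```python
# Hypothetical sketch of the pipeline shape; every method on `client`
# is an illustrative stand-in for the real calls in the example scripts.
import time

def run_pipeline(client, task_app_url: str, config: dict) -> dict:
    baseline = client.evaluate(task_app_url, config["initial_prompt"])  # baseline evaluation
    job = client.submit_prompt_learning_job(task_app_url, config)       # GEPA or MIPRO job
    while (status := client.get_job(job["id"]))["status"] not in {"succeeded", "failed"}:
        print("progress:", status.get("progress"))                      # progress callback
        time.sleep(10)                                                  # programmatic polling
    best = client.get_optimized_prompt(job["id"])                       # prompt retrieval
    final = client.evaluate(task_app_url, best)                         # final evaluation
    return {"baseline": baseline, "best_prompt": best, "final": final}
```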
📚 Documentation
Production Guide : Comprehensive guide at /blog/prompt-optimization-benchmarks explaining production use cases
Examples Directory : Complete examples in examples/blog_posts/vendored_prompt_learning/
Full pipeline scripts for GEPA and MIPRO
Configuration files for all benchmarks (Banking77, HeartDisease, HotpotQA, Pupa)
In-process scripts with minimal budgets for quick testing
README : Detailed README with setup instructions, architecture overview, and customization options
🎯 Use Cases
A/B Testing : Automatically find better prompts for your use case without manual intervention
Performance Tuning : Continuously improve prompt performance as your data changes
Multi-Tenant Optimization : Optimize prompts per customer or use case
Rapid Iteration : Test and deploy better prompts faster than manual tuning
🔧 Technical Improvements
Unified Benchmark Suite : Consolidated all GEPA and MIPRO examples from blog_posts/gepa/ and blog_posts/mipro/ into a single directory
Path Fixes : Fixed environment variable loading and task app path resolution for reliable execution
Budget Controls : Configurable rollout budgets with minimal budget modes for quick testing (~1 minute)
Error Handling : Comprehensive error handling with clear messages and fallback behavior
2025-11-14 — Artifacts CLI
🚀 New Features
Artifacts Management (synth-ai artifacts)
Unified Artifact Management : New synth-ai artifacts command suite for managing and inspecting all Synth AI artifacts
List Artifacts : artifacts list displays all fine-tuned models, RL models, and optimized prompts with filtering options
Show Details : artifacts show provides detailed information about specific artifacts, with intelligent prompt extraction
Export Models : artifacts export exports models to HuggingFace Hub (private or public)
Download Prompts : artifacts download downloads optimized prompts in JSON, YAML, or text formats
Smart ID Parsing : Centralized parsing logic handles all artifact ID formats (ft:, rl:, peft:, pl_)
Best Prompt Display : Default view shows job summary + best optimized prompt with syntax highlighting
Verbose Mode : --verbose flag shows full metadata and snapshot details for prompts
JSON Export : --format json enables easy scripting and data export
Multi-Structure Support : Automatically extracts prompt messages from various snapshot structures (see the sketch after this list):
Direct messages arrays
object → messages arrays
object → text_replacements arrays (GEPA structure)
initial_prompt → data → messages structures
Syntax Highlighting : Rich formatting with role-based colors (system/user/assistant)
Default View : Shows only the most important information (summary + best prompt)
Verbose View : Full metadata and snapshot details when needed
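As a rough illustration of the multi-structure extraction referenced above, the fallback chain looks something like this; the real extractor lives in the artifacts command suite, and the text_replacements field names are assumptions:

```python
# Illustrative reimplementation of the fallback chain described above;
# the real extractor may differ, and the text_replacements field names
# below are assumptions.
from typing import Any

def extract_messages(snapshot: dict[str, Any]) -> list[dict[str, Any]]:
    # 1. Direct `messages` array
    if isinstance(snapshot.get("messages"), list):
        return snapshot["messages"]
    obj = snapshot.get("object") or {}
    if isinstance(obj, dict):
        # 2. object -> messages
        if isinstance(obj.get("messages"), list):
            return obj["messages"]
        # 3. object -> text_replacements (GEPA structure)
        if isinstance(obj.get("text_replacements"), list):
            return [
                {"role": rep.get("role", "system"), "content": rep.get("text", "")}
                for rep in obj["text_replacements"]
            ]
    # 4. initial_prompt -> data -> messages
    data = (snapshot.get("initial_prompt") or {}).get("data") or {}
    if isinstance(data.get("messages"), list):
        return data["messages"]
    return []
```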
🔧 Technical Improvements
Centralized Parsing : All artifact ID parsing logic consolidated in synth_ai.cli.commands.artifacts.parsing
Comprehensive validation and error handling
Support for special characters, unicode, whitespace handling
Type detection and validation utilities
Backend Integration : New backend endpoints in /api/artifacts:
GET /api/artifacts - List all artifacts with filtering
GET /api/artifacts/models/{model_id} - Get model details
GET /api/artifacts/prompts/{job_id} - Get prompt details with metadata extraction
POST /api/learning/exports/hf - Export to HuggingFace
Error Handling : Clear error messages for invalid IDs, missing artifacts, authentication failures
Test Coverage : Comprehensive test suite with 59+ tests covering:
All parsing edge cases (whitespace, unicode, special chars, empty values, etc.)
All detection and validation functions
Wasabi key resolution
CLI command behavior with mocked clients
Integration tests for real backend calls
📚 Documentation
CLI Documentation : Complete CLI reference page at /cli/artifacts
README : Comprehensive README in synth-ai/synth_ai/cli/commands/artifacts/README.md
Code Examples : Usage examples for all subcommands and common workflows
🎯 Use Cases
Artifact Discovery : Quickly find and inspect all your trained models and optimized prompts
Prompt Extraction : Easily extract and use optimized prompts from completed jobs
Model Sharing : Export models to HuggingFace for sharing or deployment
Scripting : JSON output enables automation and integration with other tools
Debugging : Verbose mode helps troubleshoot job issues and inspect metadata
2025-11-14 — In-Process Task App Utility
🚀 New Features
In-Process Task App (InProcessTaskApp)
Automatic Server & Tunnel Management : New InProcessTaskApp utility enables running task apps entirely within your Python script
Background Server : Automatically starts uvicorn server in background thread
Automatic Tunnel Creation : Opens Cloudflare quick tunnel automatically and provides tunnel URL
Port Conflict Handling : Automatically finds an available port if the requested port is busy (auto_find_port=True by default)
Multiple Input Methods : Supports FastAPI app instance, TaskAppConfig object, config factory function, or task app file path
Signal Handling : Graceful shutdown on SIGINT/SIGTERM prevents orphaned processes
Observability : Structured logging and optional on_start/on_stop callbacks
Input Validation : Comprehensive validation with clear error messages for port range, host security, tunnel mode, and file existence
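A minimal usage sketch, assuming InProcessTaskApp behaves as a context manager; the import path, constructor arguments, and .url attribute below are assumptions drawn from the feature list, not confirmed signatures:

```python
# Hedged sketch: import path, constructor kwargs, and the `.url` attribute
# are assumptions; consult IN_PROCESS_TASK_APP_README.md for the real API.
from fastapi import FastAPI

from synth_ai import InProcessTaskApp  # hypothetical import path

app = FastAPI()

with InProcessTaskApp(
    app,                                   # or a TaskAppConfig, factory, or file path
    port=8000,                             # auto_find_port=True (default) resolves conflicts
    on_start=lambda: print("task app up"),
    on_stop=lambda: print("task app down"),
) as task_app:
    print("tunnel URL:", task_app.url)     # use this URL in training configs
```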
Public Cloudflare Utilities
Public API Functions : Made start_uvicorn_background() and wait_for_health_check() public in synth_ai.cloudflare
Can be used independently for custom server management
Previously private functions now available for advanced use cases
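The two newly public helpers can be combined for custom server management; the function names and module come from this release, but the signatures below are assumptions:

```python
# Function names are real per this release; the signatures shown are assumptions.
from fastapi import FastAPI

from synth_ai.cloudflare import start_uvicorn_background, wait_for_health_check

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

server = start_uvicorn_background(app, host="127.0.0.1", port=8000)  # assumed signature
wait_for_health_check("http://127.0.0.1:8000/health", timeout=30)    # assumed signature
```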
🔧 Technical Improvements
Port Management : Smart port conflict resolution with automatic port finding
_find_available_port() helper searches for available ports starting from requested port
_is_port_available() helper checks port availability
_kill_process_on_port() helper attempts to free occupied ports (best-effort, cross-platform)
Error Handling : Comprehensive error handling with actionable error messages
Test Coverage : Full test suite with unit tests (383 lines) and integration tests (145 lines)
Security : Host validation restricts to localhost addresses for security
📚 Documentation
SDK Documentation : Complete SDK reference page at /sdk/in-process-task-app
README : Comprehensive README in synth-ai/IN_PROCESS_TASK_APP_README.md with usage examples
Code Examples : Multiple usage examples including GEPA integration, callbacks, and port handling
🎯 Use Cases
Local Development : Run prompt optimization without separate terminal processes
Demos & Scripts : Simplify demo scripts and automation workflows
CI/CD : Integrate task apps into automated testing and deployment pipelines
Resource Management : Automatic cleanup prevents resource leaks
2025-11-14 — Experiment Queue System
🚀 New Features
Experiment Queue Management (synth-ai queue)
Queue Worker Management : New synth-ai queue command suite for managing Celery workers that process experiment jobs
Start Worker : synth-ai queue start launches a background Celery worker with Beat scheduler for periodic job dispatch
Stop Worker : synth-ai queue stop terminates all running experiment queue workers
Status Check : synth-ai queue status (or synth-ai queue) displays current worker status and configuration
Single Worker Enforcement : Automatically kills existing workers before starting new ones to ensure only one instance runs
Database Path Validation : Enforces consistent database path usage across all workers
Background Mode : Workers run in background by default with logs written to logs/experiment_queue_worker.log
Beat Scheduler : Integrated Celery Beat runs periodic queue checks every 5 seconds to dispatch queued jobs
Experiment Submission (synth-ai experiment)
Submit Experiments : synth-ai experiment submit accepts JSON payloads to create new experiments with multiple jobs (see the payload sketch after this list)
JSON Payloads : Submit from files or inline JSON strings
Job Validation : Validates config files and environment files before submission
Automatic Dispatch : Initial jobs are dispatched immediately upon submission
Parallelism Control : Configure how many jobs run concurrently per experiment
Experiment Management : synth-ai experiment list and synth-ai experiments provide dashboard views
Status Filtering : Filter experiments by status (QUEUED, RUNNING, COMPLETED, FAILED, CANCELED)
Watch Mode : Continuous refresh with --watch flag for real-time monitoring
JSON Output : Machine-readable JSON output for automation and scripting
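A hedged sketch of what a submission payload might look like; the field names below are assumptions for illustration, not the documented schema:

```python
# Hypothetical payload shape for `synth-ai experiment submit`; the field
# names here are assumptions for illustration only.
import json

payload = {
    "name": "banking77-prompt-sweep",
    "parallelism": 2,  # how many jobs run concurrently per experiment
    "jobs": [
        {"config": "configs/banking77_gepa.toml", "env_file": ".env"},
        {"config": "configs/banking77_mipro.toml", "env_file": ".env"},
    ],
}

with open("experiment.json", "w") as f:
    json.dump(payload, f, indent=2)
# Then submit with: synth-ai experiment submit experiment.json
```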
Redis-Based Queue Backend
Redis Broker : Migrated from SQLite broker to Redis for improved reliability and performance
Default Configuration : Uses redis://localhost:6379/0 for broker and redis://localhost:6379/1 for result backend
Environment Variables : Configurable via EXPERIMENT_QUEUE_BROKER_URL and EXPERIMENT_QUEUE_RESULT_BACKEND_URL
No Locking Issues : Eliminates SQLite locking conflicts that prevented job dispatch
Concurrent Access : Supports multiple workers and concurrent job processing
🔧 Technical Improvements
Database Architecture : SQLite database for experiment metadata, Redis for Celery broker/backend
WAL Mode : Application database uses Write-Ahead Logging for concurrent reads/writes
Path Enforcement : Single database path requirement prevents configuration conflicts
Automatic Initialization : Database schema created automatically on first use
Periodic Task System : Celery Beat scheduler automatically dispatches queued jobs every 5 seconds
Error Handling : Comprehensive error reporting with detailed diagnostics for authentication failures, health check failures, and silent failures
Result Parsing : Enhanced result collection supports both JSON and text-based outputs (GEPA/MIPRO)
Test Suite : Fast, comprehensive test suite (42 tests, ~2 seconds) with proper isolation and timeout handling
📚 Documentation
CLI Documentation : Complete documentation for synth-ai queue and synth-ai experiment commands
README : Comprehensive README in synth_ai/experiment_queue/ with usage examples and troubleshooting
Environment Variables : Documented required and optional environment variables for queue configuration
🎯 Use Cases
Local Development : Run prompt learning experiments locally without backend API
Batch Processing : Submit multiple experiments and let the queue process them automatically
Resource Management : Control parallelism to manage resource usage
Monitoring : Real-time status monitoring with watch mode and JSON output for dashboards
2025-11-14 — Task App Discovery, Backend Infrastructure & Prompt Optimization Enhancements
🚀 New Features
Task App Discovery and Health Checking (synth-ai scan)
New Command : Added synth-ai scan command to discover and health-check running task applications
Multi-Method Discovery : Uses port scanning, service records, tunnel records, process inspection, backend API, and registry queries
Health Checks : Performs HTTP health checks on /health endpoints and extracts metadata from /info endpoints
Dual Output Formats : Supports both human-readable table format and machine-readable JSON format (see the scripting sketch after this list)
Service Records Integration : Automatically discovers apps deployed via synth-ai deploy --runtime local
Tunnel Discovery : Discovers Cloudflare tunnels deployed via synth-ai deploy --runtime tunnel
Process Scanning : Inspects running cloudflared processes to find tunnel URLs
API Key Support : Supports authentication via X-API-Key header for protected apps
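For scripting, the JSON output mode can be consumed directly; in this sketch both the --json flag name and the list-of-apps output shape are assumptions:

```python
# Assumption-laden sketch: the `--json` flag and list-of-apps output shape
# are guesses based on the dual output formats described above.
import json
import subprocess

result = subprocess.run(
    ["synth-ai", "scan", "--json"], capture_output=True, text=True, check=True
)
for app in json.loads(result.stdout):
    print(app.get("url"), app.get("status"))
```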
Backend Cloudflare Tunnel Infrastructure
Complete Tunnel Management API : Full backend infrastructure for managing Cloudflare tunnels
REST API Endpoints : New endpoints for tunnel CRUD operations (POST /api/v1/tunnels, GET /api/v1/tunnels, GET /api/v1/tunnels/tunnel, DELETE /api/v1/tunnels/{id})
Tunnel Manager Service : Lifecycle management with automatic tunnel replacement and Cloudflare Access credential support
Database Models : Complete tunnel schema with support for access_client_id and access_client_secret for Cloudflare Access integration
Database Migrations : Removed hostname unique constraint, added partial index for active tunnels
Deployment Scripts : Production-ready tunnel deployment utilities (run_backend_tunnel.sh, start_tunnel.py)
Comprehensive Testing : Full test suite for tunnel creation, management, and cleanup
Unified Artifacts API
Backend Artifacts Endpoints : New REST API for managing all Synth AI artifacts
List Artifacts : GET /api/artifacts - Unified view of fine-tuned models, RL models, and optimized prompts with filtering
Model Details : GET /api/artifacts/models/{model_id} - Get detailed model information
Prompt Details : GET /api/artifacts/prompts/{job_id} - Get prompt details with intelligent metadata extraction
Smart Prompt Extraction : Automatically extracts prompts from various snapshot structures (direct messages, GEPA text_replacements, nested structures)
Backend Integration : Connected to existing learning jobs and RL registries for unified artifact management
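Consuming the unified listing endpoint might look like the sketch below; the /api/artifacts path is from this release, while the host, auth scheme, and filter parameter are assumptions:

```python
# The /api/artifacts path is documented above; the host, auth header, and
# `type` filter parameter are assumptions.
import os

import httpx

resp = httpx.get(
    "https://api.usesynth.ai/api/artifacts",  # hypothetical host
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    params={"type": "prompt"},                # hypothetical filter
)
resp.raise_for_status()
print(resp.json())
```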
Prompt Optimization Proposer Backends
Modular Proposer System : New pluggable proposer architecture for GEPA and MIPRO prompt optimization
DSPy Proposer : DSPy-inspired instruction proposer (proposer_mode="dspy") - Standard DSPy-style mutation prompts for prompt generation
GEPA-AI Proposer : GEPA-AI inspired proposer (proposer_mode="gepa-ai") - Alternative instruction generation approach with GEPA-AI flavored prompts
Synth Proposer : Synth Research variant (proposer_mode="synth") - Synth-branded proposer currently mirroring DSPy behavior
Built-in Proposers : Base proposer implementations for both GEPA and MIPRO algorithms
Configuration : Per-proposer configuration with separate meta-model settings, temperature, and token limits
Backward Compatibility : Auto-migration from deprecated "builtin" mode to "dspy" mode with deprecation warnings
OpenAI & Groq Provider Support
Enhanced Provider Integration : Comprehensive support for OpenAI and Groq models in prompt optimization
OpenAI Models : Full support for GPT-4o, GPT-4.1, GPT-5 families with automatic pricing calculation
Groq Models : Support for Llama, Mixtral, and Gemma models via Groq's OpenAI-compatible API
Provider URL Resolution : Centralized provider URL management (get_provider_url()) for consistent API endpoint resolution
Pricing Integration : Per-token pricing tables for accurate cost tracking and budget enforcement
Environment Overrides : Runtime pricing override via PL_TOKEN_RATES_JSON environment variable
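The runtime pricing override could be set like this; the env var name is from this release, but the JSON rate-table shape is an assumption:

```python
# Env var name is real per this release; the rate-table shape is an assumption.
import json
import os

os.environ["PL_TOKEN_RATES_JSON"] = json.dumps(
    {"openai/gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60}}
)
```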
Experiment Queue System (synth-ai queue & synth-ai experiment)
Shipped alongside this release: Celery worker management, JSON experiment submission with parallelism control, and the Redis-based queue backend. See the dedicated "Experiment Queue System" entry above (2025-11-14) for full details.
Deploy Command Enhancements
Non-Blocking by Default : Deploy command now runs in background mode by default to prevent indefinite stalls
Local Deployments : Uses nohup to run uvicorn servers in background
Tunnel Deployments : Returns immediately after starting tunnel (non-blocking)
Modal Deployments : Starts process and returns immediately (non-blocking)
Optional Blocking Mode : Added --wait flag for interactive use when blocking behavior is desired
API Key Validation : Enhanced validation to ensure SYNTH_API_KEY and ENVIRONMENT_API_KEY are present before deployment
Checks both environment variables and provided .env file
Provides detailed error messages showing where keys were found or missing
Prevents deployments without required credentials
🔧 Technical Improvements
Service Record Management : Local and tunnel deployments automatically record to ~/.synth-ai/services.json
Records include PID, URL, port, app_id, and deployment metadata
Automatic cleanup of stale records (dead processes)
Health Check Logic : Robust health check implementation with timeout handling and error detection
Performance : Uses asyncio for parallel health checks and port scanning
Comprehensive Testing : 33 tests total (19 unit, 11 integration, 3 end-to-end) with proper timeout handling
📚 Documentation
CLI Documentation : Added comprehensive scan command documentation
Tunnel Documentation : Complete guides for backend tunnel setup, deployment, and troubleshooting
Artifacts API : Backend API documentation for artifact management endpoints
Proposer System : Documentation for DSPy, GEPA-AI, and Synth proposer modes
README : Created detailed READMEs in scan module and tunnel infrastructure with usage examples
Code Documentation : Added comprehensive docstrings across all scan module files
2025-11-11 — Session-Based Pricing & Budget Enforcement
🚀 New Features
Session-Based Pricing : Comprehensive session-based usage tracking and budget enforcement for cost-incurring API requests.
Single Active Session : Only one active session per organization/user at a time, automatically managed
Automatic Session Attachment : All cost-incurring requests automatically attach to the active session unless explicitly opted out
Budget Enforcement : Requests are rejected with 429 when session limits are exceeded (tokens, cost, GPU hours)
Opt-Out Support : Set X-No-Session: true header to skip session tracking for specific requests
Session Lifecycle : Sessions have explicit beginning, active period, and end states
TOML Configuration : CLI commands (codex, opencode) support TOML config files with session limits (default: $20 cost limit)
Session Status Command : New synth status session command displays session details, usage, and limits
Session Management
Auto-Creation : If no X-Session-ID header is provided, system automatically finds or creates an active session
Explicit Session ID : Provide X-Session-ID header to use a specific session
Session Limits : Configure hard limits on tokens, cost (USD), GPU hours, and API calls per session
Usage Tracking : All usage (tokens, cost, GPU hours) automatically recorded to the active session
Limit Violations : Sessions automatically marked as limit_exceeded when limits are breached
CLI Integration : uvx synth-ai codex and uvx synth-ai opencode automatically create sessions with limits from TOML config
Budget Enforcement
Pre-Request Checks : Budget limits checked before processing requests to prevent unnecessary resource usage
Protected Routes : All cost-incurring routes (/api/v1/responses, /api/synth-research/chat/completions, etc.) enforce session budgets
Error Responses : Clear error messages with limit details when budgets are exceeded
Session Required : Requests require a session unless X-No-Session: true header is set
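In practice the session headers attach per request; in this sketch the route and X-* header names are from this release, while the host, auth style, and body shape are assumptions:

```python
# Route and X-* header names are from this release; the host, auth style,
# and request body shape are assumptions.
import os

import httpx

resp = httpx.post(
    "https://api.usesynth.ai/api/v1/responses",   # hypothetical host
    headers={
        "Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}",
        "X-Session-ID": "sess_123",   # omit to auto-attach to the active session
        # "X-No-Session": "true",     # opt out of session tracking instead
    },
    json={"model": "synth-small", "input": "hello"},
)
if resp.status_code == 429:
    print("session budget exceeded:", resp.text)
```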
📚 Documentation
Session Pricing Spec : Complete specification document (session_pricing.spec.md) detailing session management behavior
API Documentation : Updated API endpoints with session management examples
SDK Integration : SDK supports session creation, querying, and usage tracking with full type safety
🔧 Technical Improvements
Database Schema : New tables for session tracking (session_usage_sessions, session_usage_limits, session_usage_records, session_usage_limit_violations)
Service Layer : Comprehensive service layer for session CRUD, limit checking, and usage recording
Query Builder : Fluent query builders for both backend and SDK for session queries
Integration Tests : Full test coverage for session creation, budget checking, and usage tracking
Type Safety : Complete type annotations with all type checks passing
2025-11-11 — Cloudflare Tunnel Support for Task App Deployment
🚀 New Features
Cloudflare Tunnel Deployment : Added support for deploying task apps via Cloudflare Tunnel, enabling RL training and prompt optimization without deploying to Modal.
Quick Tunnels : Free, ephemeral tunnels (no account required) perfect for development and testing. Background mode by default, returns immediately after deployment.
Managed Tunnels : Stable tunnels with custom subdomains (e.g., my-company.usesynth.ai) for production use. Requires SYNTH_API_KEY and backend provisioning (coming soon).
Seamless Integration : Works with RL training and prompt optimization (GEPA/MIPRO) - simply deploy via tunnel and use the URL in your training configs.
Clean Abstractions : Process management hidden from users - deploy and use, no helper scripts needed.
📚 Documentation
Deployment Guide : Updated CLI documentation with Cloudflare Tunnel deployment examples and use cases.
Example Workflow : Added examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh demonstrating GEPA prompt optimization with tunnel deployment.
🔧 Technical Improvements
Background Mode : Tunnel deployments run in background by default (non-blocking), with optional --keep-alive flag for blocking mode.
Health Checks : Automatic health check polling ensures task app is ready before tunnel opens.
Credential Management : Automatic writing of TASK_APP_URL and Cloudflare Access credentials to .env files.
2025-11-09 — First-Class Codex Support for Synth Models
🚀 New Features
Synth Model Support for Codex : Added first-class support for synth-small, synth-medium, and synth-experimental models in Codex CLI workflows.
Responses API Integration : Full Responses API format support with proper conversion to/from Chat Completions format for seamless Codex integration.
Tool Call Handling : Complete tool call processing with proper event sequencing, argument streaming, and output handling for Codex's tool execution flow.
Stop Tool : Added __internal_stop tool to signal completion and prevent infinite loops in Responses → Chat → Responses conversion flows.
Stream Completion : Reliable stream completion signals with proper response.completed events and finish_reason handling.
🔧 Technical Improvements
Responses API Bridge : Implemented comprehensive Responses API → Chat Completions → Responses API conversion layer for Codex compatibility.
Event Translation : Full SSE event translation between Responses and Chat formats, including tool calls, function outputs, and completion signals.
Comprehensive Testing : Added 32 integration tests covering all critical Responses API behavior, tool call processing, and edge cases.
2025-11-07 — Multi-Stage Optimizers & Expanded Model Support
🚀 New Features
Multi-Stage MIPRO & GEPA : Both prompt optimization algorithms now support multi-stage pipeline optimization for complex workflows with multiple processing stages.
MIPRO Multi-Stage : Generates per-stage instruction proposals with automatic stage detection via LCS (Longest Common Subsequence) matching. Each stage gets stage-specific meta-prompts including pipeline overview, stage role, and baseline performance. Supports per-module configuration with max_instruction_slots and max_demo_slots for fine-grained control.
GEPA Multi-Stage : Uses module-aware evolution where each pipeline module gets its own gene. Mutations target specific modules, uniform crossover combines parent genes per module, and aggregated scoring sums module lengths for Pareto optimization. Supports per-module max_instruction_slots, max_tokens, and allowed_tools configuration.
Configuration : Both algorithms support pipeline_modules metadata in initial prompts and module-specific settings in their respective config sections (prompt_learning.gepa.modules and prompt_learning.mipro.modules).
Gemini Model Support : Added comprehensive support for Google Gemini models as policy models for both GEPA and MIPRO algorithms.
Supported Models : gemini-2.5-pro (≤200k tokens), gemini-2.5-pro-gt200k (>200k tokens), gemini-2.5-flash, and gemini-2.5-flash-lite.
Provider Integration : Full SDK validation and backend support for provider = "google" with automatic pricing calculation and token tracking.
Example Configs : Added example configurations demonstrating Gemini usage, including banking77_pipeline_mipro_gemini_flash_lite_local.toml for cost-effective multi-stage optimization.
OpenAI Model Support : Expanded OpenAI model support for prompt optimization with comprehensive coverage of latest models.
Supported Models : gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, and gpt-5-nano.
Model Validation : SDK-side validation with clear error messages for unsupported models. Explicit rejection of gpt-5-pro due to high cost ($15 input / $120 output per 1M tokens).
Provider Prefix Support : Models can be specified with or without provider prefix (e.g., "gpt-4o" or "openai/gpt-4o").
SDK Validation Enhancements : Improved config validation with comprehensive error checking before sending to backend.
Multi-Stage Validation : Validates that pipeline_modules match module configs, checks for missing or extra modules, and ensures proper module ID matching.
Model Validation : Provider-aware model validation with detailed error messages listing supported models for each provider.
Nano Model Restrictions : Clear validation that nano models (gpt-4.1-nano, gpt-5-nano) are allowed for policy models but rejected for mutation/meta models (too small for generation tasks).
🔧 Technical Improvements
Config Parsing : Enhanced TOML parsing for multi-stage configurations with support for nested module and stage definitions.
Integration Tests : Added comprehensive integration tests for multi-stage GEPA and MIPRO workflows, including Gemini model validation tests.
Error Messages : Improved validation error messages with actionable suggestions and links to example configurations.
📚 Documentation
Multi-Stage Pipeline Guide : Updated documentation with examples and configuration details for optimizing multi-stage pipelines with both GEPA and MIPRO algorithms.
Model Support Reference : Complete documentation of supported models for each provider (OpenAI, Groq, Google) with usage examples.
Example Configurations : Added example configs demonstrating multi-stage optimization with different model providers, including multi_stage_gepa_example.toml and banking77_pipeline_mipro_gemini_flash_lite_local.toml.
2025-11-04 — GEPA: Genetic Evolution for Prompt Optimization
🚀 New Features
GEPA Algorithm : Genetic Evolution for Prompt Optimization (GEPA) is now available for prompt optimization jobs. GEPA uses evolutionary algorithms (mutation, crossover, selection) to optimize prompts across multiple generations, achieving significant accuracy improvements on classification and reasoning tasks.
Prompt ID-Based URLs : Prompt transformations now use versioned URLs (/v1/{prompt_version_id}/chat/completions) for better traceability, concurrency, and debugging. Each transformation gets a unique version ID based on content hashing (see the sketch after this list).
Multi-Objective Optimization : GEPA maintains a Pareto front balancing accuracy, token count, and task-specific metrics (e.g., tool call rate).
Validation Scoring : Job results now distinguish between prompt_best_train_score and prompt_best_validation_score for clearer evaluation metrics.
Integration Testing : Added comprehensive integration tests for GEPA training workflows with Banking77 task app.
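Calling a versioned prompt endpoint might look like the following; the URL pattern is from this release, while the host, version ID, and payload are placeholders:

```python
# URL pattern is documented above; host, version ID, and payload are placeholders.
import httpx

prompt_version_id = "abc123"  # hypothetical content-hash version ID
resp = httpx.post(
    f"https://api.usesynth.ai/v1/{prompt_version_id}/chat/completions",
    json={"messages": [{"role": "user", "content": "Classify this banking query"}]},
)
print(resp.json())
```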
📚 Documentation
GEPA Guide : Complete documentation with quick start, configuration examples, and troubleshooting for Banking77, HotpotQA, IFBench, HoVer, and PUPA tasks.
Integration Examples : Step-by-step guides for deploying task apps and running GEPA optimization locally and on Modal.
2025-10-28 — Terminal Training Logs
🚀 New Features
Full terminal streaming logs : Both uvx synth-ai train for SFT and RL now provide comprehensive real-time training logs directly in the terminal. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics logging (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training process.
2025-10-27 — Rubrics, Hosted Judges & Qwen-VL RL
🚀 New Features
Hosted Synth judges (configurable) : Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behaviour.
Rubric-aware filtering : SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export.
Qwen-VL support across SFT & RL : Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
Instruct-model RL guidance : Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.
2025-10-17 — Qwen Coder, Turso, H200 Topologies & RL Throughput
🚀 New Features
Qwen Coder models supported : Qwen Coder variants are now available across SFT and inference workflows.
SDK migrated to Turso for concurrency : Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
More training topologies on H200s : Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
Full LoRA support for Policy Gradient : LoRA integrated end-to-end into Policy Gradient training flows.
Pipelined RL async rollouts : Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
2025-10-09 — LoRA, MoE & Large Model Support
🚀 New Features
Expanded Qwen catalog : Simple Training now ships SFT and inference presets for every Qwen release outside the existing qwen3-{0.6B–32B} range, giving full coverage for the remaining Qwen 1.x/2.x/2.5 checkpoints.
Large-model inference & training topologies : Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
Turnkey rollout : API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
LoRA-first SFT : Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.
2025-09-24 — Platform Updates
🚀 New Features
Rollout Viewer : Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking
B200 & H200 GPU Support : Added support for NVIDIA's latest flagship GPUs (B200, H200) for both training and inference workloads
Faster Inference : Optimized inference pipeline with improved throughput and reduced latency across all model sizes
GSPO Support : Integrated Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training
2025-09-17 — Online RL (customer-visible features)
Organization-scoped environment credentials
Upload your environment API key once (sealed-box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
First-party Task App integration
Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
Single-node, multi-GPU Online RL
Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi-node training is finished in dev; reach out if interested.
Supports a reference model (for KL) stacked on inference or on its own GPU, and configurable tensor parallelism for inference.
Production run flow
Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.
0.2.2.dev2 — Aug 8, 2025
Fine-tuning (SFT) endpoints available and documented end-to-end
Interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B
Live polling output during training with real-time status updates
CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher
0.2.2.dev1 — Aug 7, 2025
New backend balance APIs and CLI for account visibility
CLI utilities: balance, traces, and man commands
Traces inventory view with per-DB counts and storage footprint
Standardized one-off usage: uvx synth-ai <command> (removed interactive watch)
Improved .env loading and API key resolution
0.2.2.dev0 — Jul 30, 2025
Environment Registration API for custom environments
Turso/sqld daemon support with local-first replicas
Environment Service Daemon via uvx synth-ai serve
0.2.1.dev1 — Jul 29, 2025
Initial development release
Feb 3, 2025
Cuvier Error Search (deprecated)
Jan 2025
Langsmith integration for Enterprise partners
Python SDK v0.3 (simplified API, Anthropic support)