2025-11-17 – SDK Release 0.2.25.dev1

📦 Package Updates

  • Version Bump: Updated synth-ai package to 0.2.25.dev1
  • Modal Deployment: Updated default SYNTH_AI_VERSION in Modal deployments to 0.2.25.dev1

2025-11-17 – Vendored Prompt Learning: Production-Ready Examples

🚀 New Features

Production Prompt Optimization Examples (vendored_prompt_learning)

  • Complete Pipeline Examples: New production-ready examples demonstrating on-the-fly prompt optimization in production environments
    • GEPA Pipeline: run_gepa_example.py - Complete GEPA optimization workflow from baseline evaluation to final prompt deployment
    • MIPRO Pipeline: run_mipro_example.py - Complete MIPRO optimization workflow with programmatic polling and progress tracking
    • In-Process Task Apps: Automatic task app management with Cloudflare tunnel support for production deployments
    • Self-Contained Scripts: Everything in one script - no external dependencies or manual setup required

Production Integration Features

  • In-Process Task App Management: InProcessTaskApp utility automatically manages FastAPI servers and Cloudflare tunnels
    • Automatic Tunnel Creation: Opens Cloudflare tunnels automatically for production use
    • Background Server Management: Runs task apps in background threads with graceful shutdown
    • Port Conflict Resolution: Automatically finds available ports if requested port is busy
  • Programmatic Polling: Built-in job status polling with progress callbacks and timeout handling (pattern sketched after this list)
  • Prompt Retrieval: Easy extraction of optimized prompts from completed jobs
  • Baseline & Final Evaluation: Complete evaluation pipeline comparing initial vs optimized prompts
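
The built-in polling helper handles this loop for you; the following is a minimal sketch of the pattern (names like get_job_status and on_progress are placeholders, not the SDK's actual signature):

```python
import time


def poll_until_complete(get_job_status, on_progress=None, timeout=3600.0, interval=10.0):
    """Poll a job until it reaches a terminal state, with a progress callback and timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_job_status()  # placeholder: e.g., a closure around the job-status endpoint
        if on_progress is not None:
            on_progress(status)
        if status.get("state") in {"succeeded", "failed", "canceled"}:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job did not finish within {timeout:.0f}s")
```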

📚 Documentation

  • Production Guide: Comprehensive guide at /blog/prompt-optimization-benchmarks explaining production use cases
  • Examples Directory: Complete examples in examples/blog_posts/vendored_prompt_learning/
    • Full pipeline scripts for GEPA and MIPRO
    • Configuration files for all benchmarks (Banking77, HeartDisease, HotpotQA, Pupa)
    • In-process scripts with minimal budgets for quick testing
  • README: Detailed README with setup instructions, architecture overview, and customization options

🎯 Use Cases

  • A/B Testing: Automatically find better prompts for your use case without manual intervention
  • Performance Tuning: Continuously improve prompt performance as your data changes
  • Multi-Tenant Optimization: Optimize prompts per customer or use case
  • Rapid Iteration: Test and deploy better prompts faster than manual tuning

🔧 Technical Improvements

  • Unified Benchmark Suite: Consolidated all GEPA and MIPRO examples from blog_posts/gepa/ and blog_posts/mipro/ into a single directory
  • Path Fixes: Fixed environment variable loading and task app path resolution for reliable execution
  • Budget Controls: Configurable rollout budgets with minimal budget modes for quick testing (~1 minute)
  • Error Handling: Comprehensive error handling with clear messages and fallback behavior

2025-11-14 – Artifacts CLI

🚀 New Features

Artifacts Management (synth-ai artifacts)

  • Unified Artifact Management: New synth-ai artifacts command suite for managing and inspecting all Synth AI artifacts
    • List Artifacts: artifacts list displays all fine-tuned models, RL models, and optimized prompts with filtering options
    • Show Details: artifacts show provides detailed information about specific artifacts, with intelligent prompt extraction
    • Export Models: artifacts export exports models to HuggingFace Hub (private or public)
    • Download Prompts: artifacts download downloads optimized prompts in JSON, YAML, or text formats
    • Smart ID Parsing: Centralized parsing logic handles all artifact ID formats (ft:, rl:, peft:, pl_)
    • Best Prompt Display: Default view shows job summary + best optimized prompt with syntax highlighting
    • Verbose Mode: --verbose flag shows full metadata and snapshot details for prompts
    • JSON Export: --format json enables easy scripting and data export
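
As an illustration of the scripting angle, the JSON output can be consumed from Python; a minimal sketch (field names in the returned JSON are assumptions, so adjust to the actual schema):

```python
import json
import subprocess

# Run the CLI with machine-readable output and parse it.
result = subprocess.run(
    ["synth-ai", "artifacts", "list", "--format", "json"],
    capture_output=True, text=True, check=True,
)
artifacts = json.loads(result.stdout)  # assumes a JSON array of artifact records
for artifact in artifacts:
    print(artifact.get("id"), artifact.get("type"))  # illustrative field names
```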

Prompt Extraction Intelligence

  • Multi-Structure Support: Automatically extracts prompt messages from various snapshot structures (see the sketch after this list):
    • Direct messages arrays
    • object → messages arrays
    • object → text_replacements arrays (GEPA structure)
    • initial_prompt → data → messages structures
  • Syntax Highlighting: Rich formatting with role-based colors (system/user/assistant)
  • Default View: Shows only the most important information (summary + best prompt)
  • Verbose View: Full metadata and snapshot details when needed
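
A rough sketch of the fallback order described above (the structure names come from the list; the function itself is illustrative, not the CLI's internal code):

```python
def extract_messages(snapshot: dict) -> list[dict]:
    """Try the known snapshot shapes in order and return the prompt messages."""
    # 1. Direct messages array
    if isinstance(snapshot.get("messages"), list):
        return snapshot["messages"]
    # 2. object -> messages, and 3. object -> text_replacements (GEPA structure)
    obj = snapshot.get("object") or {}
    if isinstance(obj.get("messages"), list):
        return obj["messages"]
    if isinstance(obj.get("text_replacements"), list):
        return obj["text_replacements"]
    # 4. initial_prompt -> data -> messages
    data = (snapshot.get("initial_prompt") or {}).get("data") or {}
    if isinstance(data.get("messages"), list):
        return data["messages"]
    return []
```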

🔧 Technical Improvements

  • Centralized Parsing: All artifact ID parsing logic consolidated in synth_ai.cli.commands.artifacts.parsing
    • Comprehensive validation and error handling
    • Support for special characters, unicode, whitespace handling
    • Type detection and validation utilities
  • Backend Integration: New backend endpoints in /api/artifacts:
    • GET /api/artifacts - List all artifacts with filtering
    • GET /api/artifacts/models/{model_id} - Get model details
    • GET /api/artifacts/prompts/{job_id} - Get prompt details with metadata extraction
    • POST /api/learning/exports/hf - Export to HuggingFace
  • Error Handling: Clear error messages for invalid IDs, missing artifacts, authentication failures
  • Test Coverage: Comprehensive test suite with 59+ tests covering:
    • All parsing edge cases (whitespace, unicode, special chars, empty values, etc.)
    • All detection and validation functions
    • Wasabi key resolution
    • CLI command behavior with mocked clients
    • Integration tests for real backend calls

📚 Documentation

  • CLI Documentation: Complete CLI reference page at /cli/artifacts
  • README: Comprehensive README in synth-ai/synth_ai/cli/commands/artifacts/README.md
  • Code Examples: Usage examples for all subcommands and common workflows

🎯 Use Cases

  • Artifact Discovery: Quickly find and inspect all your trained models and optimized prompts
  • Prompt Extraction: Easily extract and use optimized prompts from completed jobs
  • Model Sharing: Export models to HuggingFace for sharing or deployment
  • Scripting: JSON output enables automation and integration with other tools
  • Debugging: Verbose mode helps troubleshoot job issues and inspect metadata

2025-11-14 – In-Process Task App Utility

🚀 New Features

In-Process Task App (InProcessTaskApp)

  • Automatic Server & Tunnel Management: New InProcessTaskApp utility enables running task apps entirely within your Python script (usage sketched after this list)
    • Background Server: Automatically starts uvicorn server in background thread
    • Automatic Tunnel Creation: Opens Cloudflare quick tunnel automatically and provides tunnel URL
    • Port Conflict Handling: Automatically finds available ports if requested port is busy (auto_find_port=True by default)
    • Multiple Input Methods: Supports FastAPI app instance, TaskAppConfig object, config factory function, or task app file path
    • Signal Handling: Graceful shutdown on SIGINT/SIGTERM prevents orphaned processes
    • Observability: Structured logging and optional on_start/on_stop callbacks
    • Input Validation: Comprehensive validation with clear error messages for port range, host security, tunnel mode, and file existence
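
A minimal usage sketch, assuming InProcessTaskApp is importable from the SDK and supports context-manager use (the import path and constructor arguments here are assumptions, not a verbatim API reference):

```python
from fastapi import FastAPI

from synth_ai import InProcessTaskApp  # assumed import location

app = FastAPI()  # any of: app instance, TaskAppConfig, factory, or file path

with InProcessTaskApp(app, port=8001) as task_app:  # assumed context-manager API
    print("task app reachable at", task_app.url)    # e.g., the Cloudflare tunnel URL
    # ... point a GEPA/MIPRO job at task_app.url here ...
# leaving the block shuts down the server and tunnel gracefully
```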

Public Cloudflare Utilities

  • Public API Functions: Made start_uvicorn_background() and wait_for_health_check() public in synth_ai.cloudflare
    • Can be used independently for custom server management
    • Previously private functions now available for advanced use cases
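
A sketch of using them directly (only the function names and module come from this release; the argument names below are assumptions):

```python
from fastapi import FastAPI

from synth_ai.cloudflare import start_uvicorn_background, wait_for_health_check

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

start_uvicorn_background(app, host="127.0.0.1", port=8001)  # assumed signature
wait_for_health_check("http://127.0.0.1:8001/health")       # assumed signature
```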

🔧 Technical Improvements

  • Port Management: Smart port conflict resolution with automatic port finding (see the sketch after this list)
    • _find_available_port() helper searches for available ports starting from requested port
    • _is_port_available() helper checks port availability
    • _kill_process_on_port() helper attempts to free occupied ports (best-effort, cross-platform)
  • Error Handling: Comprehensive error handling with actionable error messages
  • Test Coverage: Full test suite with unit tests (383 lines) and integration tests (145 lines)
  • Security: Host validation restricts to localhost addresses for security
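
The port helpers amount to standard socket probing; roughly what the private _is_port_available() and _find_available_port() helpers do:

```python
import socket


def is_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) != 0  # non-zero means connection refused


def find_available_port(start: int, limit: int = 100) -> int:
    """Scan upward from `start` and return the first free port."""
    for port in range(start, start + limit):
        if is_port_available(port):
            return port
    raise RuntimeError(f"no free port in range {start}-{start + limit - 1}")
```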

📚 Documentation

  • SDK Documentation: Complete SDK reference page at /sdk/in-process-task-app
  • README: Comprehensive README in synth-ai/IN_PROCESS_TASK_APP_README.md with usage examples
  • Code Examples: Multiple usage examples including GEPA integration, callbacks, and port handling

🎯 Use Cases

  • Local Development: Run prompt optimization without separate terminal processes
  • Demos & Scripts: Simplify demo scripts and automation workflows
  • CI/CD: Integrate task apps into automated testing and deployment pipelines
  • Resource Management: Automatic cleanup prevents resource leaks

2025-11-14 – Experiment Queue System

🚀 New Features

Experiment Queue Management (synth-ai queue)

  • Queue Worker Management: New synth-ai queue command suite for managing Celery workers that process experiment jobs
    • Start Worker: synth-ai queue start launches a background Celery worker with Beat scheduler for periodic job dispatch
    • Stop Worker: synth-ai queue stop terminates all running experiment queue workers
    • Status Check: synth-ai queue status (or synth-ai queue) displays current worker status and configuration
    • Single Worker Enforcement: Automatically kills existing workers before starting new ones to ensure only one instance runs
    • Database Path Validation: Enforces consistent database path usage across all workers
    • Background Mode: Workers run in background by default with logs written to logs/experiment_queue_worker.log
    • Beat Scheduler: Integrated Celery Beat runs periodic queue checks every 5 seconds to dispatch queued jobs

Experiment Submission (synth-ai experiment)

  • Submit Experiments: synth-ai experiment submit accepts JSON payloads to create new experiments with multiple jobs
    • JSON Payloads: Submit from files or inline JSON strings (see the sketch after this list)
    • Job Validation: Validates config files and environment files before submission
    • Automatic Dispatch: Initial jobs are dispatched immediately upon submission
    • Parallelism Control: Configure how many jobs run concurrently per experiment
  • Experiment Management: synth-ai experiment list and synth-ai experiments provide dashboard views
    • Status Filtering: Filter experiments by status (QUEUED, RUNNING, COMPLETED, FAILED, CANCELED)
    • Watch Mode: Continuous refresh with --watch flag for real-time monitoring
    • JSON Output: Machine-readable JSON output for automation and scripting
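
A sketch of submitting a payload from Python (the payload fields and the way the file is passed are assumptions; see the experiment_queue README for the actual schema):

```python
import json
import subprocess
import tempfile

# Hypothetical payload shape: one experiment, two jobs, and a parallelism cap.
payload = {
    "name": "banking77-gepa-sweep",                        # illustrative
    "parallelism": 2,                                      # illustrative field name
    "jobs": [
        {"config": "configs/gepa_a.toml", "env_file": ".env"},
        {"config": "configs/gepa_b.toml", "env_file": ".env"},
    ],
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(payload, fh)

# Assumed invocation: submit may also accept inline JSON instead of a file path.
subprocess.run(["synth-ai", "experiment", "submit", fh.name], check=True)
```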

Redis-Based Queue Backend

  • Redis Broker: Migrated from SQLite broker to Redis for improved reliability and performance
    • Default Configuration: Uses redis://localhost:6379/0 for broker and redis://localhost:6379/1 for result backend
    • Environment Variables: Configurable via EXPERIMENT_QUEUE_BROKER_URL and EXPERIMENT_QUEUE_RESULT_BACKEND_URL (see the sketch after this list)
    • No Locking Issues: Eliminates SQLite locking conflicts that prevented job dispatch
    • Concurrent Access: Supports multiple workers and concurrent job processing
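
Resolution of the broker and backend URLs boils down to the following (a sketch of the documented defaults and environment overrides; the Celery app name is illustrative):

```python
import os

from celery import Celery

broker_url = os.environ.get("EXPERIMENT_QUEUE_BROKER_URL", "redis://localhost:6379/0")
result_backend = os.environ.get(
    "EXPERIMENT_QUEUE_RESULT_BACKEND_URL", "redis://localhost:6379/1"
)

app = Celery("experiment_queue", broker=broker_url, backend=result_backend)
```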

🔧 Technical Improvements

  • Database Architecture: SQLite database for experiment metadata, Redis for Celery broker/backend
    • WAL Mode: Application database uses Write-Ahead Logging for concurrent reads/writes (see the sketch after this list)
    • Path Enforcement: Single database path requirement prevents configuration conflicts
    • Automatic Initialization: Database schema created automatically on first use
  • Periodic Task System: Celery Beat scheduler automatically dispatches queued jobs every 5 seconds
  • Error Handling: Comprehensive error reporting with detailed diagnostics for authentication failures, health check failures, and silent failures
  • Result Parsing: Enhanced result collection supports both JSON and text-based outputs (GEPA/MIPRO)
  • Test Suite: Fast, comprehensive test suite (42 tests, ~2 seconds) with proper isolation and timeout handling
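
As noted above, the application database runs in WAL mode; enabling it is a single pragma (sketch; the database path is illustrative):

```python
import sqlite3

conn = sqlite3.connect("experiment_queue.db")  # illustrative path
conn.execute("PRAGMA journal_mode=WAL")        # readers no longer block the writer
conn.close()
```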

📚 Documentation

  • CLI Documentation: Complete documentation for synth-ai queue and synth-ai experiment commands
  • README: Comprehensive README in synth_ai/experiment_queue/ with usage examples and troubleshooting
  • Environment Variables: Documented required and optional environment variables for queue configuration

🎯 Use Cases

  • Local Development: Run prompt learning experiments locally without backend API
  • Batch Processing: Submit multiple experiments and let the queue process them automatically
  • Resource Management: Control parallelism to manage resource usage
  • Monitoring: Real-time status monitoring with watch mode and JSON output for dashboards

2025-11-14 – Task App Discovery, Backend Infrastructure & Prompt Optimization Enhancements

🚀 New Features

Task App Discovery and Health Checking (synth-ai scan)

  • New Command: Added synth-ai scan command to discover and health-check running task applications
    • Multi-Method Discovery: Uses port scanning, service records, tunnel records, process inspection, backend API, and registry queries
    • Health Checks: Performs HTTP health checks on /health endpoints and extracts metadata from /info endpoints
    • Dual Output Formats: Supports both human-readable table format and machine-readable JSON format
    • Service Records Integration: Automatically discovers apps deployed via synth-ai deploy --runtime local
    • Tunnel Discovery: Discovers Cloudflare tunnels deployed via synth-ai deploy --runtime tunnel
    • Process Scanning: Inspects running cloudflared processes to find tunnel URLs
    • API Key Support: Supports authentication via X-API-Key header for protected apps

Backend Cloudflare Tunnel Infrastructure

  • Complete Tunnel Management API: Full backend infrastructure for managing Cloudflare tunnels
    • REST API Endpoints: New endpoints for tunnel CRUD operations (POST /api/v1/tunnels, GET /api/v1/tunnels, GET /api/v1/tunnels/tunnel, DELETE /api/v1/tunnels/{id})
    • Tunnel Manager Service: Lifecycle management with automatic tunnel replacement and Cloudflare Access credential support
    • Database Models: Complete tunnel schema with support for access_client_id and access_client_secret for Cloudflare Access integration
    • Database Migrations: Removed hostname unique constraint, added partial index for active tunnels
    • Deployment Scripts: Production-ready tunnel deployment utilities (run_backend_tunnel.sh, start_tunnel.py)
    • Comprehensive Testing: Full test suite for tunnel creation, management, and cleanup

Unified Artifacts API

  • Backend Artifacts Endpoints: New REST API for managing all Synth AI artifacts
    • List Artifacts: GET /api/artifacts - Unified view of fine-tuned models, RL models, and optimized prompts with filtering
    • Model Details: GET /api/artifacts/models/{model_id} - Get detailed model information
    • Prompt Details: GET /api/artifacts/prompts/{job_id} - Get prompt details with intelligent metadata extraction
    • Smart Prompt Extraction: Automatically extracts prompts from various snapshot structures (direct messages, GEPA text_replacements, nested structures)
    • Backend Integration: Connected to existing learning jobs and RL registries for unified artifact management

Prompt Optimization Proposer Backends

  • Modular Proposer System: New pluggable proposer architecture for GEPA and MIPRO prompt optimization
    • DSPy Proposer: DSPy-inspired instruction proposer (proposer_mode="dspy") - Standard DSPy-style mutation prompts for prompt generation
    • GEPA-AI Proposer: GEPA-AI inspired proposer (proposer_mode="gepa-ai") - Alternative instruction generation approach with GEPA-AI flavored prompts
    • Synth Proposer: Synth Research variant (proposer_mode="synth") - Synth-branded proposer currently mirroring DSPy behavior
    • Built-in Proposers: Base proposer implementations for both GEPA and MIPRO algorithms
    • Configuration: Per-proposer configuration with separate meta-model settings, temperature, and token limits
    • Backward Compatibility: Auto-migration from deprecated "builtin" mode to "dspy" mode with deprecation warnings

OpenAI & Groq Provider Support

  • Enhanced Provider Integration: Comprehensive support for OpenAI and Groq models in prompt optimization
    • OpenAI Models: Full support for GPT-4o, GPT-4.1, GPT-5 families with automatic pricing calculation
    • Groq Models: Support for Llama, Mixtral, and Gemma models via Groq's OpenAI-compatible API
    • Provider URL Resolution: Centralized provider URL management (get_provider_url()) for consistent API endpoint resolution
    • Pricing Integration: Per-token pricing tables for accurate cost tracking and budget enforcement
    • Environment Overrides: Runtime pricing override via PL_TOKEN_RATES_JSON environment variable

Experiment Queue System (synth-ai queue & synth-ai experiment)

  • Queue Worker Management: New synth-ai queue command suite for managing Celery workers that process experiment jobs
    • Start Worker: synth-ai queue start launches a background Celery worker with Beat scheduler for periodic job dispatch
    • Stop Worker: synth-ai queue stop terminates all running experiment queue workers
    • Status Check: synth-ai queue status (or synth-ai queue) displays current worker status and configuration
    • Single Worker Enforcement: Automatically kills existing workers before starting new ones to ensure only one instance runs
    • Background Mode: Workers run in background by default with logs written to logs/experiment_queue_worker.log
    • Beat Scheduler: Integrated Celery Beat runs periodic queue checks every 5 seconds to dispatch queued jobs
  • Experiment Submission: synth-ai experiment submit accepts JSON payloads to create new experiments with multiple jobs
    • JSON Payloads: Submit from files or inline JSON strings
    • Job Validation: Validates config files and environment files before submission
    • Automatic Dispatch: Initial jobs are dispatched immediately upon submission
    • Parallelism Control: Configure how many jobs run concurrently per experiment
  • Experiment Management: synth-ai experiment list provides dashboard views with status filtering and watch mode
  • Redis-Based Queue Backend: Migrated from SQLite broker to Redis for improved reliability and performance
    • No Locking Issues: Eliminates SQLite locking conflicts that prevented job dispatch
    • Concurrent Access: Supports multiple workers and concurrent job processing

Deploy Command Enhancements

  • Non-Blocking by Default: Deploy command now runs in background mode by default to prevent indefinite stalls
    • Local Deployments: Uses nohup to run uvicorn servers in background
    • Tunnel Deployments: Returns immediately after starting tunnel (non-blocking)
    • Modal Deployments: Starts process and returns immediately (non-blocking)
  • Optional Blocking Mode: Added --wait flag for interactive use when blocking behavior is desired
  • API Key Validation: Enhanced validation to ensure SYNTH_API_KEY and ENVIRONMENT_API_KEY are present before deployment
    • Checks both environment variables and provided .env file
    • Provides detailed error messages showing where keys were found or missing
    • Prevents deployments without required credentials

🔧 Technical Improvements

  • Service Record Management: Local and tunnel deployments automatically record to ~/.synth-ai/services.json
    • Records include PID, URL, port, app_id, and deployment metadata
    • Automatic cleanup of stale records (dead processes)
  • Health Check Logic: Robust health check implementation with timeout handling and error detection
  • Performance: Uses asyncio for parallel health checks and port scanning
  • Comprehensive Testing: 33 tests total (19 unit, 11 integration, 3 end-to-end) with proper timeout handling

📚 Documentation

  • CLI Documentation: Added comprehensive scan command documentation
  • Tunnel Documentation: Complete guides for backend tunnel setup, deployment, and troubleshooting
  • Artifacts API: Backend API documentation for artifact management endpoints
  • Proposer System: Documentation for DSPy, GEPA-AI, and Synth proposer modes
  • README: Created detailed READMEs in scan module and tunnel infrastructure with usage examples
  • Code Documentation: Added comprehensive docstrings across all scan module files

2025-11-11 – Session-Based Pricing & Budget Enforcement

πŸš€ New Features

  • Session-Based Pricing: Comprehensive session-based usage tracking and budget enforcement for cost-incurring API requests.
    • Single Active Session: Only one active session per organization/user at a time, automatically managed
    • Automatic Session Attachment: All cost-incurring requests automatically attach to the active session unless explicitly opted out
    • Budget Enforcement: Requests are rejected with 429 when session limits are exceeded (tokens, cost, GPU hours)
    • Opt-Out Support: Set X-No-Session: true header to skip session tracking for specific requests
    • Session Lifecycle: Sessions have explicit beginning, active period, and end states
    • TOML Configuration: CLI commands (codex, opencode) support TOML config files with session limits (default: $20 cost limit)
    • Session Status Command: New synth status session command displays session details, usage, and limits

📊 Session Management

  • Auto-Creation: If no X-Session-ID header is provided, system automatically finds or creates an active session
  • Explicit Session ID: Provide X-Session-ID header to use a specific session
  • Session Limits: Configure hard limits on tokens, cost (USD), GPU hours, and API calls per session
  • Usage Tracking: All usage (tokens, cost, GPU hours) automatically recorded to the active session
  • Limit Violations: Sessions automatically marked as limit_exceeded when limits are breached
  • CLI Integration: uvx synth-ai codex and uvx synth-ai opencode automatically create sessions with limits from TOML config

🔒 Budget Enforcement

  • Pre-Request Checks: Budget limits checked before processing requests to prevent unnecessary resource usage
  • Protected Routes: All cost-incurring routes (/api/v1/responses, /api/synth-research/chat/completions, etc.) enforce session budgets
  • Error Responses: Clear error messages with limit details when budgets are exceeded
  • Session Required: Requests require a session unless X-No-Session: true header is set
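
Client-side, the session headers look like this; a sketch using httpx (the base URL and request body are illustrative):

```python
import httpx

headers = {
    "Authorization": "Bearer <SYNTH_API_KEY>",  # placeholder for your real key
    "X-Session-ID": "sess_123",                 # optional: pin a specific session
    # "X-No-Session": "true",                   # or opt out of session tracking
}

resp = httpx.post(
    "https://<backend>/api/v1/responses",       # illustrative base URL
    headers=headers,
    json={"model": "synth-small", "input": "hello"},  # illustrative body
)
if resp.status_code == 429:
    print(resp.json())  # error body details which session limit was breached
```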

📚 Documentation

  • Session Pricing Spec: Complete specification document (session_pricing.spec.md) detailing session management behavior
  • API Documentation: Updated API endpoints with session management examples
  • SDK Integration: SDK supports session creation, querying, and usage tracking with full type safety

🔧 Technical Improvements

  • Database Schema: New tables for session tracking (session_usage_sessions, session_usage_limits, session_usage_records, session_usage_limit_violations)
  • Service Layer: Comprehensive service layer for session CRUD, limit checking, and usage recording
  • Query Builder: Fluent query builders for both backend and SDK for session queries
  • Integration Tests: Full test coverage for session creation, budget checking, and usage tracking
  • Type Safety: Complete type annotations with all type checks passing

2025-11-11 – Cloudflare Tunnel Support for Task App Deployment

🚀 New Features

  • Cloudflare Tunnel Deployment: Added support for deploying task apps via Cloudflare Tunnel, enabling RL training and prompt optimization without deploying to Modal.
    • Quick Tunnels: Free, ephemeral tunnels (no account required), ideal for development and testing. Runs in background mode by default and returns immediately after deployment.
    • Managed Tunnels: Stable tunnels with custom subdomains (e.g., my-company.usesynth.ai) for production use. Requires SYNTH_API_KEY and backend provisioning (coming soon).
    • Seamless Integration: Works with RL training and prompt optimization (GEPA/MIPRO) - simply deploy via tunnel and use the URL in your training configs.
    • Clean Abstractions: Process management hidden from users - deploy and use, no helper scripts needed.

📚 Documentation

  • Deployment Guide: Updated CLI documentation with Cloudflare Tunnel deployment examples and use cases.
  • Example Workflow: Added examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh demonstrating GEPA prompt optimization with tunnel deployment.

🔧 Technical Improvements

  • Background Mode: Tunnel deployments run in background by default (non-blocking), with optional --keep-alive flag for blocking mode.
  • Health Checks: Automatic health check polling ensures task app is ready before tunnel opens.
  • Credential Management: Automatic writing of TASK_APP_URL and Cloudflare Access credentials to .env files.

2025-11-09 – First-Class Codex Support for Synth Models

πŸš€ New Features

  • Synth Model Support for Codex: Added first-class support for synth-small, synth-medium, and synth-experimental models in Codex CLI workflows.
    • Responses API Integration: Full Responses API format support with proper conversion to/from Chat Completions format for seamless Codex integration.
    • Tool Call Handling: Complete tool call processing with proper event sequencing, argument streaming, and output handling for Codex's tool execution flow.
    • Stop Tool: Added __internal_stop tool to signal completion and prevent infinite loops in Responses → Chat → Responses conversion flows.
    • Stream Completion: Reliable stream completion signals with proper response.completed events and finish_reason handling.

🔧 Technical Improvements

  • Responses API Bridge: Implemented comprehensive Responses API → Chat Completions → Responses API conversion layer for Codex compatibility.
  • Event Translation: Full SSE event translation between Responses and Chat formats, including tool calls, function outputs, and completion signals.
  • Comprehensive Testing: Added 32 integration tests covering all critical Responses API behavior, tool call processing, and edge cases.

2025-11-07 – Multi-Stage Optimizers & Expanded Model Support

🚀 New Features

  • Multi-Stage MIPRO & GEPA: Both prompt optimization algorithms now support multi-stage pipeline optimization for complex workflows with multiple processing stages.
    • MIPRO Multi-Stage: Generates per-stage instruction proposals with automatic stage detection via LCS (Longest Common Subsequence) matching. Each stage gets stage-specific meta-prompts including pipeline overview, stage role, and baseline performance. Supports per-module configuration with max_instruction_slots and max_demo_slots for fine-grained control.
    • GEPA Multi-Stage: Uses module-aware evolution where each pipeline module gets its own gene. Mutations target specific modules, uniform crossover combines parent genes per module, and aggregated scoring sums module lengths for Pareto optimization. Supports per-module max_instruction_slots, max_tokens, and allowed_tools configuration.
    • Configuration: Both algorithms support pipeline_modules metadata in initial prompts and module-specific settings in their respective config sections (prompt_learning.gepa.modules and prompt_learning.mipro.modules); a config sketch follows this list.
  • Gemini Model Support: Added comprehensive support for Google Gemini models as policy models for both GEPA and MIPRO algorithms.
    • Supported Models: gemini-2.5-pro (≤200k tokens), gemini-2.5-pro-gt200k (>200k tokens), gemini-2.5-flash, and gemini-2.5-flash-lite.
    • Provider Integration: Full SDK validation and backend support for provider = "google" with automatic pricing calculation and token tracking.
    • Example Configs: Added example configurations demonstrating Gemini usage, including banking77_pipeline_mipro_gemini_flash_lite_local.toml for cost-effective multi-stage optimization.
  • OpenAI Model Support: Expanded OpenAI model support for prompt optimization with comprehensive coverage of latest models.
    • Supported Models: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, and gpt-5-nano.
    • Model Validation: SDK-side validation with clear error messages for unsupported models. Explicit rejection of gpt-5-pro due to high cost (15/15/120 per 1M tokens).
    • Provider Prefix Support: Models can be specified with or without provider prefix (e.g., "gpt-4o" or "openai/gpt-4o").
  • SDK Validation Enhancements: Improved config validation with comprehensive error checking before sending to backend.
    • Multi-Stage Validation: Validates that pipeline_modules match module configs, checks for missing or extra modules, and ensures proper module ID matching.
    • Model Validation: Provider-aware model validation with detailed error messages listing supported models for each provider.
    • Nano Model Restrictions: Clear validation that nano models (gpt-4.1-nano, gpt-5-nano) are allowed for policy models but rejected for mutation/meta models (too small for generation tasks).
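
A hedged config sketch tying these pieces together (key and section names follow the lists above; the module IDs and values are illustrative, and the exact nesting may differ from the shipped example configs):

```toml
[prompt_learning.gepa]
policy_model = "gemini-2.5-flash"   # a supported Google model
provider = "google"

# Per-module settings; IDs ("classify", "answer") must match pipeline_modules.
[prompt_learning.gepa.modules.classify]
max_instruction_slots = 2
max_tokens = 512
allowed_tools = ["lookup"]          # illustrative tool name

[prompt_learning.gepa.modules.answer]
max_instruction_slots = 1
max_tokens = 1024
```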

🔧 Technical Improvements

  • Config Parsing: Enhanced TOML parsing for multi-stage configurations with support for nested module and stage definitions.
  • Integration Tests: Added comprehensive integration tests for multi-stage GEPA and MIPRO workflows, including Gemini model validation tests.
  • Error Messages: Improved validation error messages with actionable suggestions and links to example configurations.

📚 Documentation

  • Multi-Stage Pipeline Guide: Updated documentation with examples and configuration details for optimizing multi-stage pipelines with both GEPA and MIPRO algorithms.
  • Model Support Reference: Complete documentation of supported models for each provider (OpenAI, Groq, Google) with usage examples.
  • Example Configurations: Added example configs demonstrating multi-stage optimization with different model providers, including multi_stage_gepa_example.toml and banking77_pipeline_mipro_gemini_flash_lite_local.toml.

2025-11-04 – GEPA: Genetic Evolution for Prompt Optimization

🚀 New Features

  • GEPA Algorithm: Genetic Evolution for Prompt Optimization (GEPA) is now available for prompt optimization jobs. GEPA uses evolutionary algorithms (mutation, crossover, selection) to optimize prompts across multiple generations, achieving significant accuracy improvements on classification and reasoning tasks.
  • Prompt ID-Based URLs: Prompt transformations now use versioned URLs (/v1/{prompt_version_id}/chat/completions) for better traceability, concurrency, and debugging. Each transformation gets a unique version ID based on content hashing.
  • Multi-Objective Optimization: GEPA maintains a Pareto front balancing accuracy, token count, and task-specific metrics (e.g., tool call rate).
  • Validation Scoring: Job results now distinguish between prompt_best_train_score and prompt_best_validation_score for clearer evaluation metrics.
  • Integration Testing: Added comprehensive integration tests for GEPA training workflows with Banking77 task app.

📚 Documentation

  • GEPA Guide: Complete documentation with quick start, configuration examples, and troubleshooting for Banking77, HotpotQA, IFBench, HoVer, and PUPA tasks.
  • Integration Examples: Step-by-step guides for deploying task apps and running GEPA optimization locally and on Modal.

2025-10-28 – Terminal Training Logs

🚀 New Features

  • Full terminal streaming logs: Both uvx synth-ai train for SFT and RL now provide comprehensive real-time training logs directly in the terminal. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics logging (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training process.

2025-10-27 – Rubrics, Hosted Judges & Qwen-VL RL

🚀 New Features

  • Hosted Synth judges (configurable): Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behaviour.
  • Rubric-aware filtering: SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export.
  • Qwen-VL support across SFT & RL: Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
  • Instruct-model RL guidance: Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.

2025-10-17 – Qwen Coder, Turso, H200 Topologies & RL Throughput

🚀 New Features

  • Qwen Coder models supported: Qwen Coder variants are now available across SFT and inference workflows.
  • SDK migrated to Turso for concurrency: Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
  • More training topologies on H200s: Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Full LoRA support for Policy Gradient: LoRA integrated end-to-end into Policy Gradient training flows.
  • Pipelined RL async rollouts: Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.

2025-10-09 – LoRA, MoE & Large Model Support

🚀 New Features

  • Expanded Qwen catalog: Simple Training now ships SFT and inference presets for every Qwen release outside the existing qwen3-{0.6B–32B} range, giving full coverage for the remaining Qwen 1.x/2.x/2.5 checkpoints.
  • Large-model inference & training topologies: Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
  • Turnkey rollout: API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
  • LoRA-first SFT: Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.

2025-09-24 – Platform Updates

🚀 New Features

  • Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking
  • B200 & H200 GPU Support: Added support for NVIDIA's latest flagship GPUs (B200, H200) for both training and inference workloads
  • Faster Inference: Optimized inference pipeline with improved throughput and reduced latency across all model sizes
  • GSPO Support: Integrated Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training

2025-09-17 – Online RL (customer-visible features)

  • Organization-scoped environment credentials
    • Upload your environment API key once (sealed-box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
  • First-party Task App integration
    • Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Single-node, multi-GPU Online RL
    • Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi-node training is finished in dev; reach out if interested.
    • Supports a reference model (for KL) stacked on inference or on its own GPU, and configurable tensor parallelism for inference.
  • Production run flow
    • Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.

0.2.2.dev2 – Aug 8, 2025

  • Fine-tuning (SFT) endpoints available and documented end-to-end
  • Interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B
  • Live polling output during training with real-time status updates
  • CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher

0.2.2.dev1 – Aug 7, 2025

  • New backend balance APIs and CLI for account visibility
  • CLI utilities: balance, traces, and man commands
  • Traces inventory view with per-DB counts and storage footprint
  • Standardized one-off usage: uvx synth-ai <command> (removed interactive watch)
  • Improved .env loading and API key resolution

0.2.2.dev0 – Jul 30, 2025

  • Environment Registration API for custom environments
  • Turso/sqld daemon support with local-first replicas
  • Environment Service Daemon via uvx synth-ai serve

0.2.1.dev1 – Jul 29, 2025

  • Initial development release

Feb 3, 2025

  • Cuvier Error Search (deprecated)

Jan 2025

  • Langsmith integration for Enterprise partners
  • Python SDK v0.3 (simplified API, Anthropic support)