2025-11-17 – SDK Release 0.2.25.dev1

📦 Package Updates

  • Version Bump: Updated synth-ai package to 0.2.25.dev1
  • Modal Deployment: Updated default SYNTH_AI_VERSION in Modal deployments to 0.2.25.dev1

2025-11-17 – Vendored Prompt Learning: Production-Ready Examples

🚀 New Features

Production Prompt Optimization Examples (vendored_prompt_learning)

  • Complete Pipeline Examples: New production-ready examples demonstrating on-the-fly prompt optimization in production environments
    • GEPA Pipeline: run_gepa_example.py - Complete GEPA optimization workflow from baseline evaluation to final prompt deployment
    • MIPRO Pipeline: run_mipro_example.py - Complete MIPRO optimization workflow with programmatic polling and progress tracking
    • In-Process Task Apps: Automatic task app management with Cloudflare tunnel support for production deployments
    • Self-Contained Scripts: Everything in one script - no external dependencies or manual setup required

Production Integration Features

  • In-Process Task App Management: InProcessTaskApp utility automatically manages FastAPI servers and Cloudflare tunnels
    • Automatic Tunnel Creation: Opens Cloudflare tunnels automatically for production use
    • Background Server Management: Runs task apps in background threads with graceful shutdown
    • Port Conflict Resolution: Automatically finds available ports if requested port is busy
  • Programmatic Polling: Built-in job status polling with progress callbacks and timeout handling (pattern sketched after this list)
  • Prompt Retrieval: Easy extraction of optimized prompts from completed jobs
  • Baseline & Final Evaluation: Complete evaluation pipeline comparing initial vs optimized prompts
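
The built-in polling helper handles this loop for you; the following is a minimal sketch of the pattern (names like get_job_status and on_progress are placeholders, not the SDK's actual signature):

```python
import time


def poll_until_complete(get_job_status, on_progress=None, timeout=3600.0, interval=10.0):
    """Poll a job until it reaches a terminal state, with a progress callback and timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_job_status()  # placeholder: e.g., a closure around the job-status endpoint
        if on_progress is not None:
            on_progress(status)
        if status.get("state") in {"succeeded", "failed", "canceled"}:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job did not finish within {timeout:.0f}s")
```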

📚 Documentation

  • Production Guide: Comprehensive guide at /blog/prompt-optimization-benchmarks explaining production use cases
  • Examples Directory: Complete examples in examples/blog_posts/vendored_prompt_learning/
    • Full pipeline scripts for GEPA and MIPRO
    • Configuration files for all benchmarks (Banking77, HeartDisease, HotpotQA, Pupa)
    • In-process scripts with minimal budgets for quick testing
  • README: Detailed README with setup instructions, architecture overview, and customization options

🎯 Use Cases

  • A/B Testing: Automatically find better prompts for your use case without manual intervention
  • Performance Tuning: Continuously improve prompt performance as your data changes
  • Multi-Tenant Optimization: Optimize prompts per customer or use case
  • Rapid Iteration: Test and deploy better prompts faster than manual tuning

🔧 Technical Improvements

  • Unified Benchmark Suite: Consolidated all GEPA and MIPRO examples from blog_posts/gepa/ and blog_posts/mipro/ into a single directory
  • Path Fixes: Fixed environment variable loading and task app path resolution for reliable execution
  • Budget Controls: Configurable rollout budgets with minimal budget modes for quick testing (~1 minute)
  • Error Handling: Comprehensive error handling with clear messages and fallback behavior

2025-11-14 – Artifacts CLI

🚀 New Features

Artifacts Management (synth-ai artifacts)

  • Unified Artifact Management: New synth-ai artifacts command suite for managing and inspecting all Synth AI artifacts
    • List Artifacts: artifacts list displays all fine-tuned models, RL models, and optimized prompts with filtering options
    • Show Details: artifacts show provides detailed information about specific artifacts, with intelligent prompt extraction
    • Export Models: artifacts export exports models to HuggingFace Hub (private or public)
    • Download Prompts: artifacts download downloads optimized prompts in JSON, YAML, or text formats
    • Smart ID Parsing: Centralized parsing logic handles all artifact ID formats (ft:, rl:, peft:, pl_)
    • Best Prompt Display: Default view shows job summary + best optimized prompt with syntax highlighting
    • Verbose Mode: --verbose flag shows full metadata and snapshot details for prompts
    • JSON Export: --format json enables easy scripting and data export
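
As an illustration of the scripting angle, the JSON output can be consumed from Python; a minimal sketch (field names in the returned JSON are assumptions, so adjust to the actual schema):

```python
import json
import subprocess

# Run the CLI with machine-readable output and parse it.
result = subprocess.run(
    ["synth-ai", "artifacts", "list", "--format", "json"],
    capture_output=True, text=True, check=True,
)
artifacts = json.loads(result.stdout)  # assumes a JSON array of artifact records
for artifact in artifacts:
    print(artifact.get("id"), artifact.get("type"))  # illustrative field names
```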

Prompt Extraction Intelligence

  • Multi-Structure Support: Automatically extracts prompt messages from various snapshot structures (see the sketch after this list):
    • Direct messages arrays
    • object → messages arrays
    • object → text_replacements arrays (GEPA structure)
    • initial_prompt → data → messages structures
  • Syntax Highlighting: Rich formatting with role-based colors (system/user/assistant)
  • Default View: Shows only the most important information (summary + best prompt)
  • Verbose View: Full metadata and snapshot details when needed
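
A rough sketch of the fallback order described above (the structure names come from the list; the function itself is illustrative, not the CLI's internal code):

```python
def extract_messages(snapshot: dict) -> list[dict]:
    """Try the known snapshot shapes in order and return the prompt messages."""
    # 1. Direct messages array
    if isinstance(snapshot.get("messages"), list):
        return snapshot["messages"]
    # 2. object -> messages, and 3. object -> text_replacements (GEPA structure)
    obj = snapshot.get("object") or {}
    if isinstance(obj.get("messages"), list):
        return obj["messages"]
    if isinstance(obj.get("text_replacements"), list):
        return obj["text_replacements"]
    # 4. initial_prompt -> data -> messages
    data = (snapshot.get("initial_prompt") or {}).get("data") or {}
    if isinstance(data.get("messages"), list):
        return data["messages"]
    return []
```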

🔧 Technical Improvements

  • Centralized Parsing: All artifact ID parsing logic consolidated in synth_ai.cli.commands.artifacts.parsing
    • Comprehensive validation and error handling
    • Support for special characters, unicode, whitespace handling
    • Type detection and validation utilities
  • Backend Integration: New backend endpoints in /api/artifacts:
    • GET /api/artifacts - List all artifacts with filtering
    • GET /api/artifacts/models/{model_id} - Get model details
    • GET /api/artifacts/prompts/{job_id} - Get prompt details with metadata extraction
    • POST /api/learning/exports/hf - Export to HuggingFace
  • Error Handling: Clear error messages for invalid IDs, missing artifacts, authentication failures
  • Test Coverage: Comprehensive test suite with 59+ tests covering:
    • All parsing edge cases (whitespace, unicode, special chars, empty values, etc.)
    • All detection and validation functions
    • Wasabi key resolution
    • CLI command behavior with mocked clients
    • Integration tests for real backend calls

📚 Documentation

  • CLI Documentation: Complete CLI reference page at /cli/artifacts
  • README: Comprehensive README in synth-ai/synth_ai/cli/commands/artifacts/README.md
  • Code Examples: Usage examples for all subcommands and common workflows

🎯 Use Cases

  • Artifact Discovery: Quickly find and inspect all your trained models and optimized prompts
  • Prompt Extraction: Easily extract and use optimized prompts from completed jobs
  • Model Sharing: Export models to HuggingFace for sharing or deployment
  • Scripting: JSON output enables automation and integration with other tools
  • Debugging: Verbose mode helps troubleshoot job issues and inspect metadata

2025-11-14 – In-Process Task App Utility

🚀 New Features

In-Process Task App (InProcessTaskApp)

  • Automatic Server & Tunnel Management: New InProcessTaskApp utility enables running task apps entirely within your Python script (usage sketched after this list)
    • Background Server: Automatically starts uvicorn server in background thread
    • Automatic Tunnel Creation: Opens Cloudflare quick tunnel automatically and provides tunnel URL
    • Port Conflict Handling: Automatically finds available ports if requested port is busy (auto_find_port=True by default)
    • Multiple Input Methods: Supports FastAPI app instance, TaskAppConfig object, config factory function, or task app file path
    • Signal Handling: Graceful shutdown on SIGINT/SIGTERM prevents orphaned processes
    • Observability: Structured logging and optional on_start/on_stop callbacks
    • Input Validation: Comprehensive validation with clear error messages for port range, host security, tunnel mode, and file existence
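
A minimal usage sketch, assuming InProcessTaskApp is importable from the SDK and supports context-manager use (the import path and constructor arguments here are assumptions, not a verbatim API reference):

```python
from fastapi import FastAPI

from synth_ai import InProcessTaskApp  # assumed import location

app = FastAPI()  # any of: app instance, TaskAppConfig, factory, or file path

with InProcessTaskApp(app, port=8001) as task_app:  # assumed context-manager API
    print("task app reachable at", task_app.url)    # e.g., the Cloudflare tunnel URL
    # ... point a GEPA/MIPRO job at task_app.url here ...
# leaving the block shuts down the server and tunnel gracefully
```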

Public Cloudflare Utilities

  • Public API Functions: Made start_uvicorn_background() and wait_for_health_check() public in synth_ai.cloudflare
    • Can be used independently for custom server management
    • Previously private functions now available for advanced use cases
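
A sketch of using them directly (only the function names and module come from this release; the argument names below are assumptions):

```python
from fastapi import FastAPI

from synth_ai.cloudflare import start_uvicorn_background, wait_for_health_check

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

start_uvicorn_background(app, host="127.0.0.1", port=8001)  # assumed signature
wait_for_health_check("http://127.0.0.1:8001/health")       # assumed signature
```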

🔧 Technical Improvements

  • Port Management: Smart port conflict resolution with automatic port finding (see the sketch after this list)
    • _find_available_port() helper searches for available ports starting from requested port
    • _is_port_available() helper checks port availability
    • _kill_process_on_port() helper attempts to free occupied ports (best-effort, cross-platform)
  • Error Handling: Comprehensive error handling with actionable error messages
  • Test Coverage: Full test suite with unit tests (383 lines) and integration tests (145 lines)
  • Security: Host validation restricts to localhost addresses for security
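
The port helpers amount to standard socket probing; roughly what the private _is_port_available() and _find_available_port() helpers do:

```python
import socket


def is_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) != 0  # non-zero means connection refused


def find_available_port(start: int, limit: int = 100) -> int:
    """Scan upward from `start` and return the first free port."""
    for port in range(start, start + limit):
        if is_port_available(port):
            return port
    raise RuntimeError(f"no free port in range {start}-{start + limit - 1}")
```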

📚 Documentation

  • SDK Documentation: Complete SDK reference page at /sdk/in-process-task-app
  • README: Comprehensive README in synth-ai/IN_PROCESS_TASK_APP_README.md with usage examples
  • Code Examples: Multiple usage examples including GEPA integration, callbacks, and port handling

🎯 Use Cases

  • Local Development: Run prompt optimization without separate terminal processes
  • Demos & Scripts: Simplify demo scripts and automation workflows
  • CI/CD: Integrate task apps into automated testing and deployment pipelines
  • Resource Management: Automatic cleanup prevents resource leaks

2025-11-14 – Experiment Queue System

🚀 New Features

Experiment Queue Management (synth-ai queue)

  • Queue Worker Management: New synth-ai queue command suite for managing Celery workers that process experiment jobs
    • Start Worker: synth-ai queue start launches a background Celery worker with Beat scheduler for periodic job dispatch
    • Stop Worker: synth-ai queue stop terminates all running experiment queue workers
    • Status Check: synth-ai queue status (or synth-ai queue) displays current worker status and configuration
    • Single Worker Enforcement: Automatically kills existing workers before starting new ones to ensure only one instance runs
    • Database Path Validation: Enforces consistent database path usage across all workers
    • Background Mode: Workers run in background by default with logs written to logs/experiment_queue_worker.log
    • Beat Scheduler: Integrated Celery Beat runs periodic queue checks every 5 seconds to dispatch queued jobs

Experiment Submission (synth-ai experiment)

  • Submit Experiments: synth-ai experiment submit accepts JSON payloads to create new experiments with multiple jobs
    • JSON Payloads: Submit from files or inline JSON strings (see the sketch after this list)
    • Job Validation: Validates config files and environment files before submission
    • Automatic Dispatch: Initial jobs are dispatched immediately upon submission
    • Parallelism Control: Configure how many jobs run concurrently per experiment
  • Experiment Management: synth-ai experiment list and synth-ai experiments provide dashboard views
    • Status Filtering: Filter experiments by status (QUEUED, RUNNING, COMPLETED, FAILED, CANCELED)
    • Watch Mode: Continuous refresh with --watch flag for real-time monitoring
    • JSON Output: Machine-readable JSON output for automation and scripting
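
A sketch of submitting a payload from Python (the payload fields and the way the file is passed are assumptions; see the experiment_queue README for the actual schema):

```python
import json
import subprocess
import tempfile

# Hypothetical payload shape: one experiment, two jobs, and a parallelism cap.
payload = {
    "name": "banking77-gepa-sweep",                        # illustrative
    "parallelism": 2,                                      # illustrative field name
    "jobs": [
        {"config": "configs/gepa_a.toml", "env_file": ".env"},
        {"config": "configs/gepa_b.toml", "env_file": ".env"},
    ],
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(payload, fh)

# Assumed invocation: submit may also accept inline JSON instead of a file path.
subprocess.run(["synth-ai", "experiment", "submit", fh.name], check=True)
```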

Redis-Based Queue Backend

  • Redis Broker: Migrated from SQLite broker to Redis for improved reliability and performance
    • Default Configuration: Uses redis://localhost:6379/0 for broker and redis://localhost:6379/1 for result backend
    • Environment Variables: Configurable via EXPERIMENT_QUEUE_BROKER_URL and EXPERIMENT_QUEUE_RESULT_BACKEND_URL (see the sketch after this list)
    • No Locking Issues: Eliminates SQLite locking conflicts that prevented job dispatch
    • Concurrent Access: Supports multiple workers and concurrent job processing
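
Resolution of the broker and backend URLs boils down to the following (a sketch of the documented defaults and environment overrides; the Celery app name is illustrative):

```python
import os

from celery import Celery

broker_url = os.environ.get("EXPERIMENT_QUEUE_BROKER_URL", "redis://localhost:6379/0")
result_backend = os.environ.get(
    "EXPERIMENT_QUEUE_RESULT_BACKEND_URL", "redis://localhost:6379/1"
)

app = Celery("experiment_queue", broker=broker_url, backend=result_backend)
```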

🔧 Technical Improvements

  • Database Architecture: SQLite database for experiment metadata, Redis for Celery broker/backend
    • WAL Mode: Application database uses Write-Ahead Logging for concurrent reads/writes (see the sketch after this list)
    • Path Enforcement: Single database path requirement prevents configuration conflicts
    • Automatic Initialization: Database schema created automatically on first use
  • Periodic Task System: Celery Beat scheduler automatically dispatches queued jobs every 5 seconds
  • Error Handling: Comprehensive error reporting with detailed diagnostics for authentication failures, health check failures, and silent failures
  • Result Parsing: Enhanced result collection supports both JSON and text-based outputs (GEPA/MIPRO)
  • Test Suite: Fast, comprehensive test suite (42 tests, ~2 seconds) with proper isolation and timeout handling
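
As noted above, the application database runs in WAL mode; enabling it is a single pragma (sketch; the database path is illustrative):

```python
import sqlite3

conn = sqlite3.connect("experiment_queue.db")  # illustrative path
conn.execute("PRAGMA journal_mode=WAL")        # readers no longer block the writer
conn.close()
```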

📚 Documentation

  • CLI Documentation: Complete documentation for synth-ai queue and synth-ai experiment commands
  • README: Comprehensive README in synth_ai/experiment_queue/ with usage examples and troubleshooting
  • Environment Variables: Documented required and optional environment variables for queue configuration

🎯 Use Cases

  • Local Development: Run prompt learning experiments locally without backend API
  • Batch Processing: Submit multiple experiments and let the queue process them automatically
  • Resource Management: Control parallelism to manage resource usage
  • Monitoring: Real-time status monitoring with watch mode and JSON output for dashboards

2025-11-14 – Task App Discovery, Backend Infrastructure & Prompt Optimization Enhancements

🚀 New Features

Task App Discovery and Health Checking (synth-ai scan)

  • New Command: Added synth-ai scan command to discover and health-check running task applications
    • Multi-Method Discovery: Uses port scanning, service records, tunnel records, process inspection, backend API, and registry queries
    • Health Checks: Performs HTTP health checks on /health endpoints and extracts metadata from /info endpoints
    • Dual Output Formats: Supports both human-readable table format and machine-readable JSON format
    • Service Records Integration: Automatically discovers apps deployed via synth-ai deploy --runtime local
    • Tunnel Discovery: Discovers Cloudflare tunnels deployed via synth-ai deploy --runtime tunnel
    • Process Scanning: Inspects running cloudflared processes to find tunnel URLs
    • API Key Support: Supports authentication via X-API-Key header for protected apps

Backend Cloudflare Tunnel Infrastructure

  • Complete Tunnel Management API: Full backend infrastructure for managing Cloudflare tunnels
    • REST API Endpoints: New endpoints for tunnel CRUD operations (POST /api/v1/tunnels, GET /api/v1/tunnels, GET /api/v1/tunnels/tunnel, DELETE /api/v1/tunnels/{id})
    • Tunnel Manager Service: Lifecycle management with automatic tunnel replacement and Cloudflare Access credential support
    • Database Models: Complete tunnel schema with support for access_client_id and access_client_secret for Cloudflare Access integration
    • Database Migrations: Removed hostname unique constraint, added partial index for active tunnels
    • Deployment Scripts: Production-ready tunnel deployment utilities (run_backend_tunnel.sh, start_tunnel.py)
    • Comprehensive Testing: Full test suite for tunnel creation, management, and cleanup

Unified Artifacts API

  • Backend Artifacts Endpoints: New REST API for managing all Synth AI artifacts
    • List Artifacts: GET /api/artifacts - Unified view of fine-tuned models, RL models, and optimized prompts with filtering
    • Model Details: GET /api/artifacts/models/{model_id} - Get detailed model information
    • Prompt Details: GET /api/artifacts/prompts/{job_id} - Get prompt details with intelligent metadata extraction
    • Smart Prompt Extraction: Automatically extracts prompts from various snapshot structures (direct messages, GEPA text_replacements, nested structures)
    • Backend Integration: Connected to existing learning jobs and RL registries for unified artifact management

Prompt Optimization Proposer Backends

  • Modular Proposer System: New pluggable proposer architecture for GEPA and MIPRO prompt optimization
    • DSPy Proposer: DSPy-inspired instruction proposer (proposer_mode="dspy") - Standard DSPy-style mutation prompts for prompt generation
    • GEPA-AI Proposer: GEPA-AI inspired proposer (proposer_mode="gepa-ai") - Alternative instruction generation approach with GEPA-AI flavored prompts
    • Synth Proposer: Synth Research variant (proposer_mode="synth") - Synth-branded proposer currently mirroring DSPy behavior
    • Built-in Proposers: Base proposer implementations for both GEPA and MIPRO algorithms
    • Configuration: Per-proposer configuration with separate meta-model settings, temperature, and token limits
    • Backward Compatibility: Auto-migration from deprecated "builtin" mode to "dspy" mode with deprecation warnings

OpenAI & Groq Provider Support

  • Enhanced Provider Integration: Comprehensive support for OpenAI and Groq models in prompt optimization
    • OpenAI Models: Full support for GPT-4o, GPT-4.1, GPT-5 families with automatic pricing calculation
    • Groq Models: Support for Llama, Mixtral, and Gemma models via Groq's OpenAI-compatible API
    • Provider URL Resolution: Centralized provider URL management (get_provider_url()) for consistent API endpoint resolution
    • Pricing Integration: Per-token pricing tables for accurate cost tracking and budget enforcement
    • Environment Overrides: Runtime pricing override via PL_TOKEN_RATES_JSON environment variable

Experiment Queue System (synth-ai queue & synth-ai experiment)

  • Queue Worker Management: New synth-ai queue command suite for managing Celery workers that process experiment jobs
    • Start Worker: synth-ai queue start launches a background Celery worker with Beat scheduler for periodic job dispatch
    • Stop Worker: synth-ai queue stop terminates all running experiment queue workers
    • Status Check: synth-ai queue status (or synth-ai queue) displays current worker status and configuration
    • Single Worker Enforcement: Automatically kills existing workers before starting new ones to ensure only one instance runs
    • Background Mode: Workers run in background by default with logs written to logs/experiment_queue_worker.log
    • Beat Scheduler: Integrated Celery Beat runs periodic queue checks every 5 seconds to dispatch queued jobs
  • Experiment Submission: synth-ai experiment submit accepts JSON payloads to create new experiments with multiple jobs
    • JSON Payloads: Submit from files or inline JSON strings
    • Job Validation: Validates config files and environment files before submission
    • Automatic Dispatch: Initial jobs are dispatched immediately upon submission
    • Parallelism Control: Configure how many jobs run concurrently per experiment
  • Experiment Management: synth-ai experiment list provides dashboard views with status filtering and watch mode
  • Redis-Based Queue Backend: Migrated from SQLite broker to Redis for improved reliability and performance
    • No Locking Issues: Eliminates SQLite locking conflicts that prevented job dispatch
    • Concurrent Access: Supports multiple workers and concurrent job processing

Deploy Command Enhancements

  • Non-Blocking by Default: Deploy command now runs in background mode by default to prevent indefinite stalls
    • Local Deployments: Uses nohup to run uvicorn servers in background
    • Tunnel Deployments: Returns immediately after starting tunnel (non-blocking)
    • Modal Deployments: Starts process and returns immediately (non-blocking)
  • Optional Blocking Mode: Added --wait flag for interactive use when blocking behavior is desired
  • API Key Validation: Enhanced validation to ensure SYNTH_API_KEY and ENVIRONMENT_API_KEY are present before deployment
    • Checks both environment variables and provided .env file
    • Provides detailed error messages showing where keys were found or missing
    • Prevents deployments without required credentials

🔧 Technical Improvements

  • Service Record Management: Local and tunnel deployments automatically record to ~/.synth-ai/services.json
    • Records include PID, URL, port, app_id, and deployment metadata
    • Automatic cleanup of stale records (dead processes)
  • Health Check Logic: Robust health check implementation with timeout handling and error detection
  • Performance: Uses asyncio for parallel health checks and port scanning
  • Comprehensive Testing: 33 tests total (19 unit, 11 integration, 3 end-to-end) with proper timeout handling

📚 Documentation

  • CLI Documentation: Added comprehensive scan command documentation
  • Tunnel Documentation: Complete guides for backend tunnel setup, deployment, and troubleshooting
  • Artifacts API: Backend API documentation for artifact management endpoints
  • Proposer System: Documentation for DSPy, GEPA-AI, and Synth proposer modes
  • README: Created detailed READMEs in scan module and tunnel infrastructure with usage examples
  • Code Documentation: Added comprehensive docstrings across all scan module files

2025-11-11 – Session-Based Pricing & Budget Enforcement

πŸš€ New Features

  • Session-Based Pricing: Comprehensive session-based usage tracking and budget enforcement for cost-incurring API requests.
    • Single Active Session: Only one active session per organization/user at a time, automatically managed
    • Automatic Session Attachment: All cost-incurring requests automatically attach to the active session unless explicitly opted out
    • Budget Enforcement: Requests are rejected with 429 when session limits are exceeded (tokens, cost, GPU hours)
    • Opt-Out Support: Set X-No-Session: true header to skip session tracking for specific requests
    • Session Lifecycle: Sessions have explicit beginning, active period, and end states
    • TOML Configuration: CLI commands (codex, opencode) support TOML config files with session limits (default: $20 cost limit)
    • Session Status Command: New synth status session command displays session details, usage, and limits

📊 Session Management

  • Auto-Creation: If no X-Session-ID header is provided, system automatically finds or creates an active session
  • Explicit Session ID: Provide X-Session-ID header to use a specific session
  • Session Limits: Configure hard limits on tokens, cost (USD), GPU hours, and API calls per session
  • Usage Tracking: All usage (tokens, cost, GPU hours) automatically recorded to the active session
  • Limit Violations: Sessions automatically marked as limit_exceeded when limits are breached
  • CLI Integration: uvx synth-ai codex and uvx synth-ai opencode automatically create sessions with limits from TOML config

🔒 Budget Enforcement

  • Pre-Request Checks: Budget limits checked before processing requests to prevent unnecessary resource usage
  • Protected Routes: All cost-incurring routes (/api/v1/responses, /api/synth-research/chat/completions, etc.) enforce session budgets
  • Error Responses: Clear error messages with limit details when budgets are exceeded
  • Session Required: Requests require a session unless X-No-Session: true header is set
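
Client-side, the session headers look like this; a sketch using httpx (the base URL and request body are illustrative):

```python
import httpx

headers = {
    "Authorization": "Bearer <SYNTH_API_KEY>",  # placeholder for your real key
    "X-Session-ID": "sess_123",                 # optional: pin a specific session
    # "X-No-Session": "true",                   # or opt out of session tracking
}

resp = httpx.post(
    "https://<backend>/api/v1/responses",       # illustrative base URL
    headers=headers,
    json={"model": "synth-small", "input": "hello"},  # illustrative body
)
if resp.status_code == 429:
    print(resp.json())  # error body details which session limit was breached
```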

📚 Documentation

  • Session Pricing Spec: Complete specification document (session_pricing.spec.md) detailing session management behavior
  • API Documentation: Updated API endpoints with session management examples
  • SDK Integration: SDK supports session creation, querying, and usage tracking with full type safety

🔧 Technical Improvements

  • Database Schema: New tables for session tracking (session_usage_sessions, session_usage_limits, session_usage_records, session_usage_limit_violations)
  • Service Layer: Comprehensive service layer for session CRUD, limit checking, and usage recording
  • Query Builder: Fluent query builders for both backend and SDK for session queries
  • Integration Tests: Full test coverage for session creation, budget checking, and usage tracking
  • Type Safety: Complete type annotations with all type checks passing

2025-11-11 – Cloudflare Tunnel Support for Task App Deployment

🚀 New Features

  • Cloudflare Tunnel Deployment: Added support for deploying task apps via Cloudflare Tunnel, enabling RL training and prompt optimization without deploying to Modal.
    • Quick Tunnels: Free, ephemeral tunnels (no account required), ideal for development and testing. Runs in background mode by default and returns immediately after deployment.
    • Managed Tunnels: Stable tunnels with custom subdomains (e.g., my-company.usesynth.ai) for production use. Requires SYNTH_API_KEY and backend provisioning (coming soon).
    • Seamless Integration: Works with RL training and prompt optimization (GEPA/MIPRO) - simply deploy via tunnel and use the URL in your training configs.
    • Clean Abstractions: Process management hidden from users - deploy and use, no helper scripts needed.

📚 Documentation

  • Deployment Guide: Updated CLI documentation with Cloudflare Tunnel deployment examples and use cases.
  • Example Workflow: Added examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh demonstrating GEPA prompt optimization with tunnel deployment.

🔧 Technical Improvements

  • Background Mode: Tunnel deployments run in background by default (non-blocking), with optional --keep-alive flag for blocking mode.
  • Health Checks: Automatic health check polling ensures task app is ready before tunnel opens.
  • Credential Management: Automatic writing of TASK_APP_URL and Cloudflare Access credentials to .env files.

2025-11-09 – First-Class Codex Support for Synth Models

πŸš€ New Features

  • Synth Model Support for Codex: Added first-class support for synth-small, synth-medium, and synth-experimental models in Codex CLI workflows.
    • Responses API Integration: Full Responses API format support with proper conversion to/from Chat Completions format for seamless Codex integration.
    • Tool Call Handling: Complete tool call processing with proper event sequencing, argument streaming, and output handling for Codex's tool execution flow.
    • Stop Tool: Added __internal_stop tool to signal completion and prevent infinite loops in Responses → Chat → Responses conversion flows.
    • Stream Completion: Reliable stream completion signals with proper response.completed events and finish_reason handling.

🔧 Technical Improvements

  • Responses API Bridge: Implemented comprehensive Responses API → Chat Completions → Responses API conversion layer for Codex compatibility.
  • Event Translation: Full SSE event translation between Responses and Chat formats, including tool calls, function outputs, and completion signals.
  • Comprehensive Testing: Added 32 integration tests covering all critical Responses API behavior, tool call processing, and edge cases.

2025-11-07 – Multi-Stage Optimizers & Expanded Model Support

🚀 New Features

  • Multi-Stage MIPRO & GEPA: Both prompt optimization algorithms now support multi-stage pipeline optimization for complex workflows with multiple processing stages.
    • MIPRO Multi-Stage: Generates per-stage instruction proposals with automatic stage detection via LCS (Longest Common Subsequence) matching. Each stage gets stage-specific meta-prompts including pipeline overview, stage role, and baseline performance. Supports per-module configuration with max_instruction_slots and max_demo_slots for fine-grained control.
    • GEPA Multi-Stage: Uses module-aware evolution where each pipeline module gets its own gene. Mutations target specific modules, uniform crossover combines parent genes per module, and aggregated scoring sums module lengths for Pareto optimization. Supports per-module max_instruction_slots, max_tokens, and allowed_tools configuration.
    • Configuration: Both algorithms support pipeline_modules metadata in initial prompts and module-specific settings in their respective config sections (prompt_learning.gepa.modules and prompt_learning.mipro.modules); a config sketch follows this list.
  • Gemini Model Support: Added comprehensive support for Google Gemini models as policy models for both GEPA and MIPRO algorithms.
    • Supported Models: gemini-2.5-pro (≤200k tokens), gemini-2.5-pro-gt200k (>200k tokens), gemini-2.5-flash, and gemini-2.5-flash-lite.
    • Provider Integration: Full SDK validation and backend support for provider = "google" with automatic pricing calculation and token tracking.
    • Example Configs: Added example configurations demonstrating Gemini usage, including banking77_pipeline_mipro_gemini_flash_lite_local.toml for cost-effective multi-stage optimization.
  • OpenAI Model Support: Expanded OpenAI model support for prompt optimization with comprehensive coverage of latest models.
    • Supported Models: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, and gpt-5-nano.
    • Model Validation: SDK-side validation with clear error messages for unsupported models. Explicit rejection of gpt-5-pro due to high cost (15/15/120 per 1M tokens).
    • Provider Prefix Support: Models can be specified with or without provider prefix (e.g., "gpt-4o" or "openai/gpt-4o").
  • SDK Validation Enhancements: Improved config validation with comprehensive error checking before sending to backend.
    • Multi-Stage Validation: Validates that pipeline_modules match module configs, checks for missing or extra modules, and ensures proper module ID matching.
    • Model Validation: Provider-aware model validation with detailed error messages listing supported models for each provider.
    • Nano Model Restrictions: Clear validation that nano models (gpt-4.1-nano, gpt-5-nano) are allowed for policy models but rejected for mutation/meta models (too small for generation tasks).
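
A hedged config sketch tying these pieces together (key and section names follow the lists above; the module IDs and values are illustrative, and the exact nesting may differ from the shipped example configs):

```toml
[prompt_learning.gepa]
policy_model = "gemini-2.5-flash"   # a supported Google model
provider = "google"

# Per-module settings; IDs ("classify", "answer") must match pipeline_modules.
[prompt_learning.gepa.modules.classify]
max_instruction_slots = 2
max_tokens = 512
allowed_tools = ["lookup"]          # illustrative tool name

[prompt_learning.gepa.modules.answer]
max_instruction_slots = 1
max_tokens = 1024
```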

🔧 Technical Improvements

  • Config Parsing: Enhanced TOML parsing for multi-stage configurations with support for nested module and stage definitions.
  • Integration Tests: Added comprehensive integration tests for multi-stage GEPA and MIPRO workflows, including Gemini model validation tests.
  • Error Messages: Improved validation error messages with actionable suggestions and links to example configurations.

📚 Documentation

  • Multi-Stage Pipeline Guide: Updated documentation with examples and configuration details for optimizing multi-stage pipelines with both GEPA and MIPRO algorithms.
  • Model Support Reference: Complete documentation of supported models for each provider (OpenAI, Groq, Google) with usage examples.
  • Example Configurations: Added example configs demonstrating multi-stage optimization with different model providers, including multi_stage_gepa_example.toml and banking77_pipeline_mipro_gemini_flash_lite_local.toml.

2025-11-04 – GEPA: Genetic Evolution for Prompt Optimization

🚀 New Features

  • GEPA Algorithm: Genetic Evolution for Prompt Optimization (GEPA) is now available for prompt optimization jobs. GEPA uses evolutionary algorithms (mutation, crossover, selection) to optimize prompts across multiple generations, achieving significant accuracy improvements on classification and reasoning tasks.
  • Prompt ID-Based URLs: Prompt transformations now use versioned URLs (/v1/{prompt_version_id}/chat/completions) for better traceability, concurrency, and debugging. Each transformation gets a unique version ID based on content hashing.
  • Multi-Objective Optimization: GEPA maintains a Pareto front balancing accuracy, token count, and task-specific metrics (e.g., tool call rate).
  • Validation Scoring: Job results now distinguish between prompt_best_train_score and prompt_best_validation_score for clearer evaluation metrics.
  • Integration Testing: Added comprehensive integration tests for GEPA training workflows with Banking77 task app.

📚 Documentation

  • GEPA Guide: Complete documentation with quick start, configuration examples, and troubleshooting for Banking77, HotpotQA, IFBench, HoVer, and PUPA tasks.
  • Integration Examples: Step-by-step guides for deploying task apps and running GEPA optimization locally and on Modal.

2025-10-28 – Terminal Training Logs

🚀 New Features

  • Full terminal streaming logs: Both uvx synth-ai train for SFT and RL now provide comprehensive real-time training logs directly in the terminal. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics logging (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training process.

2025-10-27 – Rubrics, Hosted Judges & Qwen-VL RL

🚀 New Features

  • Hosted Synth judges (configurable): Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behaviour.
  • Rubric-aware filtering: SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export.
  • Qwen-VL support across SFT & RL: Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
  • Instruct-model RL guidance: Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.

2025-10-17 – Qwen Coder, Turso, H200 Topologies & RL Throughput

🚀 New Features

  • Qwen Coder models supported: Qwen Coder variants are now available across SFT and inference workflows.
  • SDK migrated to Turso for concurrency: Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
  • More training topologies on H200s: Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Full LoRA support for Policy Gradient: LoRA integrated end-to-end into Policy Gradient training flows.
  • Pipelined RL async rollouts: Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.

2025-10-09 – LoRA, MoE & Large Model Support

🚀 New Features

  • Expanded Qwen catalog: Simple Training now ships SFT and inference presets for every Qwen release outside the existing qwen3-{0.6B–32B} range, giving full coverage for the remaining Qwen 1.x/2.x/2.5 checkpoints.
  • Large-model inference & training topologies: Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
  • Turnkey rollout: API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
  • LoRA-first SFT: Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.

2025-09-24 – Platform Updates

🚀 New Features

  • Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking
  • B200 & H200 GPU Support: Added support for NVIDIA's latest flagship GPUs (B200, H200) for both training and inference workloads
  • Faster Inference: Optimized inference pipeline with improved throughput and reduced latency across all model sizes
  • GSPO Support: Integrated Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training

2025-09-17 – Online RL (customer-visible features)

  • Organization-scoped environment credentials
    • Upload your environment API key once (sealed-box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
  • First-party Task App integration
    • Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Single-node, multi-GPU Online RL
    • Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi-node training is finished in dev; reach out if interested.
    • Supports a reference model (for KL) stacked on inference or on its own GPU, and configurable tensor parallelism for inference.
  • Production run flow
    • Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.

0.2.2.dev2 – Aug 8, 2025

  • Fine-tuning (SFT) endpoints available and documented end-to-end
  • Interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B
  • Live polling output during training with real-time status updates
  • CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher

0.2.2.dev1 – Aug 7, 2025

  • New backend balance APIs and CLI for account visibility
  • CLI utilities: balance, traces, and man commands
  • Traces inventory view with per-DB counts and storage footprint
  • Standardized one-off usage: uvx synth-ai <command> (removed interactive watch)
  • Improved .env loading and API key resolution

0.2.2.dev0 – Jul 30, 2025

  • Environment Registration API for custom environments
  • Turso/sqld daemon support with local-first replicas
  • Environment Service Daemon via uvx synth-ai serve

0.2.1.dev1 – Jul 29, 2025

  • Initial development release

Feb 3, 2025

  • Cuvier Error Search (deprecated)

Jan 2025

  • Langsmith integration for Enterprise partners
  • Python SDK v0.3 (simplified API, Anthropic support)