title: 'Roadmap & Changelog'
description: 'Track our progress and upcoming features'

Up Next

TBA

Changelog

Online RL — Sep 17, 2025

  • Organization‑scoped environment credentials
    • Upload your environment API key once (sealed‑box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
  • First‑party Task App integration
    • Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Single‑node, multi‑GPU Online RL
    • Out‑of‑the‑box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi‑node training is in development; contact us if you are interested.
    • Supports a reference model (for the KL term), either stacked on the inference GPUs or on a dedicated GPU, plus configurable tensor parallelism for inference.
  • Production run flow
    • Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.
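
As a rough illustration of the single-node split described above, the sketch below partitions a node's GPU indices into an inference group and a training group, and checks that the inference group divides evenly by the tensor-parallel size. The names `GpuPlan` and `plan_gpu_split` and all parameters are hypothetical; they are not the platform's configuration keys, and the reference model placement is modelled only as a flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuPlan:
    inference: List[int]             # GPU indices serving vLLM inference
    training: List[int]              # GPU indices running the trainer
    reference: Optional[List[int]]   # dedicated reference-model GPU, or None if stacked on inference
    inference_replicas: int          # inference GPUs divided by the tensor-parallel size


def plan_gpu_split(total_gpus, n_inference, n_training,
                   tensor_parallel=1, dedicated_reference=False):
    """Hypothetical helper: assign GPU indices 0..total_gpus-1 on one node."""
    n_reference = 1 if dedicated_reference else 0
    if n_inference + n_training + n_reference > total_gpus:
        raise ValueError("requested more GPUs than the node has")
    if tensor_parallel < 1 or n_inference % tensor_parallel != 0:
        raise ValueError("inference GPU count must be divisible by tensor_parallel")
    ids = list(range(total_gpus))
    inference = ids[:n_inference]
    training = ids[n_inference:n_inference + n_training]
    start = n_inference + n_training
    reference = ids[start:start + 1] if dedicated_reference else None
    return GpuPlan(inference, training, reference, n_inference // tensor_parallel)


# The 6 inference / 2 training example, assuming an 8-GPU node and TP=2.
print(plan_gpu_split(total_gpus=8, n_inference=6, n_training=2, tensor_parallel=2))
```

With a dedicated reference GPU, one index has to come out of the inference or training budget (for example 5 / 2 / 1 on an 8-GPU node); stacking the reference model on the inference GPUs avoids that trade-off at the cost of sharing their memory.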

0.2.2.dev2 — Aug 8, 2025

Highlights

  • Fine-tuning (SFT) endpoints are now available and documented end-to-end (files → jobs → status)
  • Added interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B (Crafter)
  • Demo script streams live polling output during training (status updates visible while running)

CLI & Demos

  • uvx synth-ai demo lists the available demos and lets you run them interactively:
    • examples/finetuning/synth_qwen/run_demo.sh: rollouts → trace filtering → SFT kickoff, with live polling
    • examples/evals/run_demo.sh: quick eval rollouts and trace filtering for dataset prep
  • Improved demo UX: training status lines (e.g., ⏳ poll N/20 – status = running) now stream live in the terminal
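
The live polling behaviour can be pictured with a small loop like the one below. This is a sketch only: `poll_job`, the injected `get_status` callable, and the terminal state names are assumptions for illustration, not the demo script's actual code or the backend's status vocabulary. The printed line follows the `poll N/20` shape quoted above, without the emoji.

```python
import time
from typing import Callable

# Assumed terminal states; the real job API may use different names.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_job(get_status: Callable[[], str], max_polls: int = 20,
             interval_s: float = 5.0, sleep=time.sleep) -> str:
    """Poll until the job reaches a terminal state or max_polls is exhausted."""
    status = "unknown"
    for attempt in range(1, max_polls + 1):
        status = get_status()
        # flush=True makes each status line appear immediately in the terminal.
        print(f"poll {attempt}/{max_polls} - status = {status}", flush=True)
        if status in TERMINAL_STATES:
            break
        sleep(interval_s)
    return status


# Usage with a stand-in status source instead of a real job endpoint.
fake_statuses = iter(["queued", "running", "running", "succeeded"])
final = poll_job(lambda: next(fake_statuses), interval_s=0.0)
print("final status:", final)
```

Passing the status source in as a callable keeps the loop independent of any particular HTTP client, so the same sketch works against a real endpoint or a test double.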

Documentation

  • Examples → Rejection Finetuning: generate → filter → finetune → run
  • CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher
  • Tracing guide and filtering guide for SFT JSONL generation

0.2.2.dev1 — Aug 7, 2025

Highlights

  • New backend balance APIs and CLI for quick account visibility (USD balance + token/GPU spend windows)
  • New CLI utilities and manual: compact, one-off commands with uvx synth-ai <cmd> and man
  • Traces inventory view showing per-DB and per-system counts, plus on-disk size (GB)
  • Inference and SFT API routes consolidated and documented for local and Modal deployments

CLI

  • Added balance: prints minimal balance in USD and a compact spend table for the last 24h and 7d
    • Flags: --base-url, --api-key, --usage; sources .env automatically; guards against Modal URLs for account endpoints
  • Added traces: lists local trace DBs under ./synth_ai.db/dbs, shows traces, experiments, last activity, and size (GB), plus aggregated per-system counts
  • Added man: human-friendly command reference with options, env vars, and examples
  • Standardized one-off usage: uvx synth-ai <command> (removed legacy interactive watch)
  • Improved .env loading and API key resolution (SYNTH_BACKEND_API_KEY → SYNTH_API_KEY → DEFAULT_DEV_API_KEY)
  • Existing commands remain available: experiments, experiment <id>, usage [--model], status, calc, and env (list/register/unregister)
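
A fallback chain over the three key variables named above (SYNTH_BACKEND_API_KEY, SYNTH_API_KEY, DEFAULT_DEV_API_KEY) might look like the following. This is a minimal sketch: the precedence is assumed from the order in which the variables are listed, the idea that an explicit --api-key value wins is also an assumption, and `resolve_api_key` is a hypothetical name rather than the CLI's own function.

```python
import os
from typing import Mapping, Optional

# Assumed precedence, taken from the order the variables are listed in.
KEY_ORDER = ("SYNTH_BACKEND_API_KEY", "SYNTH_API_KEY", "DEFAULT_DEV_API_KEY")


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the explicit key if given, else the first non-empty variable in KEY_ORDER."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    for name in KEY_ORDER:
        value = env.get(name, "").strip()
        if value:
            return value
    return None  # caller decides how to report a missing key


key = resolve_api_key()
print("API key found" if key else "no API key configured")
```

Treating whitespace-only values as unset avoids a common failure where an empty line in .env shadows a valid key further down the chain.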

Demo

  • Local end-to-end demo: start backend (uv run uvicorn app.routes.main:app --reload --port 8000), set SYNTH_BACKEND_API_KEY, then:
    • uvx . balance → shows USD balance + 24h/7d spend
    • uvx . traces → inventories DBs and per-system counts with storage footprint
    • uvx . experiments and uvx . experiment <id> → explore local trace data
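
To give a sense of what the storage footprint part of the traces inventory involves, here is a sketch that walks ./synth_ai.db/dbs and reports each file's size in GB. It only measures files on disk; the trace, experiment, and per-system counts come from the databases themselves and are not reproduced here. `inventory_sizes` is a hypothetical helper, and using 1 GB = 10^9 bytes is an assumption about the CLI's units.

```python
from pathlib import Path


def inventory_sizes(root="./synth_ai.db/dbs"):
    """Return {path relative to root: size in GB} for every regular file under root."""
    base = Path(root)
    if not base.is_dir():
        return {}  # nothing to report if the directory does not exist
    sizes = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            sizes[str(path.relative_to(base))] = path.stat().st_size / 1e9
    return sizes


for name, gb in inventory_sizes().items():
    print(f"{name}: {gb:.3f} GB")
```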

Breaking Changes

  • Removed watch (interactive TUI) in favor of one-off CLI commands

Notes

  • A new package release must be published before uvx synth-ai man and the other new commands work without the . prefix (that is, as uvx synth-ai <command> rather than uvx . <command>).

0.2.2.dev0 — Jul 30, 2025

What’s New

  • Environment Registration API: register custom environments dynamically via REST API, CLI, or entry points
  • Turso/sqld daemon support with local-first replicas via uvx synth-ai serve
  • Environment Service Daemon: uvx synth-ai serve starts both the DB daemon (port 8080) and environment service API (port 8901)
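
After running uvx synth-ai serve, a quick way to see whether both listeners came up is to attempt a TCP connection to each port. The sketch below assumes the services listen on localhost, which the entry above does not state; a successful connection only shows that something is listening, not that the service is healthy. `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the entry above; the host is an assumption.
SERVICES = {"DB daemon": 8080, "environment service API": 8901}

for name, port in SERVICES.items():
    state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name} (port {port}): {state}")
```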

0.2.1.dev1 — Jul 29, 2025

Feb 3, 2025

  • Cuvier Error Search (deprecated)

Jan 2025

  • Langsmith integration for Enterprise partners
  • Python SDK v0.3 (simplified API, Anthropic support)