TL;DR

  • Terminal Training Logs: Full real-time streaming logs for SFT and RL training
  • Hosted Judges: Configurable Synth judges with per-job overrides
  • Qwen-VL Support: Vision models now supported across SFT & RL
  • Rubric-Aware Filtering: SFT filtering pipelines with structured rubric definitions

Terminal Training Logs

uvx synth-ai train now streams comprehensive real-time training logs directly to the terminal for both SFT and RL jobs.

Features

  • Live Status Updates: See QUEUED, RUNNING, and other status updates in real-time
  • Detailed Event Logs: Timestamps and sequence numbers for all events
  • Full Metrics Logging: Training loss, learning rate, GPU utilization, KL divergence, rollout times
  • Timeline Progression: Visual timeline showing progress throughout the entire training process

Rubrics, Hosted Judges & Qwen-VL RL

Hosted Synth Judges

Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides:
  • Rubric Selection: Choose from Synth-hosted rubrics for consistent evaluation
  • Concurrency Caps: Control how many judge evaluations run concurrently
  • Fallback Behavior: Configure what happens when judges are unavailable
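
As a rough illustration of how a concurrency cap and a fallback can interact, here is a minimal Python sketch. It is not the Synth API: `judge_with_cap`, `fake_judge`, `max_concurrency`, and `fallback_score` are hypothetical names chosen for this example, and the hosted judge is replaced by a local stand-in function.

```python
import asyncio


async def judge_with_cap(traces, judge_fn, max_concurrency=4, fallback_score=None):
    """Score traces with at most `max_concurrency` judge calls in flight.

    All names here are hypothetical; this is not the Synth client API.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def score_one(trace):
        async with semaphore:  # blocks once the cap is reached
            try:
                return await judge_fn(trace)
            except Exception:
                # Judge unavailable: return the configured fallback
                # instead of failing the whole job.
                return fallback_score

    # gather() returns results in the same order as the input traces.
    return await asyncio.gather(*(score_one(t) for t in traces))


async def fake_judge(trace):
    """Stand-in for a hosted judge call; raises to simulate an outage."""
    await asyncio.sleep(0)
    if trace == "bad":
        raise ConnectionError("judge unavailable")
    return float(len(trace))


if __name__ == "__main__":
    scores = asyncio.run(
        judge_with_cap(["ab", "bad", "abcd"], fake_judge,
                       max_concurrency=2, fallback_score=0.0)
    )
    print(scores)
```

A lower cap trades throughput for less load on the judge; the fallback value decides whether a judge outage drops a rollout, scores it neutrally, or is surfaced some other way.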

Rubric-Aware Filtering

SFT filtering pipelines accept structured rubric definitions:
  • Structured Scoring: Traces are scored according to your rubric criteria
  • Automatic Trimming: Traces are trimmed before export based on rubric scores
  • Custom Criteria: Define your own evaluation criteria for filtering
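
The idea of scoring traces against a rubric and trimming by score can be sketched as below. This is an assumption-laden illustration, not the Synth rubric schema: the `Criterion` class, the weighted-average scoring rule, the `min_score` threshold, and the trace dictionary layout are all hypothetical, and "trimming" is interpreted here as dropping low-scoring traces.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric criterion (hypothetical structure, not the Synth schema)."""
    name: str
    weight: float


def rubric_score(criterion_scores, rubric):
    """Weighted average of per-criterion scores, each assumed to be in [0, 1].

    Criteria missing from `criterion_scores` count as 0.0.
    """
    total_weight = sum(c.weight for c in rubric)
    weighted = sum(c.weight * criterion_scores.get(c.name, 0.0) for c in rubric)
    return weighted / total_weight


def filter_traces(traces, rubric, min_score):
    """Keep only traces whose rubric score meets the threshold."""
    return [t for t in traces if rubric_score(t["scores"], rubric) >= min_score]


if __name__ == "__main__":
    rubric = [Criterion("correctness", 3.0), Criterion("brevity", 1.0)]
    traces = [
        {"id": "a", "scores": {"correctness": 1.0, "brevity": 0.0}},
        {"id": "b", "scores": {"correctness": 0.0, "brevity": 1.0}},
    ]
    kept = filter_traces(traces, rubric, min_score=0.5)
    print([t["id"] for t in kept])
```

Weighting lets a criterion such as correctness dominate the decision while still letting secondary criteria break ties near the threshold.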

Qwen-VL Support

Qwen3-VL models can be fine-tuned and trained with RL:
  • Vision Collators: Built-in vision collators for image processing
  • LoRA Projector Targeting: LoRA adapters target vision projectors
  • Rollout Plumbing: Full support for vision models in RL rollouts

Instruct-Model RL Guidance

Added documentation and defaults for running RL on Qwen instruct SKUs:
  • Semaphore Tuning: Tune semaphore limits to avoid premature episode completion
  • Best Practices: Guidance on configuring RL for instruct models

Documentation

  • RL Documentation: Updated RL guides with Qwen-VL examples
  • Judge Configuration: Documentation for configuring hosted judges
  • Rubric Guide: Guide for creating and using rubrics in filtering pipelines

Use Cases

  • Real-Time Monitoring: Monitor training progress directly in the terminal without switching contexts
  • Quality Filtering: Use rubric-based filtering to improve training data quality
  • Vision RL: Train RL models on vision tasks with Qwen-VL
  • Consistent Evaluation: Use hosted judges for consistent evaluation across experiments