Changelog #0237 – Week of October 28, 2025

TL;DR

Terminal Training Logs: Full real-time streaming logs for SFT and RL training
Hosted Judges: Configurable Synth judges with per-job overrides
Qwen-VL Support: Vision models now supported across SFT & RL
Rubric-Aware Filtering: SFT filtering pipelines with structured rubric definitions

uvx synth-ai train now streams comprehensive real-time training logs directly to the terminal for both SFT and RL jobs.

Live Status Updates: See QUEUED, RUNNING, and other job statuses as they change
Detailed Event Logs: Timestamps and sequence numbers for all events
Full Metrics Logging: Training loss, learning rate, GPU utilization, KL divergence, rollout times
Timeline Progression: Visual timeline showing progress throughout the entire training process

Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides:

Rubric Selection: Choose from Synth-hosted rubrics for consistent evaluation
Concurrency Caps: Control how many judge evaluations run concurrently
Fallback Behavior: Configure what happens when judges are unavailable
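
The three controls above can be illustrated with a small client-side sketch. This is not the Synth API: JudgeOptions, with_overrides, score_all, and every field name below are hypothetical, chosen only to show how a per-job override, a concurrency cap, and a fallback score might interact.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, List, Optional


@dataclass(frozen=True)
class JudgeOptions:
    """Hypothetical judge settings; field names are illustrative, not Synth's."""
    rubric_id: str
    max_concurrency: int = 4
    # Score to use when the judge call fails. None means: re-raise the error.
    fallback_score: Optional[float] = None


def with_overrides(defaults: JudgeOptions, **overrides) -> JudgeOptions:
    """Return a copy of the defaults with per-job overrides applied."""
    return replace(defaults, **overrides)


async def score_all(
    rollouts: List[str],
    judge: Callable[[str, str], Awaitable[float]],
    options: JudgeOptions,
) -> List[float]:
    """Score every rollout, never running more than max_concurrency judge calls at once."""
    semaphore = asyncio.Semaphore(options.max_concurrency)

    async def score_one(rollout: str) -> float:
        async with semaphore:
            try:
                return await judge(options.rubric_id, rollout)
            except Exception:
                if options.fallback_score is None:
                    raise
                return options.fallback_score

    # gather preserves input order, so scores line up with rollouts.
    return list(await asyncio.gather(*(score_one(r) for r in rollouts)))
```

In this sketch a job would call with_overrides(defaults, max_concurrency=2, fallback_score=0.0) to tighten the cap and substitute a neutral score whenever the judge raises, leaving the shared defaults untouched.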

SFT filtering pipelines accept structured rubric definitions.
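
As a rough illustration of what rubric-aware filtering involves, the sketch below keeps only examples whose weighted rubric score clears a threshold. The Criterion and Rubric classes, the scores field, and the threshold semantics are assumptions made for this example; they do not describe the actual Synth rubric schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    """One rubric line item (hypothetical shape)."""
    name: str
    weight: float


@dataclass(frozen=True)
class Rubric:
    """A structured rubric: weighted criteria plus a keep/drop threshold."""
    criteria: List[Criterion]
    min_score: float  # threshold on the weighted mean, assumed to lie in [0, 1]


def weighted_score(rubric: Rubric, scores: Dict[str, float]) -> float:
    """Weighted mean of per-criterion scores; a missing criterion counts as 0."""
    total_weight = sum(c.weight for c in rubric.criteria)
    if total_weight <= 0:
        raise ValueError("rubric must have positive total weight")
    weighted = sum(c.weight * scores.get(c.name, 0.0) for c in rubric.criteria)
    return weighted / total_weight


def filter_examples(rubric: Rubric, examples: List[dict]) -> List[dict]:
    """Keep examples whose weighted score meets the rubric threshold."""
    return [
        ex for ex in examples
        if weighted_score(rubric, ex["scores"]) >= rubric.min_score
    ]
```

Weighting the criteria lets a pipeline treat, say, correctness as more important than formatting without needing a separate filter pass for each.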

Qwen3-VL models can be fine-tuned and trained with RL.

Added documentation and defaults for running RL on Qwen instruct SKUs.

Real-Time Monitoring: Monitor training progress directly in terminal without switching contexts
Quality Filtering: Use rubric-based filtering to improve training data quality
Vision RL: Train RL models on vision tasks with Qwen-VL
Consistent Evaluation: Use hosted judges for consistent evaluation across experiments