TL;DR
- Terminal Training Logs: Full real-time streaming logs for SFT and RL training
- Hosted Judges: Configurable Synth judges with per-job overrides
- Qwen-VL Support: Vision models now supported across SFT & RL
- Rubric-Aware Filtering: SFT filtering pipelines with structured rubric definitions
Terminal Training Logs
uvx synth-ai train now provides comprehensive real-time training logs directly in the terminal for both SFT and RL jobs.
Features
- Live Status Updates: See QUEUED, RUNNING, and other status updates in real-time
- Detailed Event Logs: Timestamps and sequence numbers for all events
- Full Metrics Logging: Training loss, learning rate, GPU utilization, KL divergence, rollout times
- Timeline Progression: Visual timeline showing progress throughout the entire training process
Rubrics, Hosted Judges & Qwen-VL RL
Hosted Synth Judges
Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides:
- Rubric Selection: Choose from Synth-hosted rubrics for consistent evaluation
- Concurrency Caps: Control how many judge evaluations run concurrently
- Fallback Behavior: Configure what happens when judges are unavailable
Rubric-Aware Filtering
SFT filtering pipelines accept structured rubric definitions:
- Structured Scoring: Traces are scored according to your rubric criteria
- Automatic Trimming: Traces are trimmed before export based on rubric scores
- Custom Criteria: Define your own evaluation criteria for filtering
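
As a rough illustration of rubric-based scoring and filtering, the sketch below computes a weighted average over per-criterion scores and keeps only traces that meet a threshold. It reads "trimming" as dropping low-scoring traces, which is one possible interpretation. The `Criterion` class, the `scores` key on each trace, and the `min_score` parameter are all assumptions made for this example, not the Synth rubric schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical shape): a name and a relative weight."""
    name: str
    weight: float


def rubric_score(criterion_scores, rubric):
    """Weighted average of per-criterion scores; missing criteria count as 0."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    weighted = sum(c.weight * criterion_scores.get(c.name, 0.0) for c in rubric)
    return weighted / total_weight


def filter_traces(traces, rubric, min_score):
    """Keep traces whose rubric score is at least `min_score`.

    Each trace is assumed to be a dict with a "scores" mapping from
    criterion name to a score in [0, 1].
    """
    return [t for t in traces if rubric_score(t["scores"], rubric) >= min_score]
```

With a rubric that weights correctness at 2.0 and brevity at 1.0, a trace scoring 1.0 and 0.5 on those criteria averages to about 0.83 and would pass a 0.5 threshold, while a trace scoring 0.0 and 1.0 averages to about 0.33 and would be dropped.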
Qwen-VL Support
Qwen3-VL models can be fine-tuned and trained with RL:
- Vision Collators: Built-in vision collators for image processing
- LoRA Projector Targeting: LoRA adapters target vision projectors
- Rollout Plumbing: Full support for vision models in RL rollouts
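
To make the idea of LoRA projector targeting concrete, here is a generic sketch that selects adapter targets by matching module names against patterns. The module names in the usage note are made up for illustration and are not the actual Qwen3-VL layer names. `select_lora_targets` is a hypothetical helper, not part of synth-ai or any LoRA library.

```python
import re


def select_lora_targets(module_names, patterns):
    """Return the module names that match any of the given regex patterns.

    The result could then be passed to whatever LoRA configuration the
    training stack expects as its list of target modules.
    """
    compiled = [re.compile(p) for p in patterns]
    return [name for name in module_names if any(p.search(name) for p in compiled)]
```

For example, given the placeholder names `vision.projector.fc1`, `vision.projector.fc2`, and `language.layers.0.attn.q_proj`, the pattern `r"^vision\.projector\."` selects only the two projector layers, so the adapters attach to the vision projector rather than the language model.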
Instruct-Model RL Guidance
Added documentation and defaults for running RL on Qwen instruct SKUs:
- Semaphore Tuning: Tune semaphore settings to avoid premature episode completion
- Best Practices: Guidance on configuring RL for instruct models
Documentation
- RL Documentation: Updated RL guides with Qwen-VL examples
- Judge Configuration: Documentation for configuring hosted judges
- Rubric Guide: Guide for creating and using rubrics in filtering pipelines
Use Cases
- Real-Time Monitoring: Monitor training progress directly in terminal without switching contexts
- Quality Filtering: Use rubric-based filtering to improve training data quality
- Vision RL: Train RL models on vision tasks with Qwen-VL
- Consistent Evaluation: Use hosted judges for consistent evaluation across experiments