TL;DR
- Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts
- B200 & H200 GPU Support: Added support for NVIDIA’s latest flagship GPUs
- Faster Inference: Optimized inference pipeline with improved throughput
- GSPO Support: Integrated Group Sequence Policy Optimization algorithm
Rollout Viewer
Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking.
- Real-Time Metrics: Live updates of training metrics during rollouts
- Progress Tracking: Visual progress indicators for rollout phases
- Enhanced Visualization: Improved charts and graphs for monitoring
- Interactive Interface: User-friendly interface for exploring rollout data
B200 & H200 GPU Support
Added support for NVIDIA’s latest flagship GPUs (B200, H200) for both training and inference workloads.
- Latest Hardware: Support for NVIDIA’s newest GPU architectures
- Training Workloads: Full support for training on B200 and H200
- Inference Workloads: Optimized inference on the latest GPUs
- Performance: Leverage the latest GPU capabilities for better performance
Faster Inference
Optimized inference pipeline with improved throughput and reduced latency across all model sizes.
- Throughput Improvement: Significant improvement in inference throughput
- Reduced Latency: Lower latency for faster response times
- Model Size Agnostic: Improvements across all model sizes
- Optimized Pipeline: A more efficient end-to-end inference pipeline
GSPO Support
Integrated Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training.
- Advanced RL: New algorithm for reinforcement learning training
- Sequence-Level Optimization: Computes importance ratios and clipping per sequence rather than per token, across groups of sampled responses
- Integration: Fully integrated into training workflows
- Documentation: Complete documentation and examples
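As a rough illustration of what sequence-level optimization means, the sketch below computes a GSPO-style clipped objective in plain Python. It is not this platform's implementation: the function names (`group_advantages`, `sequence_ratio`, `gspo_objective`) and the `clip_eps` default are assumptions made for the example, and a real trainer would operate on batched tensors with gradients.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Small epsilon guards against a group where every reward is identical.
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized importance ratio for one whole response.

    Equivalent to (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)),
    computed in log space from per-token log-probabilities.
    """
    diff = sum(n - o for n, o in zip(new_logprobs, old_logprobs))
    return math.exp(diff / len(new_logprobs))


def gspo_objective(new_lp, old_lp, rewards, clip_eps=0.2):
    """Clipped surrogate averaged over the group (a quantity to maximize).

    new_lp / old_lp: one list of per-token log-probs per response.
    clip_eps: illustrative value only (assumption), not a recommended setting.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for n, o, adv in zip(new_lp, old_lp, advantages):
        s = sequence_ratio(n, o)
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        # Clipping is applied once per sequence, not once per token.
        total += min(s * adv, clipped * adv)
    return total / len(rewards)


if __name__ == "__main__":
    # Two sampled responses to one prompt, with hypothetical log-probs and rewards.
    old = [[-1.0, -1.2, -0.8], [-0.9, -1.1]]
    new = [[-0.9, -1.0, -0.8], [-1.0, -1.3]]
    print(gspo_objective(new, old, rewards=[1.0, 0.0]))
```

The key design point is that each response contributes a single ratio, so a long response is clipped or kept as a unit instead of token by token.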
Use Cases
- Real-Time Monitoring: Monitor training rollouts in real time with the enhanced viewer
- Latest Hardware: Take advantage of B200 and H200 GPUs for training and inference
- Faster Responses: Improved inference performance for production workloads
- Advanced RL: Use GSPO for complex reinforcement learning scenarios