TL;DR

  • Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts
  • B200 & H200 GPU Support: Added support for NVIDIA’s latest flagship GPUs
  • Faster Inference: Optimized inference pipeline with improved throughput
  • GSPO Support: Integrated the Group Sequence Policy Optimization (GSPO) algorithm

Rollout Viewer

Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking.
  • Real-Time Metrics: Live updates of training metrics during rollouts
  • Progress Tracking: Visual progress indicators for rollout phases
  • Enhanced Visualization: Improved charts and graphs for monitoring
  • Interactive Interface: User-friendly interface for exploring rollout data

B200 & H200 GPU Support

Added support for NVIDIA’s latest flagship GPUs (B200, H200) for both training and inference workloads.
  • Latest Hardware: Support for NVIDIA’s Blackwell (B200) and Hopper (H200) GPUs
  • Training Workloads: Full support for training on B200 and H200
  • Inference Workloads: Optimized inference on the latest GPUs
  • Performance: Leverage the latest GPU capabilities for better performance

Faster Inference

Optimized inference pipeline with improved throughput and reduced latency across all model sizes.
  • Throughput Improvement: Significant improvement in inference throughput
  • Reduced Latency: Lower latency for faster response times
  • Model Size Agnostic: Improvements across all model sizes
  • Optimized Pipeline: Inference pipeline tuned for both throughput and latency
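
To check the throughput and latency changes on your own workload, a small timing harness is usually enough. The sketch below is illustrative only: `benchmark`, `generate`, and `count_tokens` are hypothetical names rather than part of any product API, and `generate` stands in for whatever callable sends a prompt to your inference endpoint and returns its output.

```python
import statistics
import time


def benchmark(generate, prompts, count_tokens=len):
    """Time sequential calls to `generate` and summarize latency and throughput.

    generate:     hypothetical callable taking a prompt and returning an output.
    count_tokens: hypothetical callable returning the token count of an output
                  (defaults to len, which suits a list of tokens).
    """
    latencies = []
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - t0)
        total_tokens += count_tokens(output)
    elapsed = time.perf_counter() - start
    return {
        "requests": len(prompts),
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        # Guard against a zero-length interval on very coarse clocks.
        "tokens_per_s": total_tokens / elapsed if elapsed > 0 else float("inf"),
    }
```

Running the same prompts before and after upgrading gives a like-for-like comparison. Requests are issued one at a time here, so the result reflects single-stream latency rather than batched or concurrent throughput.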

GSPO Support

Integrated the Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement learning training.
  • Advanced RL: New algorithm for reinforcement learning training
  • Sequence-Level Optimization: Computes importance ratios and clipping per sequence, with advantages normalized within each group of sampled responses
  • Integration: Fully integrated into training workflows
  • Documentation: Complete documentation and examples
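
For readers unfamiliar with GSPO, the following is a minimal sketch of its sequence-level clipped objective for a single group of responses to one prompt, based on the published formulation rather than on this platform's implementation. The function name `gspo_loss`, its parameters, and the default `clip_eps` are assumptions made for illustration. The use of the population standard deviation for normalizing rewards is also an assumption.

```python
import math


def gspo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Sequence-level clipped surrogate loss for one group of responses.

    logp_new, logp_old: one list of per-token log-probabilities per response,
                        under the current and the old policy respectively.
    rewards:            one scalar reward per response.
    clip_eps:           clipping range (illustrative default, not a recommendation).
    """
    group_size = len(rewards)
    mean_r = sum(rewards) / group_size
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / group_size)
    # Group-relative advantage: normalize each reward within the group.
    advantages = [(r - mean_r) / (std_r + 1e-8) for r in rewards]

    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        # Length-normalized sequence importance ratio:
        # exp(mean over tokens of log pi_new - log pi_old).
        log_ratio = sum(n - o for n, o in zip(new, old)) / len(new)
        ratio = math.exp(log_ratio)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (PPO-style) choice between unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / group_size
```

The key difference from token-level methods is that one ratio is computed and clipped per response, so every token in a response shares the same weight. A production implementation would operate on batched tensors with automatic differentiation; plain Python lists are used here only to keep the arithmetic visible.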

Use Cases

  • Real-Time Monitoring: Monitor training rollouts in real time with the enhanced viewer
  • Latest Hardware: Take advantage of B200 and H200 GPUs for training and inference
  • Faster Responses: Improved inference performance for production workloads
  • Advanced RL: Use GSPO for complex reinforcement learning scenarios