TL;DR
- Qwen Coder Models: Coder variants now supported across SFT and inference workflows
- Turso Migration: SDK migrated to Turso for improved concurrency and throughput
- H200 Topologies: More training topologies on H200s with additional layouts
- LoRA Support: Full LoRA support for Policy Gradient training
- Pipelined RL: Improved throughput via asynchronous rollouts
Qwen Coder Models
Qwen Coder variants are now available across SFT and inference workflows.
- Full Support: All Qwen Coder models supported for fine-tuning and inference
- Code Generation: Optimized for code generation and completion tasks
- Workflow Integration: Seamless integration with existing SFT and inference pipelines
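As an illustration, a fine-tuning job for a Coder variant might be configured as below. The config keys and the `validate` helper are hypothetical, not this SDK's actual schema; the model ID is a public Qwen2.5-Coder checkpoint on Hugging Face.

```python
# Hypothetical SFT job configuration for a Qwen Coder variant.
# The keys below are illustrative, not the SDK's actual schema.
sft_config = {
    "model": "Qwen/Qwen2.5-Coder-7B-Instruct",  # public HF checkpoint
    "task": "sft",
    "dataset": "path/to/code_dataset.jsonl",    # prompt/completion pairs
    "max_seq_len": 4096,
    "learning_rate": 1e-5,
    "epochs": 2,
}

def validate(config: dict) -> bool:
    """Basic sanity checks a launcher might run before submitting a job."""
    required = {"model", "task", "dataset"}
    return required.issubset(config) and config["task"] in {"sft", "rl"}

assert validate(sft_config)
```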
SDK Migrated to Turso
Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
Benefits
- Concurrent Writes: Reliable concurrent writes without locking conflicts
- Higher Throughput: Improved performance in multi-process runs
- Local-First: Local-first database replication for development
- Scalability: Better scalability for high-throughput workloads
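To see the workload this change targets, here is a conceptual sketch of many writers appending metrics to one database at once. It uses stdlib `sqlite3` with WAL mode and a busy timeout purely to illustrate the pattern; it is not the Turso/libSQL API, which handles concurrent writers without these client-side workarounds.

```python
import sqlite3
import threading
import tempfile
import os

# Conceptual illustration of the multi-writer workload: several training
# processes logging metrics concurrently.  Plain SQLite serializes writers,
# so WAL mode plus a busy timeout is needed to avoid locking conflicts.
db_path = os.path.join(tempfile.mkdtemp(), "runs.db")

def init_db():
    with sqlite3.connect(db_path) as conn:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics (run_id TEXT, step INTEGER, loss REAL)"
        )

def writer(run_id: str, steps: int):
    conn = sqlite3.connect(db_path, timeout=30)  # wait on locks instead of failing
    for step in range(steps):
        conn.execute("INSERT INTO metrics VALUES (?, ?, ?)", (run_id, step, 0.1 * step))
        conn.commit()
    conn.close()

init_db()
threads = [threading.Thread(target=writer, args=(f"run-{i}", 50)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with sqlite3.connect(db_path) as conn:
    count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count)  # 4 writers x 50 steps = 200 rows
```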
More Training Topologies on H200s
Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
- Flexible Layouts: More options for distributing models across GPUs
- Larger Models: Support for training larger models on H200 clusters
- Optimized Performance: Topologies optimized for H200 hardware
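The core constraint on any such layout is that the tensor, pipeline, and data parallel degrees multiply to the GPU count. A minimal sketch of enumerating candidates (the `max_tp` node-boundary limit is an assumption for illustration; real schedulers apply more hardware constraints):

```python
from itertools import product

def valid_layouts(num_gpus: int, max_tp: int = 8):
    """Enumerate (tensor, pipeline, data) parallel degrees covering num_gpus.

    The only constraint applied is tp * pp * dp == num_gpus, plus an
    assumed limit keeping tensor parallelism within one node (max_tp).
    """
    layouts = []
    for tp, pp in product(range(1, max_tp + 1), range(1, num_gpus + 1)):
        if num_gpus % (tp * pp) == 0:
            layouts.append((tp, pp, num_gpus // (tp * pp)))
    return layouts

# Example: 16 GPUs, tensor parallelism capped at the 8 GPUs of one node.
for tp, pp, dp in valid_layouts(16):
    print(f"tp={tp} pp={pp} dp={dp}")
```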
Full LoRA Support for Policy Gradient
LoRA integrated end-to-end into Policy Gradient training flows.
- Parameter Efficiency: Low-Rank Adaptation for efficient fine-tuning
- End-to-End: Complete integration from training to inference
- Policy Gradient: Full support for RL training with LoRA adapters
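The LoRA idea itself is compact: the frozen weight W is augmented with a trainable low-rank product scaled by alpha / r, so only the two small factors receive gradients. A minimal NumPy sketch (dimensions and initialization scheme are illustrative):

```python
import numpy as np

# Minimal LoRA sketch: the frozen weight W gains a low-rank update
# (alpha / r) * B @ A, and only A and B are trained.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B A x — identical to the base layer at init."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapter contributes nothing at step 0,
# so RL training starts from exactly the base policy.
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing B is the standard choice precisely because it makes the adapted policy match the base policy before any updates, which matters for on-policy RL.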
Pipelined RL Async Rollouts
Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
- Asynchronous Processing: Parallel rollout processing for faster training
- Importance Sampling: Proper importance sampling adjustments for stable updates
- Throughput Improvement: Significant improvement in training throughput
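Why the correction is needed: with asynchronous rollouts, actions are sampled from a slightly stale policy while gradients are taken under the current one, so each sample is reweighted by the likelihood ratio pi_new / pi_old. A hedged sketch of one standard form of this adjustment (a clipped, PPO-style surrogate; the release notes do not specify which variant the SDK uses):

```python
import numpy as np

def corrected_pg_loss(logp_new, logp_old, advantages, clip=0.2):
    """Off-policy-corrected policy gradient loss for stale rollouts.

    rho = pi_new(a|s) / pi_old(a|s) reweights each sample; clipping the
    ratio bounds the update when rollouts lag the current policy.
    """
    rho = np.exp(logp_new - logp_old)        # importance ratio
    clipped = np.clip(rho, 1 - clip, 1 + clip)
    # Pessimistic (min) surrogate objective, negated to express a loss.
    return -np.mean(np.minimum(rho * advantages, clipped * advantages))

# Toy batch: log-probs under the stale and current policies.
logp_old = np.array([-1.0, -0.5, -2.0])
logp_new = np.array([-0.9, -0.6, -1.8])
adv = np.array([1.0, -0.5, 2.0])
loss = corrected_pg_loss(logp_new, logp_old, adv)
```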
Use Cases
- Code Generation: Fine-tune Qwen Coder models for code generation tasks
- High-Throughput Training: Use Turso for concurrent training runs
- Large Model Training: Train larger models with H200 topologies
- Efficient RL: Use LoRA for parameter-efficient RL training