TL;DR

  • Qwen Coder Models: Coder variants now supported across SFT and inference workflows
  • Turso Migration: SDK migrated to Turso for improved concurrency and throughput
  • H200 Topologies: Additional tensor/pipeline/data parallel layouts for training larger models on H200s
  • LoRA Support: Full LoRA support for Policy Gradient training
  • Pipelined RL: Improved throughput via asynchronous rollouts

Qwen Coder Models

Qwen Coder variants are now available across SFT and inference workflows.
  • Full Support: All Qwen Coder models supported for fine-tuning and inference
  • Code Generation: Optimized for code generation and completion tasks
  • Workflow Integration: Seamless integration with existing SFT and inference pipelines
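
As a quick illustration of running inference with a Qwen Coder variant (independent of this SDK's own API), the sketch below uses Hugging Face transformers; the checkpoint name and generation settings are assumptions for the example, not a statement of which checkpoints are supported.

```python
# Minimal sketch: generate code with a Qwen Coder checkpoint via transformers.
# The model name and decoding settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```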

SDK Migrated to Turso

Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.

Benefits

  • Concurrent Writes: Reliable concurrent writes without locking conflicts
  • Higher Throughput: Improved performance in multi-process runs
  • Local-First: Local-first database replication for development
  • Scalability: Better scalability for high-throughput workloads
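
As a rough illustration of the kind of storage access involved, here is a minimal sketch using the open-source libsql-client Python package (Turso's libSQL client) rather than the SDK's internal storage API; the table schema, database URL, and values are assumptions for the example. In a multi-process run, each worker would open its own client like this and write without coordinating a global lock.

```python
# Minimal sketch of writing run metadata through libsql-client.
# Schema, URL, and values are illustrative assumptions, not the SDK's own API.
import libsql_client

client = libsql_client.create_client_sync("file:local_runs.db")
try:
    client.execute(
        "CREATE TABLE IF NOT EXISTS metrics (run_id TEXT, step INTEGER, loss REAL)"
    )
    client.execute(
        "INSERT INTO metrics (run_id, step, loss) VALUES (?, ?, ?)",
        ["run-001", 0, 2.31],
    )
    result = client.execute(
        "SELECT COUNT(*) FROM metrics WHERE run_id = ?", ["run-001"]
    )
    print(result.rows[0][0])
finally:
    client.close()
```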

More Training Topologies on H200s

Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Flexible Layouts: More options for distributing models across GPUs
  • Larger Models: Support for training larger models on H200 clusters
  • Optimized Performance: Topologies optimized for H200 hardware
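
For context, a topology here is just the combination of tensor, pipeline, and data parallel degrees, whose product must match the number of GPUs used. The sketch below is a hypothetical illustration of such a layout; the field names and values are assumptions, not this SDK's actual configuration schema.

```python
# Hypothetical sketch of describing a parallel layout; names are assumptions.
from dataclasses import dataclass


@dataclass
class ParallelLayout:
    tensor_parallel: int    # shards each layer's weights across GPUs
    pipeline_parallel: int  # splits the layer stack into sequential stages
    data_parallel: int      # replicates the resulting model across remaining GPUs

    @property
    def gpus_required(self) -> int:
        return self.tensor_parallel * self.pipeline_parallel * self.data_parallel


# Example: a 70B-class model spread across two 8-GPU H200 nodes.
layout = ParallelLayout(tensor_parallel=8, pipeline_parallel=2, data_parallel=1)
assert layout.gpus_required == 16
```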

Full LoRA Support for Policy Gradient

LoRA integrated end-to-end into Policy Gradient training flows.
  • Parameter Efficiency: Low-Rank Adaptation for efficient fine-tuning
  • End-to-End: Complete integration from training to inference
  • Policy Gradient: Full support for RL training with LoRA adapters
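
As a rough sketch of what LoRA-based policy training looks like, the example below attaches adapters with the open-source peft library; the model name, rank, and target modules are illustrative assumptions rather than this SDK's API.

```python
# Minimal sketch: attach LoRA adapters before RL-style fine-tuning (via peft).
# Rank, alpha, and target modules below are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
policy = get_peft_model(base, lora_config)
policy.print_trainable_parameters()  # only the adapter weights require gradients

# The policy gradient loop then optimizes only these adapter parameters,
# while the frozen base weights can be shared with the reference model.
```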

Pipelined RL Async Rollouts

Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
  • Asynchronous Rollouts: Rollout generation overlaps with policy updates instead of blocking them
  • Importance Sampling: Off-policy corrections account for rollouts produced by a slightly stale policy, keeping updates stable
  • Higher Throughput: Significant improvement in end-to-end training throughput as a result
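
For intuition, the sketch below shows a standard clipped importance-sampling correction (PPO-style) of the kind used when rollouts come from a slightly stale policy; the function, tensor names, and clip range are illustrative assumptions, not the SDK's exact update rule.

```python
# Sketch of a clipped importance-sampling policy gradient loss (PPO-style).
# Names and the clip range are illustrative assumptions.
import torch


def corrected_pg_loss(
    logp_current: torch.Tensor,   # log-probs of sampled tokens under the current policy
    logp_behavior: torch.Tensor,  # log-probs under the (stale) policy that produced the rollout
    advantages: torch.Tensor,     # per-token advantage estimates
    clip_range: float = 0.2,
) -> torch.Tensor:
    # Ratio between current and behavior policy; equals 1 when rollouts are on-policy.
    ratio = torch.exp(logp_current - logp_behavior)
    # Clipping the ratio keeps updates stable when the rollout policy lags the learner.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_range, 1 + clip_range) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```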

Use Cases

  • Code Generation: Fine-tune Qwen Coder models for code generation tasks
  • High-Throughput Training: Use Turso for concurrent training runs
  • Large Model Training: Train larger models with H200 topologies
  • Efficient RL: Use LoRA for parameter-efficient RL training