TL;DR

  • Qwen Coder Models: Coder variants now supported across SFT and inference workflows
  • Turso Migration: SDK migrated to Turso for improved concurrency and throughput
  • H200 Topologies: Additional tensor/pipeline/data parallel layouts for training larger models on H200s
  • LoRA Support: Full LoRA support for Policy Gradient training
  • Pipelined RL: Improved throughput via asynchronous rollouts

Qwen Coder Models

Qwen Coder variants are now available across SFT and inference workflows.
  • Full Support: All Qwen Coder models supported for fine-tuning and inference
  • Code Generation: Optimized for code generation and completion tasks
  • Workflow Integration: Seamless integration with existing SFT and inference pipelines
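
As a quick illustration of running inference with a Qwen Coder variant (independent of this SDK's own API), the sketch below uses Hugging Face transformers; the checkpoint name and generation settings are assumptions for the example, not a statement of which checkpoints are supported.

```python
# Minimal sketch: generate code with a Qwen Coder checkpoint via transformers.
# The model name and decoding settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```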

SDK Migrated to Turso

Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.

Benefits

  • Concurrent Writes: Reliable concurrent writes without locking conflicts
  • Higher Throughput: Improved performance in multi-process runs
  • Local-First: Local-first database replication for development
  • Scalability: Better scalability for high-throughput workloads
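
As a rough illustration of the kind of storage access involved, here is a minimal sketch using the open-source libsql-client Python package (Turso's libSQL client) rather than the SDK's internal storage API; the table schema, database URL, and values are assumptions for the example. In a multi-process run, each worker would open its own client like this and write without coordinating a global lock.

```python
# Minimal sketch of writing run metadata through libsql-client.
# Schema, URL, and values are illustrative assumptions, not the SDK's own API.
import libsql_client

client = libsql_client.create_client_sync("file:local_runs.db")
try:
    client.execute(
        "CREATE TABLE IF NOT EXISTS metrics (run_id TEXT, step INTEGER, loss REAL)"
    )
    client.execute(
        "INSERT INTO metrics (run_id, step, loss) VALUES (?, ?, ?)",
        ["run-001", 0, 2.31],
    )
    result = client.execute(
        "SELECT COUNT(*) FROM metrics WHERE run_id = ?", ["run-001"]
    )
    print(result.rows[0][0])
finally:
    client.close()
```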

More Training Topologies on H200s

Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Flexible Layouts: More options for distributing models across GPUs
  • Larger Models: Support for training larger models on H200 clusters
  • Optimized Performance: Topologies optimized for H200 hardware
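
For context, a topology here is just the combination of tensor, pipeline, and data parallel degrees, whose product must match the number of GPUs used. The sketch below is a hypothetical illustration of such a layout; the field names and values are assumptions, not this SDK's actual configuration schema.

```python
# Hypothetical sketch of describing a parallel layout; names are assumptions.
from dataclasses import dataclass


@dataclass
class ParallelLayout:
    tensor_parallel: int    # shards each layer's weights across GPUs
    pipeline_parallel: int  # splits the layer stack into sequential stages
    data_parallel: int      # replicates the resulting model across remaining GPUs

    @property
    def gpus_required(self) -> int:
        return self.tensor_parallel * self.pipeline_parallel * self.data_parallel


# Example: a 70B-class model spread across two 8-GPU H200 nodes.
layout = ParallelLayout(tensor_parallel=8, pipeline_parallel=2, data_parallel=1)
assert layout.gpus_required == 16
```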

Full LoRA Support for Policy Gradient

LoRA integrated end-to-end into Policy Gradient training flows.
  • Parameter Efficiency: Low-Rank Adaptation for efficient fine-tuning
  • End-to-End: Complete integration from training to inference
  • Policy Gradient: Full support for RL training with LoRA adapters
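
As a rough sketch of what LoRA-based policy training looks like, the example below attaches adapters with the open-source peft library; the model name, rank, and target modules are illustrative assumptions rather than this SDK's API.

```python
# Minimal sketch: attach LoRA adapters before RL-style fine-tuning (via peft).
# Rank, alpha, and target modules below are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
policy = get_peft_model(base, lora_config)
policy.print_trainable_parameters()  # only the adapter weights require gradients

# The policy gradient loop then optimizes only these adapter parameters,
# while the frozen base weights can be shared with the reference model.
```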

Pipelined RL Async Rollouts

Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
  • Asynchronous Rollouts: Rollout generation overlaps with policy updates instead of blocking them
  • Importance Sampling: Off-policy corrections account for rollouts produced by a slightly stale policy, keeping updates stable
  • Higher Throughput: Significant improvement in end-to-end training throughput as a result
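
For intuition, the sketch below shows a standard clipped importance-sampling correction (PPO-style) of the kind used when rollouts come from a slightly stale policy; the function, tensor names, and clip range are illustrative assumptions, not the SDK's exact update rule.

```python
# Sketch of a clipped importance-sampling policy gradient loss (PPO-style).
# Names and the clip range are illustrative assumptions.
import torch


def corrected_pg_loss(
    logp_current: torch.Tensor,   # log-probs of sampled tokens under the current policy
    logp_behavior: torch.Tensor,  # log-probs under the (stale) policy that produced the rollout
    advantages: torch.Tensor,     # per-token advantage estimates
    clip_range: float = 0.2,
) -> torch.Tensor:
    # Ratio between current and behavior policy; equals 1 when rollouts are on-policy.
    ratio = torch.exp(logp_current - logp_behavior)
    # Clipping the ratio keeps updates stable when the rollout policy lags the learner.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_range, 1 + clip_range) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```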

Use Cases

  • Code Generation: Fine-tune Qwen Coder models for code generation tasks
  • High-Throughput Training: Use Turso for concurrent training runs
  • Large Model Training: Train larger models with H200 topologies
  • Efficient RL: Use LoRA for parameter-efficient RL training