| GPU | Inference | SFT Training | RL Training | Notes |
|---|---|---|---|---|
| A10G | 1x | 1x | 1x | Small models (≤4B), experiments |
| L40S | 1x | 1x | 1x | Good balance of performance and cost |
| A100 (40GB) | 1x, 4x | 1x, 4x | 1x, 4x | 40 GB memory; fits models up to ~7B |
| H100 (80GB) | 1x, 2x, 4x | 1x, 2x, 4x, 8x | 1x, 2x, 4x, 8x | RDMA-enabled for multi-node training |
| H200 | 1x | 1x | 1x | H100 successor with 141 GB HBM3e and higher memory bandwidth |
| B200 | 1x, 4x, 8x | 1x, 4x | 1x, 4x | Blackwell flagship with larger, faster HBM3e memory |
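The matrix above can be encoded as a small lookup table for validating a requested configuration before launch. This is an illustrative sketch: the dictionary, its keys, and the `is_supported` helper are invented names, not part of any real API.

```python
# Hypothetical encoding of the GPU/workload support matrix above.
# workload -> {gpu_type -> set of supported GPU counts}
SUPPORTED_GPU_COUNTS = {
    "inference": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4}, "H200": {1}, "B200": {1, 4, 8},
    },
    "sft": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4, 8}, "H200": {1}, "B200": {1, 4},
    },
    "rl": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4, 8}, "H200": {1}, "B200": {1, 4},
    },
}

def is_supported(workload: str, gpu: str, count: int) -> bool:
    """Return True if (workload, GPU type, GPU count) appears in the matrix."""
    return count in SUPPORTED_GPU_COUNTS.get(workload, {}).get(gpu, set())
```

For example, `is_supported("rl", "H100", 8)` is true, while `is_supported("inference", "H200", 2)` is false because H200 inference is listed only as 1x.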
- Inference: multi-GPU counts use tensor parallelism to serve models too large for a single GPU
- SFT Training: multi-GPU counts use data-parallel training
- RL Training: supports a single-node split (inference + training on one machine) as well as multi-node topologies
- RDMA: available on H100 for high-bandwidth, low-latency multi-node RL training
- Split Mode: RL can partition a node's GPUs between the inference (vLLM) and training processes
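As a sketch of how split mode might divide a node, the helper below partitions GPU indices between a vLLM inference process and a training process by building `CUDA_VISIBLE_DEVICES` strings. The `split_gpus` function and the launch convention shown in the comments are assumptions for illustration, not the platform's actual mechanism.

```python
def split_gpus(total: int, inference_gpus: int) -> tuple[str, str]:
    """Partition GPU indices 0..total-1 into an inference set (for vLLM)
    and a training set, returned as CUDA_VISIBLE_DEVICES strings.

    Hypothetical helper: real RL frameworks may assign devices differently.
    """
    if not 0 < inference_gpus < total:
        raise ValueError("need at least one GPU on each side of the split")
    infer = ",".join(str(i) for i in range(inference_gpus))
    train = ",".join(str(i) for i in range(inference_gpus, total))
    return infer, train

# Example: on an 8x H100 node, give 2 GPUs to vLLM and 6 to the trainer.
infer_devices, train_devices = split_gpus(8, 2)
# The two processes would then be launched with, e.g.:
#   CUDA_VISIBLE_DEVICES=<infer_devices>  -> vLLM inference server
#   CUDA_VISIBLE_DEVICES=<train_devices>  -> training process
```

Keeping the two device sets disjoint avoids memory contention between the vLLM KV cache and training-time optimizer state on the same GPU.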