GPU           Inference     SFT Training     RL Training      Notes
A10G          1x            1x               1x               Small models (≤4B), experiments
L40S          1x            1x               1x               Good balance of performance and cost
A100 (40GB)   1x, 4x        1x, 4x           1x, 4x           40GB for models up to 7B
H100 (80GB)   1x, 2x, 4x    1x, 2x, 4x, 8x   1x, 2x, 4x, 8x   RDMA-enabled for multi-node training
H200          1x            1x               1x               Latest generation with improved performance
B200          1x, 4x, 8x    1x, 4x           1x, 4x           Flagship GPU with advanced capabilities
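The supported topologies in the table can be encoded as plain data and queried before launching a job. A minimal sketch, assuming nothing beyond the table itself (the `SUPPORTED_TOPOLOGIES` dict and `smallest_config` helper are hypothetical names, not part of any official API):

```python
# Hypothetical encoding of the topology table above: GPU -> workload -> valid GPU counts.
SUPPORTED_TOPOLOGIES = {
    "A10G":        {"inference": [1],          "sft": [1],             "rl": [1]},
    "L40S":        {"inference": [1],          "sft": [1],             "rl": [1]},
    "A100 (40GB)": {"inference": [1, 4],       "sft": [1, 4],          "rl": [1, 4]},
    "H100 (80GB)": {"inference": [1, 2, 4],    "sft": [1, 2, 4, 8],    "rl": [1, 2, 4, 8]},
    "H200":        {"inference": [1],          "sft": [1],             "rl": [1]},
    "B200":        {"inference": [1, 4, 8],    "sft": [1, 4],          "rl": [1, 4]},
}

def smallest_config(gpu: str, workload: str, min_gpus: int = 1) -> int:
    """Return the smallest supported GPU count for a workload on a GPU type."""
    for n in sorted(SUPPORTED_TOPOLOGIES[gpu][workload]):
        if n >= min_gpus:
            return n
    raise ValueError(f"no {workload} topology on {gpu} with >= {min_gpus} GPUs")
```

For example, asking for an SFT run on H100 with at least 3 GPUs would resolve to the 4x topology, since 3x is not a listed configuration.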
Topology Notes:
  • Inference: Tensor-parallel configurations for large model serving
  • SFT Training: Data-parallel training across multiple GPUs
  • RL Training: Supports single-node split (inference + training) and multi-node topologies
  • RDMA: Available on H100 for high-performance multi-node RL training
  • Split Mode: RL can partition GPUs between inference (vLLM) and training processes
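The single-node split described above can be sketched as a simple partition of device ids, with each process pinned to its half via `CUDA_VISIBLE_DEVICES`. This is an illustrative sketch only; the `split_gpus` helper is hypothetical, and real RL frameworks typically manage the split internally:

```python
def split_gpus(total: int, n_inference: int) -> tuple[str, str]:
    """Partition GPU ids 0..total-1 into an inference group and a training group.

    Hypothetical helper illustrating the CUDA_VISIBLE_DEVICES convention for a
    single-node RL split; not an official API.
    """
    if not 0 < n_inference < total:
        raise ValueError("need at least one GPU on each side of the split")
    inference_ids = ",".join(str(i) for i in range(n_inference))
    training_ids = ",".join(str(i) for i in range(n_inference, total))
    return inference_ids, training_ids

# Example: on an 8x H100 node, give 2 GPUs to the vLLM inference process
# and the remaining 6 to the training process.
infer_devices, train_devices = split_gpus(8, 2)
# Launch the inference process with CUDA_VISIBLE_DEVICES=infer_devices and
# the training process with CUDA_VISIBLE_DEVICES=train_devices.
```

Keeping the split at the environment-variable level means neither process needs special device-selection logic; each simply sees its own subset as devices 0..N-1.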