| GPU | Inference | SFT Training | RL Training | Notes |
|---|---|---|---|---|
| A10G | 1x | 1x | 1x | Small models (≤4B), experiments |
| L40S | 1x | 1x | 1x | Good balance of performance and cost |
| A100 (40GB) | 1x, 4x | 1x, 4x | 1x, 4x | 40 GB memory; fits models up to ~7B |
| H100 (80GB) | 1x, 2x, 4x | 1x, 2x, 4x, 8x | 1x, 2x, 4x, 8x | RDMA-enabled for multi-node training |
| H200 | 1x | 1x | 1x | H100 successor with 141 GB HBM3e and higher memory bandwidth |
| B200 | 1x, 4x, 8x | 1x, 4x | 1x, 4x | Blackwell flagship with larger, faster HBM3e memory |
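The matrix above can be encoded as a small lookup table for validating a requested configuration before launch. This is an illustrative sketch: the dictionary, its keys, and the `is_supported` helper are invented names, not part of any real API.

```python
# Hypothetical encoding of the GPU/workload support matrix above.
# workload -> {gpu_type -> set of supported GPU counts}
SUPPORTED_GPU_COUNTS = {
    "inference": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4}, "H200": {1}, "B200": {1, 4, 8},
    },
    "sft": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4, 8}, "H200": {1}, "B200": {1, 4},
    },
    "rl": {
        "A10G": {1}, "L40S": {1}, "A100-40GB": {1, 4},
        "H100": {1, 2, 4, 8}, "H200": {1}, "B200": {1, 4},
    },
}

def is_supported(workload: str, gpu: str, count: int) -> bool:
    """Return True if (workload, GPU type, GPU count) appears in the matrix."""
    return count in SUPPORTED_GPU_COUNTS.get(workload, {}).get(gpu, set())
```

For example, `is_supported("rl", "H100", 8)` is true, while `is_supported("inference", "H200", 2)` is false because H200 inference is listed only as 1x.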
- Inference: multi-GPU counts use tensor parallelism to serve models too large for a single GPU
- SFT Training: multi-GPU counts use data-parallel training
- RL Training: supports a single-node split (inference + training on one machine) as well as multi-node topologies
- RDMA: available on H100 for high-bandwidth, low-latency multi-node RL training
- Split Mode: RL can partition a node's GPUs between the inference (vLLM) and training processes
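As a sketch of how split mode might divide a node, the helper below partitions GPU indices between a vLLM inference process and a training process by building `CUDA_VISIBLE_DEVICES` strings. The `split_gpus` function and the launch convention shown in the comments are assumptions for illustration, not the platform's actual mechanism.

```python
def split_gpus(total: int, inference_gpus: int) -> tuple[str, str]:
    """Partition GPU indices 0..total-1 into an inference set (for vLLM)
    and a training set, returned as CUDA_VISIBLE_DEVICES strings.

    Hypothetical helper: real RL frameworks may assign devices differently.
    """
    if not 0 < inference_gpus < total:
        raise ValueError("need at least one GPU on each side of the split")
    infer = ",".join(str(i) for i in range(inference_gpus))
    train = ",".join(str(i) for i in range(inference_gpus, total))
    return infer, train

# Example: on an 8x H100 node, give 2 GPUs to vLLM and 6 to the trainer.
infer_devices, train_devices = split_gpus(8, 2)
# The two processes would then be launched with, e.g.:
#   CUDA_VISIBLE_DEVICES=<infer_devices>  -> vLLM inference server
#   CUDA_VISIBLE_DEVICES=<train_devices>  -> training process
```

Keeping the two device sets disjoint avoids memory contention between the vLLM KV cache and training-time optimizer state on the same GPU.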