| Provider | Models | Notes |
| --- | --- | --- |
| Qwen 3 | 0.6B, 1.7B, 4B, 8B, 14B, 32B | Default for demos; supports tool-calling. |
| Qwen 3 (Advanced) | 4B-2507, 30B-A3B, 235B-A22B, 480B-A35B | Instruct/Thinking/MoE variants; 235B/480B require sharding. |
| Qwen 3 Coder | 30B-A3B, 480B-A35B | Code-specialized; large sizes require sharding. |
Large models (235B/480B) must be sharded across multiple GPUs for inference and training.
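
To see why the 235B and 480B models cannot fit on one device, a rough estimate of weight memory helps. The sketch below is illustrative only: the function name `min_gpus_for_weights`, the 16-bit weight format, and the 80 GB per-GPU figure are assumptions, not values taken from this page. It counts total parameters, since an MoE model generally needs all of its experts resident in memory even though only a subset is active per token.

```python
import math

# Total parameter counts in billions, read from the size labels in the table.
LARGE_MODELS = {"235B-A22B": 235, "480B-A35B": 480}


def min_gpus_for_weights(params_billions, bytes_per_param=2, gpu_mem_gb=80):
    """Lower bound on the number of GPUs needed just to hold the weights.

    Assumptions (hypothetical, not from the source): 16-bit weights
    (2 bytes per parameter) and 80 GB of memory per GPU. KV cache,
    activations, and optimizer state are ignored, so real deployments
    need more headroom, and training needs considerably more.
    """
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    weight_gb = params_billions * bytes_per_param
    return math.ceil(weight_gb / gpu_mem_gb)


for name, size in LARGE_MODELS.items():
    print(f"{name}: at least {min_gpus_for_weights(size)} GPUs for weights alone")
```

Under these assumptions the 32B model still fits on a single GPU, while the two largest models need their weights split across several devices before any inference or training overhead is counted.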