Supported models
Any Qwen3-VL checkpoint (2B–235B) works with the RL stack. The registry descriptor (`backend/app/routes/simple_training/model_families/qwen3_vl.py`) adds:
- `supports_vision = true`
- `max_images_per_message = 1`
- LoRA projector targets (`mm_projector`, attention/MLP layers)
These settings are picked up by the GSPO trainer helpers (`backend/app/routes/clustered_training/core/algorithms/gspo/app_helpers.py`) when you pick a VL model.
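For orientation, a sketch of what such a descriptor might look like. The class name, field names, and instantiation are assumptions (the real descriptor lives in the file above); only the three settings listed here come from this guide:

```python
# Illustrative only: the dataclass shape and field names are assumptions.
# The values mirror the settings listed above.
from dataclasses import dataclass, field


@dataclass
class ModelFamilyDescriptor:
    name: str
    supports_vision: bool = False
    max_images_per_message: int = 0
    # LoRA targets must include the multimodal projector for VL checkpoints.
    lora_target_modules: list[str] = field(default_factory=list)


QWEN3_VL = ModelFamilyDescriptor(
    name="qwen3-vl",
    supports_vision=True,
    max_images_per_message=1,
    lora_target_modules=["mm_projector"],  # plus the attention/MLP projections
)
```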
Task app requirements
Use the Crafter policy (`examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py`) as a template:
- Detect VL models via `model_name` and set `use_vision = True`.
- Include the observation image as a data URL (or HTTPS URL) inside the user message (see the sketch after this list).
- Support `image_only_mode` to send image segments without accompanying text when desired.
Images reach vLLM as an `image_url` segment (handled in `backend/app/routes/clustered_training/core/algorithms/gspo/inference/server.py`), so ensure the URL is present and fetchable.
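A minimal sketch of how the policy can package the observation, assuming an OpenAI-style multimodal chat payload. The helper name and the base64 PNG encoding are illustrative; the `image_url` segment shape matches the format vLLM-compatible endpoints accept:

```python
# Illustrative helper; names are assumptions, the segment layout follows the
# OpenAI-style multimodal chat format.
import base64
from typing import Optional


def build_user_message(observation_png: bytes, text: Optional[str], image_only_mode: bool) -> dict:
    """Wrap a Crafter observation as a data-URL image segment, plus optional text."""
    data_url = "data:image/png;base64," + base64.b64encode(observation_png).decode("ascii")
    content = [{"type": "image_url", "image_url": {"url": data_url}}]
    if text and not image_only_mode:
        content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}
```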
Config checklist
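The sketch below collects the vision-related settings mentioned throughout this guide into one TOML. Section and key names are illustrative assumptions; treat `examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml` as the reference layout:

```toml
# Illustrative layout; section and key names are assumptions.
[model]
base = "Qwen/Qwen3-VL-4B-Instruct"   # any Qwen3-VL checkpoint from 2B to 235B

[lora]
target_modules = ["all-linear"]      # covers attention/MLP layers and the projector

[rollout]
max_concurrent_rollouts = 4          # vision prompts are large; start small
max_images_per_message = 1           # matches the registry limit

[topology]
kind = "single_node_split"           # one GPU for vLLM, at least one for training
```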
Thinking variants
If you choose a -Thinking SKU, populate the rollout `policy_config` with the intended thinking mode:
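A minimal sketch, assuming the switch is exposed as `enable_thinking` (the name Qwen3 chat templates commonly use); the actual key accepted by the trainer may differ, so check the sample configs:

```toml
# Illustrative; the key name under policy_config is an assumption.
[rollout.policy_config]
enable_thinking = true   # false forces the non-thinking chat template
```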
The evaluation path honors the same setting (`backend/app/routes/clustered_training/core/algorithms/gspo/evaluation/evaluator.py`).
Example workflow
- Deploy the Crafter task app (`modal deploy examples/task_apps/crafter/task_app/main.py`) with vision enabled.
- Update `examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml` with your task URL and API key secrets.
- Launch RL from that config (see the command sketch after this list).
- Monitor rollouts – the trainer logs dropped images if you exceed `max_images_per_message`, and vLLM reports multimodal prompt usage.
- Evaluate / deploy – reuse the same `[model]` + `[rollout]` blocks in your eval configs and Modal deployment manifests so the processor files ship with the model.
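A sketch of the deploy and launch commands. The `modal deploy` line is taken from the first item above; the RL launch command (`uvx synth-ai train` and its flags) is an assumption, so substitute whatever entry point your trainer actually exposes:

```bash
# Deploy the Crafter task app (vision enabled) to Modal, as in the first step.
modal deploy examples/task_apps/crafter/task_app/main.py

# Launch RL. Placeholder command; substitute your trainer's actual entry
# point, pointing it at the config edited in the previous step.
uvx synth-ai train \
  --type rl \
  --config examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml
```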
Tips
- Concurrency: Vision prompts are larger. Start with `max_concurrent_rollouts = 4` and scale cautiously.
- Topology: Use `single_node_split` and dedicate at least one GPU to vLLM and one to training; sharded models (235B) require additional GPUs.
- Data capture: Enable tracing (`TASKAPP_TRACING_ENABLED=1`) to keep image payloads in your evaluation logs.
- LoRA projector weights: When using LoRA, ensure `target_modules` includes the projector (the sample config uses `"all-linear"` to cover every linear module).
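If you list modules explicitly instead of using `"all-linear"`, a hedged sketch of what that list might include; the attention/MLP names are the conventional Qwen projection modules and should be verified against the loaded checkpoint, while `mm_projector` comes from the registry notes above:

```toml
# Illustrative; verify module names against the checkpoint before training.
[lora]
target_modules = [
  "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
  "gate_proj", "up_proj", "down_proj",      # MLP projections
  "mm_projector",                           # vision projector, required for VL models
]
```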