Qwen3.5 Support (e.g., Qwen3.5-35B-A3B)
Question
I'm trying to use slime for RL training on Qwen3.5-35B-A3B and would like to check the current support status.
I noticed that slime already has scripts for Qwen3-30B-A3B (scripts/run_qwen3_30b_a3b.py), and since Qwen3.5-35B-A3B shares a similar MoE architecture (also with ~3B active parameters per token), I'm wondering whether it's already supported or if there are known compatibility issues.
Background
Qwen3.5 introduces some architectural changes compared to Qwen3, most notably:
- Gated Delta Networks (a linear attention variant) interleaved with standard attention
- Updated MoE routing
- Native multimodality (early-fusion vision-language)
These differences may affect compatibility at the SGLang rollout layer and/or the Megatron-LM training layer.
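As a quick local sanity check before attempting a rollout, one could inspect the checkpoint's config for non-standard attention layers. This is only a sketch: the field names (`layer_types`, `linear_attention`, `full_attention`) are illustrative assumptions, not Qwen3.5's actual config schema.

```python
# Sketch: detect hybrid-attention (linear-attention) layers in a model config
# before handing the checkpoint to the SGLang rollout layer.
# NOTE: "layer_types", "linear_attention", and "full_attention" are
# hypothetical field names used for illustration only.

def uses_hybrid_attention(config: dict) -> bool:
    """Return True if the config declares any non-standard attention layers."""
    layer_types = config.get("layer_types", [])
    return any(t != "full_attention" for t in layer_types)

# Illustrative config fragment mimicking an interleaved attention layout
example_config = {
    "model_type": "qwen3_5",
    "layer_types": ["linear_attention", "linear_attention", "full_attention"],
}

print(uses_hybrid_attention(example_config))  # True for the fragment above
```

If such a check comes back True, the model presumably needs explicit hybrid-attention support in both the rollout and training backends rather than falling back to the standard Qwen3 code path.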
Specific Questions
- Is Qwen3.5 (e.g., Qwen3.5-35B-A3B) currently supported, or is it on the roadmap?
- Is it safe to reuse scripts/run_qwen3_30b_a3b.py with a Qwen3.5 checkpoint by simply swapping the --hf-checkpoint path, or are additional config changes required?
- Does the current SGLang version bundled with slime support Qwen3.5's hybrid attention (Gated Delta Networks)?
- Any known issues or workarounds for running Qwen3.5 MoE models?
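For concreteness, the swap I have in mind is simply pointing the existing script at the new checkpoint; any further flags would stay whatever the script already expects:

```shell
# Hypothetical reuse of the Qwen3 script with a Qwen3.5 checkpoint;
# everything besides --hf-checkpoint is left at the script's defaults.
python scripts/run_qwen3_30b_a3b.py --hf-checkpoint Qwen/Qwen3.5-35B-A3B
```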
Environment
- slime version: latest main
- Model: Qwen/Qwen3.5-35B-A3B
- GPU: NVIDIA (CUDA)
Thanks for the great work on slime! Happy to test and provide feedback if Qwen3.5 support is in progress.
What I've Tried
Nothing yet; I wanted to confirm support status before running experiments.
Additional Context
No response
Pre-submission Checklist