[Question] Qwen3.5 Support (e.g., Qwen3.5-35B-A3B) #1831

@Yanbin-Yin

Question

I'm trying to use slime for RL training on Qwen3.5-35B-A3B and would like to check the current support status.

I noticed that slime already has scripts for Qwen3-30B-A3B (scripts/run_qwen3_30b_a3b.py), and since Qwen3.5-35B-A3B shares a similar MoE architecture (also with ~3B active parameters per token), I'm wondering whether it's already supported or if there are known compatibility issues.

Background

Qwen3.5 introduces some architectural changes compared to Qwen3, most notably:

  • Gated Delta Networks (a linear attention variant) interleaved with standard attention
  • Updated MoE routing
  • Native multimodal (early-fusion vision-language)

These differences may affect compatibility at the SGLang rollout layer and/or the Megatron-LM training layer.
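For context on why the rollout layer in particular may be affected: a Gated DeltaNet layer keeps a fixed-size recurrent state per head instead of a KV cache that grows with sequence length, so an inference engine likely has to manage a different kind of cache for those layers. Below is a minimal single-head sketch of the gated delta rule, assuming the formulation S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t. It is illustrative only and not Qwen3.5's, SGLang's, or Megatron's implementation: real kernels use chunked parallel forms, multiple heads, normalization and short convolutions, none of which are modeled here. All helper names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Recurrent state S has shape [d_v][d_k]; it never grows with sequence length.
Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    # Computes m @ x for a [rows][len(x)] matrix.
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def gated_delta_step(
    state: Matrix,
    q: Sequence[float],
    k: Sequence[float],
    v: Sequence[float],
    alpha: float,  # gate in [0, 1]: decays the old memory
    beta: float,   # write strength in [0, 1]
) -> Tuple[Matrix, List[float]]:
    """One step of the (assumed) gated delta rule; returns (new_state, output)."""
    # What the memory currently returns for key k: S_{t-1} k
    pred = matvec(state, k)
    # S_t = alpha * (S_{t-1} - beta * (S_{t-1} k) k^T) + beta * v k^T
    new_state = [
        [alpha * (state[i][j] - beta * pred[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]
    # o_t = S_t q
    return new_state, matvec(new_state, q)


def run_sequence(
    tokens: Iterable[Tuple[Sequence[float], Sequence[float], Sequence[float], float, float]],
    d_k: int,
    d_v: int,
) -> Tuple[Matrix, List[List[float]]]:
    """Process (q, k, v, alpha, beta) tuples one token at a time."""
    state = zeros(d_v, d_k)
    outputs: List[List[float]] = []
    for q, k, v, alpha, beta in tokens:
        state, o = gated_delta_step(state, q, k, v, alpha, beta)
        outputs.append(o)
    return state, outputs
```

With alpha = beta = 1 and a unit key, writing v = [2.0, 3.0] and then v = [5.0, 7.0] under the same key returns [5.0, 7.0] on the second step: the delta rule overwrites the old association instead of summing it, and the state stays d_v x d_k however many tokens are processed.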

Specific Questions

  1. Is Qwen3.5 (e.g., Qwen3.5-35B-A3B) currently supported, or is it on the roadmap?
  2. Is it safe to reuse scripts/run_qwen3_30b_a3b.py with a Qwen3.5 checkpoint by simply swapping the --hf-checkpoint path, or are additional config changes required?
  3. Does the current SGLang version bundled with slime support Qwen3.5's hybrid attention (Gated Delta Networks)?
  4. Any known issues or workarounds for running Qwen3.5 MoE models?
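Related to question 2, one quick pre-flight check is to diff the two checkpoints' Hugging Face config.json files before launching, to see which architecture fields actually differ. This is a minimal sketch using only the standard library; load_config, flatten and diff_configs are hypothetical helpers, not part of slime, and nested sub-configs (if a checkpoint has any) are flattened into dotted keys.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(checkpoint_dir: str) -> Dict[str, Any]:
    """Read <checkpoint_dir>/config.json (hypothetical helper)."""
    return json.loads((Path(checkpoint_dir) / "config.json").read_text())


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    out: Dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def diff_configs(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Report keys only in one config and keys whose values differ."""
    a, b = flatten(old), flatten(new)
    return {
        "only_in_old": sorted(a.keys() - b.keys()),
        "only_in_new": sorted(b.keys() - a.keys()),
        "changed": {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]},
    }
```

Usage, with placeholder paths: `diff_configs(load_config("/path/to/Qwen3-30B-A3B"), load_config("/path/to/Qwen3.5-35B-A3B"))`. Any entries under only_in_new or changed would suggest that the run script's model-architecture arguments need updating, not just --hf-checkpoint.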

Environment

  • slime version: latest main
  • Model: Qwen/Qwen3.5-35B-A3B
  • GPU: NVIDIA (CUDA)

Thanks for the great work on slime! Happy to test and provide feedback if Qwen3.5 support is in progress.

What I've Tried

Nothing yet.

Labels: question (Further information is requested)
