Fix buffer dtype mismatch and causal mask recompute in export #18380

mergennachin merged 1 commit into main
Conversation
- Materialize state buffers as bf16 (not fp32) to match compute dtype, fixing the "Expected bfloat16 inputs" error in Triton SDPA during AOTI lowering.
- Use `hasattr(layer.attn, "mask")` instead of `isinstance(layer.attn, FullAttention)` for causal mask recompute: the `isinstance` check fails when the module is imported via different Python paths (`executorch.examples...` vs `examples...`).
- Remove the unused `FullAttention` import from `export.py`.
- Fix the model directory name in the README examples (`Qwen3.5-MoE-A3B` → `Qwen3.5-35B-A3B`).
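The second bullet rests on a Python subtlety: a module imported under two different paths produces two distinct class objects. The following self-contained sketch (with hypothetical module names, not the real ExecuTorch modules) demonstrates why the `isinstance` check breaks while an attribute check does not:

```python
import types

# Simulate importing the "same" class source under two module paths.
# Python creates two distinct class objects, so isinstance against one
# does not match instances of the other, while hasattr still works.
_SOURCE = """
class FullAttention:
    def __init__(self):
        self.mask = None
"""

def load_as(name):
    mod = types.ModuleType(name)
    exec(_SOURCE, mod.__dict__)
    return mod

m1 = load_as("examples.attn")             # e.g. imported as examples...
m2 = load_as("executorch.examples.attn")  # e.g. imported as executorch.examples...

obj = m1.FullAttention()
print(isinstance(obj, m2.FullAttention))  # False: distinct class objects
print(hasattr(obj, "mask"))               # True: duck typing is path-agnostic
```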
Pull request overview
This PR updates the Qwen3.5-MoE export flow to avoid dtype mismatches during AOTI/CUDA lowering and to make causal mask recomputation robust to modules imported from different Python paths.
Changes:
- Materialize meta-device state buffers as `bf16` (keeping masks `bool`) to match compute dtype and avoid Triton SDPA dtype errors.
- Recompute causal masks based on `hasattr(layer.attn, "mask")` instead of `isinstance(..., FullAttention)` to avoid class-identity mismatches across import paths.
- Documentation cleanup: fix the example model directory name and remove an unused import.
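The attribute-based recompute described above can be sketched as follows. `FullAttention` and `LinearAttention` here are stand-ins for the real modules, not the actual ExecuTorch classes; the point is duck-typing on the `mask` attribute rather than on class identity:

```python
import torch

class FullAttention(torch.nn.Module):
    def __init__(self, seq_len):
        super().__init__()
        self.register_buffer(
            "mask", torch.ones(seq_len, seq_len, dtype=torch.bool)
        )

class LinearAttention(torch.nn.Module):
    """Attention variant with no causal mask buffer."""

def recompute_causal_masks(attn_layers, seq_len):
    for attn in attn_layers:
        # Works no matter which import path produced the class object.
        if hasattr(attn, "mask"):
            attn.mask = torch.tril(
                torch.ones(seq_len, seq_len, dtype=torch.bool)
            )

layers = [FullAttention(4), LinearAttention()]
recompute_causal_masks(layers, 4)
```

Assigning to `attn.mask` replaces the registered buffer in place, since `nn.Module.__setattr__` routes tensor assignments to existing buffer names into `_buffers`.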
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/qwen3_5_moe/export.py | Adjusts meta-buffer materialization dtype and makes causal mask recompute detection import-path-agnostic. |
| examples/models/qwen3_5_moe/README.md | Updates example model directory/tokenizer paths to match the referenced HF model name. |
```diff
 # State buffers (KV cache, conv/recurrent state) are bf16 to match
 # compute dtype. Masks stay bool, inv_freq stays float32.
 for fqn, buf in list(model.named_buffers()):
     if buf.device.type == "meta":
         dtype = torch.bfloat16 if buf.dtype != torch.bool else torch.bool
         parts = fqn.rsplit(".", 1)
         parent = model.get_submodule(parts[0]) if len(parts) > 1 else model
         parent.register_buffer(
             parts[-1],
-            torch.zeros(buf.shape, dtype=buf.dtype, device="cpu"),
+            torch.zeros(buf.shape, dtype=dtype, device="cpu"),
         )
```
The comment says "`inv_freq` stays float32", but the materialization loop currently casts every meta buffer that isn't bool to bf16. That means the meta placeholder for `rotary_emb.inv_freq` will be bf16 until the later recompute step overwrites it, which is a bit inconsistent and could break if the recompute logic changes. Consider explicitly preserving float32 for `inv_freq` (e.g., by checking the buffer name) or defaulting to the original `buf.dtype` except for known state buffers that must be bf16.
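The reviewer's suggestion could look like the following sketch (a hypothetical helper, not the actual PR code): preserve the original dtype for bool masks and for `inv_freq`, and cast only the remaining state buffers to bf16.

```python
import torch

def materialize_meta_buffers(model):
    # Sketch only: keep bool masks and float32 inv_freq at their original
    # dtype; cast the remaining meta state buffers to bf16 so they match
    # the compute dtype expected by the Triton SDPA kernels.
    for fqn, buf in list(model.named_buffers()):
        if buf.device.type != "meta":
            continue
        if buf.dtype == torch.bool or fqn.endswith("inv_freq"):
            dtype = buf.dtype  # preserve bool masks and float32 inv_freq
        else:
            dtype = torch.bfloat16  # state buffers match compute dtype
        parts = fqn.rsplit(".", 1)
        parent = model.get_submodule(parts[0]) if len(parts) > 1 else model
        # Re-registering under the same name replaces the meta placeholder.
        parent.register_buffer(
            parts[-1], torch.zeros(buf.shape, dtype=dtype, device="cpu")
        )
```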