[bugfix]: preserve FSDP hooks for RMSNorm qk norms by macthecadillac · Pull Request #1513 · hao-ai-lab/FastVideo

macthecadillac · 2026-06-29T11:57:36Z

Summary

This replaces the remaining direct q/k RMSNorm forward_native(...) calls with normal module dispatch so FSDP hooks run:

self.norm_q.forward_native(query) -> self.norm_q(query)
self.norm_k.forward_native(key) -> self.norm_k(key)

The remaining production call sites were in the causal Wan and MatrixGame2 causal transformer blocks. The selected RMSNorm implementation is unchanged; the calls simply no longer bypass nn.Module.__call__ under FSDP.

Root Cause

Calling RMSNorm.forward_native(...) directly bypasses nn.Module.__call__, and with it the FSDP2 module hooks. With sharded RMSNorm weights, self.weight can then still be a DTensor when the native multiply runs, which fails with a mixed Tensor/DTensor error. Calling RMSNorm(x) runs the hooks, so FSDP unshards the parameters before forward as expected.
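To illustrate the dispatch difference without FSDP or a GPU, here is a minimal sketch using a plain forward pre-hook. ToyRMSNorm is a hypothetical stand-in, not FastVideo's RMSNorm, and the hook only records that it ran; it stands in for the work FSDP2 does before forward.

```python
import torch
from torch import nn


class ToyRMSNorm(nn.Module):
    """Hypothetical stand-in for an RMSNorm with a forward_native method."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Plain RMS normalization followed by the learned scale.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps) * self.weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.forward_native(x)


norm = ToyRMSNorm(8)
calls = []

# The pre-hook returns None, so it leaves the inputs untouched.
norm.register_forward_pre_hook(lambda module, args: calls.append("pre_hook"))

x = torch.randn(2, 8)

direct = norm.forward_native(x)    # skips nn.Module.__call__, so no hook runs
hooks_after_direct = len(calls)

dispatched = norm(x)               # goes through __call__, so the hook runs
hooks_after_dispatch = len(calls)
```

On unsharded weights both paths return the same values; the only difference is whether the hook fired, which is the part that matters once the weight is sharded.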

Issue #1379 Investigation Checklist

Reliable reproducer: added fastvideo/tests/layers/test_rmsnorm_forward_dispatch.py and reproduced on Modal 2x L40S with PyTorch 2.11.0+cu128, FSDP2 fully_shard, and both regular sharding and CPUOffloadPolicy.
Upstream issue determination: no exact matching PyTorch issue was found. The repro shows RMSNorm(x) works while direct RMSNorm.forward_native(x) fails, so this appears to be FastVideo bypassing FSDP module hooks rather than an FSDP-side bug.
Unit test for CPU-offload + DTensor weight path: the new test covers direct_cpu_offload, where norm.weight remains a CPU DTensor and direct forward_native fails with the mixed Tensor/DTensor error; it also verifies module dispatch succeeds with CPU offload.
Behavioral spec: not final-layer-only. The failure can affect any FSDP-sharded RMSNorm(has_weight=True) called directly through forward_native; it reproduces with and without CPU offload.

Tests

uvx pre-commit==4.0.1 run --all-files
- passed after the review follow-up commit
Modal L40S: pytest fastvideo/tests/layers/test_rmsnorm_forward_dispatch.py -vs
- 2x L40S, latest branch includes the 120-second torchrun timeout follow-up
- 5 passed, 14 warnings in 38.28s
Modal L40S: pytest fastvideo/tests/transformers/test_wanvideo.py -vs
- 1x L40S, production fix applied
- 1 passed, 14 warnings in 50.05s
Modal L40S targeted Wan T2V SSIM:
- FASTVIDEO_SSIM_MODEL_ID=Wan2.1-T2V-1.3B-Diffusers pytest fastvideo/tests/ssim/test_wan_t2v_similarity.py -vs
- 2x L40S, commit 75e1659310ae306d342b6bfcd0e8ef7ebb0b5caa
- 2 passed, 18 warnings in 123.41s
- FLASH_ATTN-parametrized mean SSIM: 0.976557461420695
- TORCH_SDPA mean SSIM: 0.9821627881791857
- Note: the Modal image's FlashAttention import failed and the FLASH_ATTN-parametrized run fell back to Torch SDPA.

Checklist

I ran pre-commit run --all-files and fixed all issues
I added or updated tests for my changes
I updated documentation if needed
I considered GPU memory impact of my changes

For model/pipeline changes, also check:

I verified targeted Wan T2V SSIM regression tests pass on L40S
I updated the support matrix if adding a new model

Documentation/support-matrix notes: no docs or support matrix update was needed; this does not add a model or user-facing API. GPU memory impact should be neutral because this only routes existing RMSNorm calls through module dispatch so FSDP hooks can run.

mergify · 2026-06-29T11:58:16Z

Merge Protections

🔴 1 of 1 protections blocking · waiting on 🤖 CI

	Protection	Waiting on
🔴	PR merge requirements	🤖 CI

🔴 PR merge requirements

Waiting for

check-success=full-suite-passed
check-success~=pre-commit

This rule is failing.

check-success=full-suite-passed
check-success~=pre-commit
#approved-reviews-by>=1
check-success=fastcheck-passed
title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]
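The title condition above can be checked locally with a short sketch. It assumes the rule behaves like a Python `re` search, which is an assumption about Mergify rather than something stated in this thread; `title_ok` is a hypothetical helper name.

```python
import re

# Pattern copied from the title condition in the merge rule above.
TITLE_RULE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore"
    r"|kernel|new.?model|skill|skills|infra)\]"
)


def title_ok(title: str) -> bool:
    """Return True if the title starts with one of the accepted [tag] prefixes."""
    return TITLE_RULE.search(title) is not None


this_pr = title_ok("[bugfix]: preserve FSDP hooks for RMSNorm qk norms")
mixed_case = title_ok("[BugFix] example title")   # (?i) makes the tag case-insensitive
no_brackets = title_ok("bugfix: example title")   # the brackets are required
unknown_tag = title_ok("[typo] example title")    # tag is not in the accepted list
```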

gemini-code-assist

Code Review

This pull request replaces direct calls to .forward_native() on RMSNorm layers with standard module calls in causal_wanvideo.py and causal_model.py to prevent bypassing FSDP hooks. It also introduces a new test suite to verify this behavior under FSDP. The feedback suggests adding a timeout to the subprocess.run call when executing torchrun in the tests to prevent potential deadlocks or hangs in CI/CD pipelines.
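A minimal sketch of the suggested timeout, using only the standard library. The helper name `run_with_timeout` and the stand-in commands are hypothetical; the actual test in this PR launches torchrun, and its code is not reproduced here.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded timeout_s."""
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


# Stand-in commands; a real test would pass a torchrun invocation instead.
fast = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30.0)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(10)"], timeout_s=0.5
)
```

In a pytest context the `None` branch would typically become a test failure with the captured output attached, so a hung distributed launch surfaces as a bounded failure rather than a stalled CI job.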

SolitaryThinker

Reviewed the production dispatch changes and regression coverage. One inline correctness finding; CI status intentionally excluded.

SolitaryThinker · 2026-06-30T19:06:42Z

/merge

[bugfix]: preserve FSDP hooks for RMSNorm qk norms (hao-ai-lab#1379)

75e1659

mergify Bot added the labels type: bugfix, scope: infra, and scope: model on Jun 29, 2026

gemini-code-assist Bot reviewed Jun 29, 2026

Comment thread fastvideo/tests/layers/test_rmsnorm_forward_dispatch.py Outdated

[bugfix]: bound RMSNorm FSDP test torchrun timeout

3f5c6eb

macthecadillac marked this pull request as ready for review June 29, 2026 12:38

SolitaryThinker reviewed Jun 30, 2026

Comment thread fastvideo/tests/layers/test_rmsnorm_forward_dispatch.py

SolitaryThinker approved these changes Jun 30, 2026

github-actions Bot added the ready label on Jun 30, 2026

macthecadillac wants to merge 2 commits into hao-ai-lab:main from macthecadillac:investigate/rmsnorm-fsdp-forward-native
