
Conversation

@tohtana (Collaborator) commented Jan 18, 2026

#7736 fixed an issue with OnebitLamb NaN propagation. With that fix, the optimizer correctly filters out empty parameters, but the DeepSpeed engine's gradient allreduce (which runs separately from the optimizer) still includes the gradients of empty parameters.

This PR addresses the issue by skipping empty parameters (numel=0) in _get_gradients_for_reduction().

Empty parameters (numel=0) cause issues in gradient allreduce when flatten/unflatten operations are used: the unflatten step fails with shape mismatches because empty tensors cannot be properly reconstructed from the flattened buffer.
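
For context, here is a minimal sketch of the flatten → allreduce → unflatten round trip that bucketed gradient reduction relies on. This is illustrative only, not DeepSpeed's actual code; the bucket contents and the commented-out collective call are assumptions:

```python
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

# Hypothetical gradient bucket; real buckets are built from module parameters.
grads = [torch.randn(4), torch.randn(3)]

flat = _flatten_dense_tensors(grads)            # one contiguous 1-D buffer
# dist.all_reduce(flat) would run here in a distributed setting.
synced = _unflatten_dense_tensors(flat, grads)  # views shaped like each grad
for g, s in zip(grads, synced):
    g.copy_(s)

# Per the description above, a numel=0 tensor in `grads` can break the
# unflatten/reconstruction step with a shape mismatch, so the fix keeps
# such parameters out of the bucket entirely.
```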

This fix skips empty parameters in _get_gradients_for_reduction(), since they contribute nothing to the reduction anyway.
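
In context, the skip amounts to something like the following sketch. This is hedged: the real method's signature, parameter source (`self.module.parameters()` here is an assumption), and bucketing logic differ:

```python
def _get_gradients_for_reduction(self):
    grads = []
    for param in self.module.parameters():  # assumed parameter source
        if param.grad is None:
            continue
        # Skip empty parameters (numel=0): they contribute nothing to the
        # reduction and break the flatten/unflatten round trip shown above.
        if param.numel() == 0:
            continue
        grads.append(param.grad)
    return grads
```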

Fixes test_onebit.py::TestOneBitLambEmptyParameters::test

Signed-off-by: Masahiro Tanaka <[email protected]>
@tohtana tohtana requested a review from tjruwase as a code owner January 18, 2026 03:14

# Skip empty parameters (numel=0) as they contribute nothing to gradient reduction
# and cause issues with flatten/unflatten operations
if param.numel() == 0:
    continue
Collaborator

@tohtana Very clean fix! My only minor comment: maybe we could add an explicit test for gradient reduction? It's optional, though.

Collaborator Author

Thank you for the feedback. Could you elaborate on what test you are suggesting? One that runs only the gradient reduction?
