Pull requests: deepspeedai/DeepSpeed
Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights
#7792
opened Jan 18, 2026 by juyterman1000
Add bf16 model with fp32 grad_accum to supported configs
#7790
opened Jan 18, 2026 by tohtana
Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2
#7776
opened Jan 12, 2026 by fy817
Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759)
#7771
opened Jan 11, 2026 by Antlera
Introduce all_reduce_hook to support gradient aggregation across replica groups.
#7764
opened Jan 7, 2026 by zhengchenyu
feat: add parameter-level precision control for BF16 training
#7750
opened Dec 30, 2025 by nathon-lee
Fix Muon optimizer checkpoint resume with bf16 mode
#7748
opened Dec 28, 2025 by yurekami
2 tasks done
Introduce Megatron-style parallel state management
#7726
opened Dec 15, 2025 by eternalNight
Draft
1 of 5 tasks
let allgather and alltoall execute in parallel when both attention and MOE used TP
#7723
opened Dec 11, 2025 by taozhiwei
HF2UCP: Converting a pytorch_model.bin or .safetensors checkpoint to UCP
#7212
opened Apr 10, 2025 by Schwidola0607
[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz)
#7130
opened Mar 11, 2025 by cyr0930