System Info
With transformers==4.57.3 and deepspeed==0.18.3,
as shown in the screenshot below, when accelerator.backward is called, the DeepSpeed backward internally calls engine.step, which performs an optimizer step at the gradient accumulation boundary.
The screenshot below is from trainer.py in the transformers library.
Additionally, inside the Trainer, optimizer.step is called again after this backward at the gradient accumulation step; attaching the screenshot below for reference.
So within a single iteration it performs two optimizer steps, which is wrong. Please fix this bug.
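To make the reported flow concrete, here is a minimal, hedged sketch of the loop in question. The toy model, optimizer, and `gradient_accumulation_steps` value are illustrative placeholders, and the loop paraphrases the Trainer logic rather than quoting trainer.py; a DeepSpeed plugin is assumed to be configured for the Accelerator.

```python
import torch
from accelerate import Accelerator

# Sketch only: assumes DeepSpeed is enabled via `accelerate config`;
# the model and optimizer below are toy placeholders.
accelerator = Accelerator()
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

gradient_accumulation_steps = 4

for step in range(8):
    inputs = torch.randn(4, 8, device=accelerator.device)
    loss = model(inputs).sum()

    # With DeepSpeed, accelerator.backward() delegates to
    # DeepSpeedEngine.backward(); per the report above, the engine also
    # calls engine.step() internally at the accumulation boundary (1st step).
    accelerator.backward(loss)

    if (step + 1) % gradient_accumulation_steps == 0:
        # The Trainer then calls optimizer.step() after backward,
        # which would be the 2nd optimizer step in the same iteration.
        optimizer.step()
        optimizer.zero_grad()
```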
Who can help?
No response
Information
Tasks
Reproduction
This bug is currently working as a feature
Expected behavior
There should be a single optimizer step per iteration, but there are currently two, which is wrong.