Refactor standard deviation calculation in GRPOTrainer to use nanstd …

87782d9

Add support for DGPO (ICLR 2026) to GRPO #5102 (Open)