generated from fastai/nbdev_template
Pull requests: huggingface/trl
Pull requests list
vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) for GRPO
#5188
opened Feb 26, 2026 by
lfranceschetti
2
Fix title consistency from "Transformers Reinforcement Learning" to "Transformer Reinforcement Learning"
#5183
opened Feb 26, 2026 by
qgallouedec
5 tasks
Fix GRPO tool mask alignment after tool-call retokenization
#5145
opened Feb 21, 2026 by
MichalMraz
Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn
#5122
opened Feb 18, 2026 by
albertvillanova
feat(experimental): Divergence Proximal Policy Optimization
#5117
opened Feb 17, 2026 by
LeonEricsson
5 tasks
Add prefix-preserving training chat template for GPT-OSS
#5109
opened Feb 17, 2026 by
qgallouedec
Add support for DGPO (ICLR 2026) to GRPO
#5102
opened Feb 15, 2026 by
YanqiDai
5 tasks done
Cast multimodal forward_kwargs to compute dtype for bf16/fp16 training
#5073
opened Feb 11, 2026 by
akshan-main
4 of 5 tasks
Fix GRPO VLM prompt handling for string prompts
#5064
opened Feb 10, 2026 by
akshan-main
5 tasks done
fix: add gradient checkpointing to PolicyAndValueWrapper
#4955
opened Feb 3, 2026 by
lvhungdev
3 of 5 tasks
[Experimental] Add SDFT trainer, config, docs, and tests
#4941
opened Jan 31, 2026 by
Shekswess
4 of 5 tasks