4 changes: 2 additions & 2 deletions CITATION.cff
@@ -1,5 +1,5 @@
cff-version: 1.2.0
-title: 'TRL: Transformers Reinforcement Learning'
+title: 'TRL: Transformer Reinforcement Learning'
message: >-
If you use this software, please cite it using the
metadata from this file.
@@ -25,7 +25,7 @@ authors:
family-names: Gallouédec
repository-code: 'https://github.com/huggingface/trl'
abstract: >-
-TRL (Transformers Reinforcement Learning) is an
+TRL (Transformer Reinforcement Learning) is an
open-source toolkit for aligning transformer models via
post-training. It provides practical, scalable
implementations of SFT, reward modeling, DPO, and GRPO
2 changes: 1 addition & 1 deletion README.md
@@ -189,7 +189,7 @@ Read more in the [Experimental docs](https://huggingface.co/docs/trl/experimenta

```bibtex
@software{vonwerra2020trl,
-title = {{TRL: Transformers Reinforcement Learning}},
+title = {{TRL: Transformer Reinforcement Learning}},
author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
license = {Apache-2.0},
url = {https://github.com/huggingface/trl},
2 changes: 1 addition & 1 deletion examples/notebooks/grpo_agent.ipynb
@@ -10,7 +10,7 @@
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a language model to act as an **agent**. One that learns to reason, interact with external tools, and improve through reinforcement.\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a language model to act as an **agent**. One that learns to reason, interact with external tools, and improve through reinforcement.\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
"- [Official TRL Examples](https://huggingface.co/docs/trl/example_overview) \n",
2 changes: 1 addition & 1 deletion examples/notebooks/grpo_ministral3_vl.ipynb
@@ -13,7 +13,7 @@
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Ministral-3](https://huggingface.co/collections/mistralai/ministral-3).\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Ministral-3](https://huggingface.co/collections/mistralai/ministral-3).\n",
"\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
2 changes: 1 addition & 1 deletion examples/notebooks/grpo_qwen3_vl.ipynb
@@ -13,7 +13,7 @@
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Qwen3-VL](https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe).\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Qwen3-VL](https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe).\n",
"\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
2 changes: 1 addition & 1 deletion examples/notebooks/grpo_rnj_1_instruct.ipynb
@@ -13,7 +13,7 @@
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge large language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use Colab to fine-tune models like [EssentialAI/rnj-1-instruct](https://huggingface.co/collections/EssentialAI/rnj-1).\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge large language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use Colab to fine-tune models like [EssentialAI/rnj-1-instruct](https://huggingface.co/collections/EssentialAI/rnj-1).\n",
"\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
2 changes: 1 addition & 1 deletion examples/notebooks/grpo_trl_lora_qlora.ipynb
@@ -19,7 +19,7 @@
"source": [
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
-"Easily fine-tune **Large Language Models (LLMs)** or **Vision-Language Models (VLMs)** with **LoRA** or **QLoRA** using the [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl) library by Hugging Face and Group Relative Policy Optimization (GRPO) — all within a **free Google Colab notebook** powered by a **T4 GPU**.\n",
+"Easily fine-tune **Large Language Models (LLMs)** or **Vision-Language Models (VLMs)** with **LoRA** or **QLoRA** using the [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl) library by Hugging Face and Group Relative Policy Optimization (GRPO) — all within a **free Google Colab notebook** powered by a **T4 GPU**.\n",
"\n",
"Thanks to the **built-in memory and training optimizations in TRL**, including LoRA, quantization, gradient checkpointing, and optimized attention kernels, it is possible to **fine-tune a 7B model on a free T4** with a **~7× reduction in memory consumption** compared to naive FP16 training.\n",
"\n",
2 changes: 1 addition & 1 deletion examples/notebooks/openenv_sudoku_grpo.ipynb
@@ -12,7 +12,7 @@
"\n",
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a model that learns to **play Sudoku**, through interaction and reinforcement.\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a model that learns to **play Sudoku**, through interaction and reinforcement.\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
"- [Official TRL Examples](https://huggingface.co/docs/trl/example_overview) \n",
2 changes: 1 addition & 1 deletion examples/notebooks/openenv_wordle_grpo.ipynb
@@ -14,7 +14,7 @@
"![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)\n",
"\n",
"\n",
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a model that learns to **play Wordle**, a word-guessing game, through interaction and reinforcement.\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can train a model that learns to **play Wordle**, a word-guessing game, through interaction and reinforcement.\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
"- [Official TRL Examples](https://huggingface.co/docs/trl/example_overview) \n",
2 changes: 1 addition & 1 deletion examples/notebooks/sft_ministral3_vl.ipynb
@@ -19,7 +19,7 @@
"id": "4f0hzSo4kKEc"
},
"source": [
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Ministral-3](https://huggingface.co/collections/mistralai/ministral-3).\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Ministral-3](https://huggingface.co/collections/mistralai/ministral-3).\n",
"\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
2 changes: 1 addition & 1 deletion examples/notebooks/sft_qwen_vl.ipynb
@@ -19,7 +19,7 @@
"id": "4f0hzSo4kKEc"
},
"source": [
-"With [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Qwen3-VL](https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe).\n",
+"With [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl), you can fine-tune cutting edge vision language models. It comes with support for quantized parameter efficient fine-tuning technique **QLoRA**, so we can use free Colab (T4 GPU) to fine-tune models like [Qwen3-VL](https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe).\n",
"\n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
2 changes: 1 addition & 1 deletion examples/notebooks/sft_trl_lora_qlora.ipynb
@@ -26,7 +26,7 @@
"id": "cQ6bxQaMLWAS"
},
"source": [
-"Easily fine-tune Large Language Models (LLMs) or Vision-Language Models (VLMs) with **LoRA** or **QLoRA** using the [**Transformers Reinforcement Learning (TRL)**](https://github.com/huggingface/trl) library built by Hugging Face — all within a **free Google Colab notebook** (powered by a **T4 GPU**.). \n",
+"Easily fine-tune Large Language Models (LLMs) or Vision-Language Models (VLMs) with **LoRA** or **QLoRA** using the [**Transformer Reinforcement Learning (TRL)**](https://github.com/huggingface/trl) library built by Hugging Face — all within a **free Google Colab notebook** (powered by a **T4 GPU**.). \n",
"\n",
"- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project! \n",
"- [Official TRL Examples](https://huggingface.co/docs/trl/example_overview) \n",
4 changes: 2 additions & 2 deletions trl/skills/trl-training/SKILL.md
@@ -1,6 +1,6 @@
---
name: trl-training
-description: Train and fine-tune transformer language models using TRL (Transformers Reinforcement Learning). Supports SFT, DPO, GRPO, KTO, RLOO and Reward Model training via CLI commands.
+description: Train and fine-tune transformer language models using TRL (Transformer Reinforcement Learning). Supports SFT, DPO, GRPO, KTO, RLOO and Reward Model training via CLI commands.
license: Apache-2.0
metadata:
version: "1.0.0"
@@ -28,7 +28,7 @@ metadata:

# TRL Training Skill

-You are an expert at using the TRL (Transformers Reinforcement Learning) library to train and fine-tune large language models.
+You are an expert at using the TRL (Transformer Reinforcement Learning) library to train and fine-tune large language models.

## Overview

2 changes: 1 addition & 1 deletion trl/templates/lm_model_card.md
@@ -46,7 +46,7 @@ Cite TRL as:

```bibtex
{% raw %}@software{vonwerra2020trl,
-title = {{TRL: Transformers Reinforcement Learning}},
+title = {{TRL: Transformer Reinforcement Learning}},
author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
license = {Apache-2.0},
url = {https://github.com/huggingface/trl},
2 changes: 1 addition & 1 deletion trl/templates/rm_model_card.md
@@ -46,7 +46,7 @@ Cite TRL as:

```bibtex
{% raw %}@software{vonwerra2020trl,
-title = {{TRL: Transformers Reinforcement Learning}},
+title = {{TRL: Transformer Reinforcement Learning}},
author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
license = {Apache-2.0},
url = {https://github.com/huggingface/trl},