How do I install trl-training?

trl-training is a Skill hosted on GitHub at https://github.com/huggingface/skills. Visit the ImAiFox page at https://imaifox.com/boosters/huggingface-skills-trl-training for the AI-ready install prompt you can copy directly into Claude Code, Cursor, or Windsurf.

How popular is trl-training?

trl-training has 10,702 GitHub stars and 703 forks. It is actively maintained with recent commits.

Is trl-training free?

Yes — trl-training is open source and free to use under the Apache-2.0 license. The source code is publicly available on GitHub at https://github.com/huggingface/skills.

Skill

Name: trl-training
Author: huggingface

AI Summary

You are an expert at using the TRL (Transformer Reinforcement Learning) library to train and fine-tune large language models. TRL provides CLI commands for post-training foundation models using state-of-the-art techniques. It is built on top of Hugging Face Transformers and Accelerate, providing seamless integration with the Hugging Face ecosystem.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "trl-training" skill in my project.

Please run this command in my terminal:
# Install skill into your project
mkdir -p .claude/skills/trl-training && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/trl-training/SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/trl-training/SKILL.md"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
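
The curl command above does not pass `-f`, so an HTTP error body (for example a 404 message) could be saved as SKILL.md without curl reporting a failure. The following is a minimal sketch of a sanity check to run after the download; `verify_skill_file` is a hypothetical helper, not part of the skill, and the "error page" heuristic is an assumption about what a failed download looks like.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): sanity-check a downloaded SKILL.md.
verify_skill_file() {
  local path="$1"
  # -s is true only if the file exists and is non-empty.
  if [ ! -s "$path" ]; then
    echo "missing or empty: $path" >&2
    return 1
  fi
  # Assumption: a failed download starts with HTML markup or a bare "404" message.
  if head -n 1 "$path" | grep -Eqi '^(<!doctype|<html|404)'; then
    echo "looks like an error page: $path" >&2
    return 1
  fi
  echo "ok: $path"
}

# Usage: verify_skill_file .claude/skills/trl-training/SKILL.md
```

If the check fails, re-running the install command is usually enough; the function only inspects the file and never modifies it.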

Description

Train and fine-tune transformer language models using TRL (Transformer Reinforcement Learning). Supports SFT, DPO, GRPO, KTO, RLOO, and Reward Model training via CLI commands.

Overview

TRL provides CLI commands for post-training foundation models using state-of-the-art techniques:

• SFT (Supervised Fine-Tuning): Fine-tune models on instruction-following or conversational datasets
• DPO (Direct Preference Optimization): Align models using preference data
• GRPO (Group Relative Policy Optimization): Train models by ranking multiple sampled outputs relative to each other and optimizing based on their comparative rewards
• RLOO (Reinforce Leave One Out): Online RL training with generation-based rewards
• Reward Model Training: Train reward models for RLHF

TRL is built on top of Hugging Face Transformers and Accelerate, providing seamless integration with the Hugging Face ecosystem.

TRL Training Skill

You are an expert at using the TRL (Transformer Reinforcement Learning) library to train and fine-tune large language models.

trl sft - Supervised Fine-Tuning

Fine-tune language models on instruction-following or conversational datasets.

Full training:

```bash
trl sft \
  --model_name_or_path Qwen/Qwen2-0.5B \
  --dataset_name trl-lib/Capybara \
  --learning_rate 2.0e-5 \
  --num_train_epochs 1 \
  --packing \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --eos_token '<|im_end|>' \
  --eval_strategy steps \
  --eval_steps 100 \
  --output_dir Qwen2-0.5B-SFT \
  --push_to_hub
```

Train with LoRA adapters:

```bash
trl sft \
  --model_name_or_path Qwen/Qwen2-0.5B \
  --dataset_name trl-lib/Capybara \
  --learning_rate 2.0e-4 \
  --num_train_epochs 1 \
  --packing \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --eos_token '<|im_end|>' \
  --eval_strategy steps \
  --eval_steps 100 \
  --use_peft \
  --lora_r 32 \
  --lora_alpha 16 \
  --output_dir Qwen2-0.5B-SFT \
  --push_to_hub
```
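
Both SFT commands combine `--per_device_train_batch_size 2` with `--gradient_accumulation_steps 8`, so each optimizer step sees more sequences than the per-device value suggests. A rough sketch of that arithmetic follows; `effective_batch_size` is a hypothetical helper, and the process count is an assumption that depends on how the run is launched (it defaults to 1 here).

```bash
# Hypothetical helper: sequences contributing to one optimizer step.
# Args: per-device batch size, gradient accumulation steps, number of processes (assumed, default 1).
effective_batch_size() {
  local per_device="$1" grad_accum="$2" num_processes="${3:-1}"
  echo $(( per_device * grad_accum * num_processes ))
}

effective_batch_size 2 8      # values from the commands above, single process: prints 16
effective_batch_size 2 8 4    # same values on an assumed 4-process launch: prints 64
```

With `--packing` enabled, each counted sequence is a packed sequence, so the number of original dataset examples per step may be higher.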

trl dpo - Direct Preference Optimization

Align models using preference data (chosen/rejected pairs).

Full training:

```bash
trl dpo \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
  --learning_rate 5.0e-7 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --max_steps 1000 \
  --gradient_accumulation_steps 8 \
  --eval_strategy steps \
  --eval_steps 50 \
  --output_dir Qwen2-0.5B-DPO \
  --no_remove_unused_columns
```

Train with LoRA adapters:

```bash
trl dpo \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
  --learning_rate 5.0e-6 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --max_steps 1000 \
  --gradient_accumulation_steps 8 \
  --eval_strategy steps \
  --eval_steps 50 \
  --output_dir Qwen2-0.5B-DPO \
  --no_remove_unused_columns \
  --use_peft \
  --lora_r 32 \
  --lora_alpha 16
```
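
To make "chosen/rejected pairs" concrete, here is a small sketch of what preference records can look like. It assumes a flat layout with `prompt`, `chosen`, and `rejected` string fields; the records are made up for illustration, `count_pairs` is a hypothetical helper, and the actual schema of `trl-lib/ultrafeedback_binarized` may differ (for example, it may store conversational message lists).

```bash
# Illustrative records only: the contents are invented and written to a temp file.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{"prompt": "What is 2 + 2?", "chosen": "2 + 2 = 4.", "rejected": "2 + 2 = 5."}
{"prompt": "Name a primary color.", "chosen": "Red is a primary color.", "rejected": "Brown."}
EOF

# Hypothetical helper: count records that carry both sides of the preference pair.
count_pairs() {
  grep -c '"chosen":.*"rejected":' "$1"
}

count_pairs "$sample"   # prints 2
```

A record missing either side would not be counted, which makes this a quick first check before pointing a DPO run at custom data.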

Health Signals

Maintenance: Active (committed yesterday)
Adoption: Popular (10.7k stars on GitHub)
Docs: Well-documented (README + description)

GitHub Signals

Stars: 10.7k
Forks: 703
Issues: 28
Updated: Yesterday
License: Apache-2.0

Works With

Claude Code

Related Skills

ecc (Plugin)
Everything Claude Code (Plugin)
claude-api (Skill)