Hugging Face Issue Labeler · Workflow runs · huggingface/trl · GitHub

Hugging Face Issue Labeler

issue_auto_labeller.yml

725 workflow runs

PPO grad_norm is 0.0 Hugging Face Issue Labeler #651: Issue #3961 opened by faker52

27s

sft_video_llm example fail Hugging Face Issue Labeler #650: Issue #3958 opened by yao-matrix

22s

sft_gemma3 example doesn't work Hugging Face Issue Labeler #649: Issue #3957 opened by yao-matrix

27s

Why not use AutoModel for ref_model in grpo trainer? Hugging Face Issue Labeler #648: Issue #3948 opened by csshihao

30s

LoRA-GA: Low-Rank Adaptation with Gradient Approximation Hugging Face Issue Labeler #647: Issue #3945 opened by electroglyph

28s

Can we avoid saving the optimization stage? Hugging Face Issue Labeler #646: Issue #3944 opened by HelloWorldLTY

22s

Should the reference model of GRPO be an actual model? Hugging Face Issue Labeler #645: Issue #3939 opened by shaojun0

26s

Bugs when sft with mixing pure text and multimodal data Hugging Face Issue Labeler #644: Issue #3934 opened by shiym2000

31s

GRPOTrainer with top_entropy_quntile < 1 causes hang with multi gpu training Hugging Face Issue Labeler #643: Issue #3933 opened by avishaiElmakies

27s

Add support for RLPR Hugging Face Issue Labeler #642: Issue #3928 opened by mitchelldehaven

23s

Using assistant_only_loss=True with sequence length > max_length fails silently Hugging Face Issue Labeler #641: Issue #3927 opened by jonnyli1125

31s

SFTTrainer freezes LoRA adapter when using PEFT model as argument Hugging Face Issue Labeler #640: Issue #3926 opened by dvgodoy

32s

When I want to use client to request trl vllm-serve, it always timeout when init_communicator() Hugging Face Issue Labeler #639: Issue #3925 opened by JackeyZhang1001

28s

Feature request for GRPO trainer: vLLM guided decoding with xgrammar (JSON) Hugging Face Issue Labeler #638: Issue #3924 opened by sanghak123

27s

Does the trl library support the Lite PPO algorithm? Hugging Face Issue Labeler #637: Issue #3920 opened by ArcherShirou

29s

How to use trl-SFTTrainer to train Qwen-30B-A3B? Hugging Face Issue Labeler #636: Issue #3918 opened by JeffWb

23s

Inconsistent apply_chat_template behaviour for multimodal dataset between SFT and GRPO Hugging Face Issue Labeler #635: Issue #3915 opened by ishaan-rawal-ai

36s

max_length causing training loss issues not found in v0.19.1 Hugging Face Issue Labeler #634: Issue #3910 opened by AmazingGabriel16

28s

[GRPO Trainer] Accuracy reward stays 0 Hugging Face Issue Labeler #633: Issue #3903 opened by Revist

25s

sft_gemma3 example fail Hugging Face Issue Labeler #632: Issue #3901 opened by yao-matrix

30s

How to gather completions before computing rewards in GRPOTrainer Hugging Face Issue Labeler #631: Issue #3896 opened by rubickkcibur

28s

What are the important metrics that help gauge accuracy and validity of training? Hugging Face Issue Labeler #630: Issue #3893 opened by RageItalic

28s

Is there a standard GRPO GSM8k process for TRL? Hugging Face Issue Labeler #629: Issue #3892 opened by mitchelldehaven

25s

Bug in BFD packing Hugging Face Issue Labeler #628: Issue #3887 opened by RicardoDominguez

29s

GRPO fails with the "server" vllm_mode when num_processes > 1 Hugging Face Issue Labeler #627: Issue #3885 opened by Kirill-Kravtsov

22s
