Hugging Face Issue Labeler · Workflow runs · huggingface/trl · GitHub

Hugging Face Issue Labeler

issue_auto_labeller.yml

725 workflow runs

GRPOTrainer training causes model to completely deteriorate when training with accelerate Hugging Face Issue Labeler #626: Issue #3881 opened by Randomdude11

24s

truncate mode for response Hugging Face Issue Labeler #625: Issue #3878 opened by shiwanghua

23s

Dynamic Fine Tuning, an improvement of SFT Hugging Face Issue Labeler #624: Issue #3877 opened by 1485840691

27s

PermissionError: [Errno 13] Permission denied: 'Qwen3-0.6B-SFT' Hugging Face Issue Labeler #623: Issue #3872 opened by vishaljoshi24

27s

Add reward functions to support RLCR Hugging Face Issue Labeler #622: Issue #3871 opened by pramodith

32s

Unable to replicate official GRPO for VLM tutorial - AttributeError: 'dict' object has no attribute 'replace' Hugging Face Issue Labeler #621: Issue #3870 opened by Randomdude11

26s

ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM. Hugging Face Issue Labeler #620: Issue #3868 opened by Randomdude11

24s

Validate the vllm_mode is either server or colocate Hugging Face Issue Labeler #619: Issue #3865 opened by sergiopaniego

23s

bf16 option in training config prevents FP8 Training Hugging Face Issue Labeler #618: Issue #3860 opened by akakakakakaa

24s

'LLMEngine' object has no attribute 'model_executor' Hugging Face Issue Labeler #617: Issue #3859 opened by EvilCalf

40s

when use GRPO+ deepspeed_zero3 + ds3_gather_for_generation=False, stuck in the training stage, step is still 0 after an hour Hugging Face Issue Labeler #616: Issue #3858 opened by nstl-zyb

29s

GRPO Trainer loss error? as the info log, why 'rewards/if_rewrite_reward/std': 5.080004692077637 (is not 0), but loss = 0.0, since if std is not 0, the advantages won't be 0.0, get a loss is not 0。 Hugging Face Issue Labeler #615: Issue #3857 opened by harmonytan

25s

vllm prepends two BOS for LLama Hugging Face Issue Labeler #614: Issue #3853 opened by wenquanlu

21s

Issues at GRPO with VLM Hugging Face Issue Labeler #613: Issue #3847 opened by Fhrozen

27s

Ideas to Improve GRPO Training Speed Hugging Face Issue Labeler #612: Issue #3846 opened by jp1924

26s

accelerate reducing the batch size and crashing GRPO Hugging Face Issue Labeler #611: Issue #3842 opened by limlimg

31s

GRPO with google/gemma-3-1b-it torch.compile error Hugging Face Issue Labeler #610: Issue #3839 opened by AdityaKulshrestha

26s

Wrong default clipping params for GSPO Hugging Face Issue Labeler #609: Issue #3834 opened by pramodith

25s

Improve RLOO Trainer memory efficiency through string-level processing optimization Hugging Face Issue Labeler #608: Issue #3829 opened by luckyvickyricky

26s

TRL doesn't support gemma-3 Hugging Face Issue Labeler #607: Issue #3828 opened by awestover

41s

DataCollatorForCompletionOnlyLM Hugging Face Issue Labeler #606: Issue #3827 opened by tejassaboo

21s

RLOOTrainer tldr experiments not reproducible Hugging Face Issue Labeler #605: Issue #3825 opened by jltchiu

31s

Loss not calculated correctly when using GSPO with default loss type bnpo (importance_sampling_level == "sequence") Hugging Face Issue Labeler #604: Issue #3823 opened by avishaiElmakies

26s

Processing class does not have EOS token Hugging Face Issue Labeler #603: Issue #3822 opened by debasisdwivedy

31s

[Question] about Profiling & Logging Execution Time to WANDB using GRPO Hugging Face Issue Labeler #602: Issue #3819 opened by jungle-gym-ac

19s
