Actions: huggingface/trl

Hugging Face Issue Labeler

725 workflow runs

why kl = nan when grpo train?
Hugging Face Issue Labeler #676: Issue #4040 opened by uilstong
30s
About "or None" and "defaults to None"
Hugging Face Issue Labeler #675: Issue #4036 opened by qgallouedec
22s
SFTTrainer with PEFT model
Hugging Face Issue Labeler #673: Issue #4029 opened by lylaiyy
39s
Abnormal results during DPO training
Hugging Face Issue Labeler #672: Issue #4023 opened by wjjwyj
28s
No warning for unsupported int4 quantization
Hugging Face Issue Labeler #670: Issue #4018 opened by MRiabov
39s
Training Step of GRPO in Wandb.
Hugging Face Issue Labeler #665: Issue #4004 opened by mandyyyyii
25s
DPO trainer with video content
Hugging Face Issue Labeler #664: Issue #4002 opened by GabrieleGiudic
25s
scale_rewards malfunctioned in GRPOTrainer
Hugging Face Issue Labeler #663: Issue #3991 opened by Peter-Chou
25s
accelerator.sync_gradients
Hugging Face Issue Labeler #662: Issue #3988 opened by AriesJin
28s
kto trainer invalid configuration error
Hugging Face Issue Labeler #656: Issue #3974 opened by bryanchrist
29s
REQUEST: Dynamic Sampling for GRPO
Hugging Face Issue Labeler #655: Issue #3973 opened by wenquanlu
23s
[Question] Why isn't vanilla REINFORCE implemented?
Hugging Face Issue Labeler #653: Issue #3966 opened by nityadav
21s
[QST] Colocation and resharding
Hugging Face Issue Labeler #652: Issue #3963 opened by jeromeku
26s