BenjaminBossan
Member

It appears that the single GPU tests are always failing at this test ("The operation was canceled"), probably because it is hanging (after more than 5h). Let's try to debug by skipping this test.

Moreover, remove a superfluous step in the BNB CI workflow (correct step is defined further below).
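For illustration, skipping a test that hangs could look roughly like the sketch below. It uses the standard library's `unittest.skip` so that it runs without extra dependencies; in a pytest-based suite the equivalent marker would be `@pytest.mark.skip(reason=...)`. The class and test names are hypothetical and are not taken from the PEFT test suite.

```python
import unittest


class TestAudioModel(unittest.TestCase):  # hypothetical class name
    @unittest.skip("Hangs on the single GPU CI runner; skipped while debugging")
    def test_audio_forward(self):  # hypothetical test name
        # Never executed while the skip marker is in place.
        raise RuntimeError("this would hang or fail in CI")

    def test_unrelated(self):
        # Other tests in the same class still run normally.
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the test class and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAudioModel)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite()
    print("skipped:", [reason for _, reason in result.skipped])
    print("successful:", result.wasSuccessful())
```

If the job finishes once the test is skipped, that supports the guess that this test is the one hanging.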

It appears that the single GPU tests are always failing at this
test ("The operation was canceled"), probably because it is
hanging (after more than 5h). Let's try to debug by skipping this test.

Moreover, remove a superfluous step in the CI workflow.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan BenjaminBossan merged commit e8babb1 into huggingface:main Feb 18, 2025
15 checks passed
@BenjaminBossan BenjaminBossan deleted the ci-testing-skip-audio-test branch February 18, 2025 16:30
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Feb 19, 2025
After unblocking single GPU tests with huggingface#2380, a couple of tests related
to hotswapping failed. This PR should (hopefully) address this.

1. Wrong error type caught with xfail

I set the wrong error type for the xfailing compiled hotswap diffusers
tests. This was because I hadn't checked out diffusers main when I was
checking locally.
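To see why the error type matters: with `pytest.mark.xfail(raises=SomeError)`, only a failure with that exception type counts as an expected failure, and any other exception is reported as a regular failure. The helper below is a hypothetical, standard-library-only imitation of that rule, not pytest's implementation, and the exception types are placeholders rather than the ones actually raised by diffusers.

```python
def xfail_outcome(test_fn, raises):
    """Classify a test roughly the way an xfail marker with `raises` would.

    Hypothetical helper for illustration only.
    """
    try:
        test_fn()
    except raises:
        return "xfailed"  # failed with the expected error type
    except Exception:
        return "failed"  # failed, but with a different error type
    return "xpassed"  # did not fail at all


def compiled_hotswap_test():  # placeholder for the real test body
    raise RuntimeError("recompilation detected")


# Marker set to the wrong error type: reported as a regular failure.
print(xfail_outcome(compiled_hotswap_test, raises=ValueError))
# Marker set to the error type that is actually raised: expected failure.
print(xfail_outcome(compiled_hotswap_test, raises=RuntimeError))
```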

2. Loosen tolerance

Some tests fail because an allclose does not match even though the
numbers in the logs look pretty much identical:

https://github.com/huggingface/peft/actions/runs/13404117333/job/37440752790#step:6:1929

This is most likely a problem with tolerances being too strict.
Unfortunately, I can't reproduce the error locally, so I have to guess
that moving from 1e-5 to 1e-4 will fix the issue.
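A small sketch of how such a tolerance change can flip the result. The check below uses the same rule as `torch.allclose` and `numpy.allclose`, `|a - b| <= atol + rtol * |b|`, written in plain Python so it runs without extra dependencies. The commit message does not say whether the absolute or the relative tolerance was changed; this sketch assumes `atol`, and the numbers are made up for illustration.

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Elementwise closeness check: |x - y| <= atol + rtol * |y|."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


# Made-up values: the first entries differ by about 5e-5, which looks
# "pretty much identical" when printed with a few digits.
expected = [0.12345, -1.5, 3.0]
actual = [0.12350, -1.5, 3.0]

print(allclose(actual, expected, atol=1e-5))  # stricter tolerance
print(allclose(actual, expected, atol=1e-4))  # loosened tolerance
```

With these values the stricter check fails and the loosened one passes, which is the kind of mismatch a looser tolerance is meant to absorb.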
BenjaminBossan added a commit that referenced this pull request Feb 19, 2025
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
* Refactor main function in dpo.py

* Update setup.py and add cli.py

* Add examples to package data

* style

* Refactor setup.py file

* Add new file t.py

* Move dpo to package

* Update MANIFEST.in and setup.py, refactor trl/cli.py

* Add __init__.py to trl/scripts directory

* Add license header to __init__.py

* File moved instruction

* Add Apache License and update file path

* Move dpo.py to new location

* Refactor CLI and DPO script

* Refactor import structure in scripts package

* env

* rm config from chat arg

* rm old cli

* chat init

* test cli [skip ci]

* Add `datast_config_name` to `ScriptArguments` (huggingface#2440)

* add missing arg

* Add test cases for 'trl sft' and 'trl dpo' commands

* Add sft.py script and update cli.py to include sft command

* Move sft script

* chat

* style [ci skip]

* kto

* rm example config

* first step on doc

* see huggingface#2442

* see huggingface#2443

* fix chat windows

* ©️ Copyrights update (huggingface#2454)

* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]

* 💬 Fix chat for windows (huggingface#2443)

* fix chat for windows

* add some tests back

* Revert "add some tests back"

This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.

* 🆔 Add `datast_config` to `ScriptArguments` (huggingface#2440)

* datast_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`

* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (huggingface#2417)

* Remove unused deepspeed code

* add model prep back

* add deepspeed even if it doesn't work

* rm old code

* Fix config name

* Remove `make dev` in favor of `pip install -e .[dev]`

* Update script paths and remove old symlink related things

* Fix chat script path [ci skip]

* style