
Error encountered while using huggingface-cli to download datasets #2936


Open
GuocunWang opened this issue Mar 18, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@GuocunWang

Describe the bug

When I download the HSSD dataset, I use the following command:
huggingface-cli download --repo-type dataset --resume-download hssd/hssd-hab --local-dir ./data/hssd-hab

The following error was encountered: huggingface_hub.errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url.

This is probably because the dataset contains a large number of small files, which results in too many download requests. Is there a recommended way to download it?
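One way to lower the request rate is simply to fetch fewer files at once. The following is a minimal Python sketch, assuming a `huggingface_hub` version (the one reported here is 0.29.3) in which `snapshot_download` accepts a `max_workers` argument. The helper names are hypothetical, and `max_workers=2` is an illustrative value, not a known safe limit for any endpoint. Recent versions of the CLI expose the same setting as `--max-workers`; check `huggingface-cli download --help` for your installed version.

```python
from typing import Any, Dict


def build_snapshot_kwargs(repo_id: str, local_dir: str, max_workers: int = 2) -> Dict[str, Any]:
    """Collect keyword arguments for huggingface_hub.snapshot_download (hypothetical helper).

    A smaller max_workers value means fewer files are fetched at the same time,
    which lowers the request rate against the endpoint.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",
        "local_dir": local_dir,
        "max_workers": max_workers,
    }


def download_with_low_concurrency(repo_id: str, local_dir: str, max_workers: int = 2) -> str:
    """Download a dataset snapshot with a reduced number of parallel workers."""
    # Imported lazily so build_snapshot_kwargs can be used without the library installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local path of the downloaded snapshot.
    return snapshot_download(**build_snapshot_kwargs(repo_id, local_dir, max_workers))
```

For this issue, the call would be `download_with_low_concurrency("hssd/hssd-hab", "./data/hssd-hab")`. Fewer workers make the download slower but spread the requests out over time.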

Reproduction

No response

Logs

System info

- huggingface_hub version: 0.29.3
- Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.17
- Python version: 3.8.20
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/xxx/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.6
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 9.3.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.23.0
- pydantic: N/A
- aiohttp: N/A
- ENDPOINT: https://hf-mirror.com
- HF_HUB_CACHE: /home/xxx/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/xxx/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/xxx/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /home/xxx/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
GuocunWang added the bug (Something isn't working) label on Mar 18, 2025
@Wauplin
Contributor

Wauplin commented Mar 18, 2025

Hi @GuocunWang sorry you've encountered this issue. Can you post a full stacktrace of this error? I'm particularly interested in getting a request ID (that is logged in the error message) to investigate on our side.

@GuocunWang
Author

@Wauplin
I hope this message finds you well. I encountered a 429 error (Too Many Requests) while trying to download a dataset using the huggingface-cli download command. As per your recommendation, I enabled debug mode by setting the HF_DEBUG=1 environment variable to capture additional log information.

HF_DEBUG=1 huggingface-cli download --repo-type dataset --resume-download hssd/hssd-hab --local-dir ./data/hssd-hab

However, I was unable to find the request ID in the logs, which I understand is crucial for investigating the issue further.

Here is the full error stack trace I received:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
    response.raise_for_status()
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://cdn-lfs.hf-mirror.com/repos/d3/36/d336e691db6b26b29482553af06f12dcdf6030d98dd09e339e447f224906b13e/071fbc0b8ba5c0a9c61e099ce01c17fa4e71090a5185fdfe69817ef3ec60bd81?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27114e1852d31347304206fea57f97d5be074cad04.collider.glb%3B+filename%3D%22114e1852d31347304206fea57f97d5be074cad04.collider.glb%22%3B&response-content-type=model%2Fgltf-binary&Expires=1742377148&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MjM3NzE0OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy9kMy8zNi9kMzM2ZTY5MWRiNmIyNmIyOTQ4MjU1M2FmMDZmMTJkY2RmNjAzMGQ5OGRkMDllMzM5ZTQ0N2YyMjQ5MDZiMTNlLzA3MWZiYzBiOGJhNWMwYTljNjFlMDk5Y2UwMWMxN2ZhNGU3MTA5MGE1MTg1ZmRmZTY5ODE3ZWYzZWM2MGJkODE~cmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=MpCxzibDlKPTlt7xJwtYDc-X5pfpG0lEpEGFxmK95ZHc8tMqtFSUxT~GNHfMfzR6j6hEbDvkawZkuB3E29kWMoqalDN7t49MDHhsc-s2n9pMd-0wrtcF3DVDD7sd-d5W1PgsaFk7vkkUtjUfHzsgKpBxzl6FJR9BeYx-AoIsJBvdatfcA1-5fE4U-gA0qwhnucgzTF8vswgUB8yLZLpjYkcbwtJvezsmAtmNgVpBSal4oJNbuKPfpwUrk34ZfvYxpKfsZM-8TMoaZj4~H1FIhzb6YVjpEzME~~Zm-kbOHeDuEpptfbDmTlZR2LjIrK7umuYnGGCB37MqLXp2W5vr0w__&Key-Pair-Id=K3RPWS32NSSJCE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/UH-1-rl/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/commands/huggingface_cli.py", line 57, in main
    service.run()
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/commands/download.py", line 153, in run
    print(self._download())  # Print path to downloaded files
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/commands/download.py", line 187, in _download
    return snapshot_download(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 296, in snapshot_download
    thread_map(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 270, in _inner_hf_hub_download
    return hf_hub_download(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 842, in hf_hub_download
    return _hf_hub_download_to_local_dir(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1138, in _hf_hub_download_to_local_dir
    _download_to_tmp_and_move(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1547, in _download_to_tmp_and_move
    http_get(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 371, in http_get
    r = _request_wrapper(
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 304, in _request_wrapper
    hf_raise_for_status(response)
  File "/home/xxx/anaconda3/envs/UH-1-rl/lib/python3.8/site-packages/huggingface_hub/utils/_http.py", line 481, in hf_raise_for_status
    raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://cdn-lfs.hf-mirror.com/repos/d3/36/d336e691db6b26b29482553af06f12dcdf6030d98dd09e339e447f224906b13e/071fbc0b8ba5c0a9c61e099ce01c17fa4e71090a5185fdfe69817ef3ec60bd81?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27114e1852d31347304206fea57f97d5be074cad04.collider.glb%3B+filename%3D%22114e1852d31347304206fea57f97d5be074cad04.collider.glb%22%3B&response-content-type=model%2Fgltf-binary&Expires=1742377148&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MjM3NzE0OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy9kMy8zNi9kMzM2ZTY5MWRiNmIyNmIyOTQ4MjU1M2FmMDZmMTJkY2RmNjAzMGQ5OGRkMDllMzM5ZTQ0N2YyMjQ5MDZiMTNlLzA3MWZiYzBiOGJhNWMwYTljNjFlMDk5Y2UwMWMxN2ZhNGU3MTA5MGE1MTg1ZmRmZTY5ODE3ZWYzZWM2MGJkODE~cmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=MpCxzibDlKPTlt7xJwtYDc-X5pfpG0lEpEGFxmK95ZHc8tMqtFSUxT~GNHfMfzR6j6hEbDvkawZkuB3E29kWMoqalDN7t49MDHhsc-s2n9pMd-0wrtcF3DVDD7sd-d5W1PgsaFk7vkkUtjUfHzsgKpBxzl6FJR9BeYx-AoIsJBvdatfcA1-5fE4U-gA0qwhnucgzTF8vswgUB8yLZLpjYkcbwtJvezsmAtmNgVpBSal4oJNbuKPfpwUrk34ZfvYxpKfsZM-8TMoaZj4~H1FIhzb6YVjpEzME~~Zm-kbOHeDuEpptfbDmTlZR2LjIrK7umuYnGGCB37MqLXp2W5vr0w__&Key-Pair-Id=K3RPWS32NSSJCE

Unfortunately, no request ID is provided in the error message. Could you please assist me in resolving this issue or advise if there is any other information I can provide?

Thank you for your support, and I look forward to your guidance on how to proceed.
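The traceback shows `snapshot_download` handing each file to `hf_hub_download` through tqdm's `thread_map`, which wraps a `ThreadPoolExecutor`. For a repository with many small files, this produces a steady stream of short requests, with at most one in flight per worker. The stdlib-only simulation below makes no network calls; `fake_fetch` and `simulate_downloads` are hypothetical stand-ins. It illustrates that the pool size is what caps the number of simultaneous requests.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


class InFlightCounter:
    """Tracks how many simulated requests are running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "InFlightCounter":
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc: object) -> None:
        with self._lock:
            self.current -= 1


def simulate_downloads(filenames: List[str], max_workers: int) -> int:
    """Pretend to fetch each file and return the peak number of concurrent requests."""
    counter = InFlightCounter()

    def fake_fetch(name: str) -> str:
        with counter:
            time.sleep(0.01)  # stand-in for one HTTP request per file
        return name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fake_fetch, filenames))
    return counter.peak
```

With 40 simulated files, `simulate_downloads(files, max_workers=2)` never reports more than two concurrent requests. The total number of requests stays the same, though, so lowering concurrency reduces the request *rate* rather than the request *count*.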

@Wauplin
Contributor

Wauplin commented Mar 19, 2025

@GuocunWang thanks for providing this! Very helpful! It turns out that you are using the hf-mirror.com mirror site maintained by @padeoe. Unfortunately, there is not much we can do on our side, as we don't manage this CDN.

@padeoe any idea what can be done?
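The `ENDPOINT: https://hf-mirror.com` line in the system info is what reveals the mirror. `huggingface_hub` reads the endpoint from the `HF_ENDPOINT` environment variable and falls back to `https://huggingface.co` when it is unset. The sketch below mimics that lookup with the standard library so you can check which host a download will hit before starting it. The helper names are hypothetical, and this is a simplification of what the library does internally.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "https://huggingface.co"


def resolve_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hub endpoint implied by HF_ENDPOINT, falling back to the default."""
    env = os.environ if env is None else env
    # Trailing slashes are dropped so "https://hf-mirror.com/" and
    # "https://hf-mirror.com" compare equal.
    return env.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")


def uses_mirror(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when downloads are routed somewhere other than the official Hub."""
    return resolve_endpoint(env) != DEFAULT_ENDPOINT
```

Calling `uses_mirror()` with no argument checks the current process environment. Unsetting `HF_ENDPOINT` in the shell sends requests to the official Hub instead, where the Hugging Face team can investigate using request IDs.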

@padeoe

padeoe commented Mar 19, 2025

Hi @GuocunWang @Wauplin, the 429 Client Error: Too Many Requests is likely caused by the large number of files being downloaded simultaneously from the repository, which may be hitting the rate limits set by hf-mirror.com.

To address this issue, I have added an exception in our system for the specific path you mentioned (https://cdn-lfs.hf-mirror.com/repos/d3/36/d336e691db6b26b29482553af06f12dcdf6030d98dd09e339e447f224906b13e/*). This should mitigate the rate limiting and let you download the files more smoothly.

If you continue to experience issues, please let us know.
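On the client side, the usual response to a 429 is to back off and retry rather than immediately reissuing requests. The following is a minimal stdlib sketch, assuming the raised exception carries a `requests`-style `response.status_code`. This holds for `HfHubHTTPError` in the version reported here, where it subclasses `requests.HTTPError`. `retry_on_429` and its default delays are hypothetical choices, and the sketch does not read a `Retry-After` header.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def status_from_response(err: BaseException) -> Optional[int]:
    """Return err.response.status_code if present (requests-style errors), else None."""
    response = getattr(err, "response", None)
    return getattr(response, "status_code", None)


def retry_on_429(
    fn: Callable[[], T],
    *,
    get_status: Callable[[BaseException], Optional[int]] = status_from_response,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on HTTP 429, wait with exponential backoff plus jitter and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Re-raise anything that is not a rate limit, and the final 429 as well.
            if get_status(err) != 429 or attempt == max_attempts - 1:
                raise
            # 1 s, 2 s, 4 s, ... (with the defaults), capped at max_delay,
            # plus up to 50% random jitter so parallel clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable: the loop either returns or raises")
```

Wrapping a whole download, for example `retry_on_429(lambda: snapshot_download(repo_id="hssd/hssd-hab", repo_type="dataset", local_dir="./data/hssd-hab", max_workers=2))`, assumes that rerunning it picks up where the previous attempt stopped, which is what the `--resume-download` flag in the original command asks for.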
