Skip to content

Retry on transient error in download workflow #2976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Apr 3, 2025

Originally from @Wauplin on Slack (internal link):

I'm guessing huggingface_hub doesn't have retry for 429. Should be a good thing to add?

Well, I have no idea why we haven't spotted that before 🫠
http_get relies on _request_wrapper which itself has been refactored in Oct. 2023 (!!) by ... hmm... myself. I wrote in comments Perform request and return if status_code is not in the retry list. . But the implementation is a simple response = get_session().request(method=method, url=url, **params) without any retry mechanism 🙈 See #1766.

So basically this PR adds a retry mechanism when downloading a file (GET /resolve) or when fetching metadata for a file (HEAD /resolve) if rate limit is reached. Retry after 1s, 2s, 4s, 8s and 8s.


Comments:

  • I'm not 100% sure about that PR. Opening it for discussion but I feel that I'm missing something - I was pretty sure until now that we had already a retry mechanism in place for 429...
  • After some checks, I remembered that HfFileSystem uses http_backoff with retries on 5xx errors but not 429 🤔

@julien-c @LysandreJik @hanouticelina any obvious drawback that I missed from retrying on 429 for now?

EDIT: actually HTTP 429 are rare in downloads, that's why we never had one. Problem arose today because of Xet bridge that rate-limited @Vaibhavs10

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hanouticelina
Copy link
Contributor

I was also pretty sure we had the retry-mechanism on 429 errors but I guess i was dreaming too 😄 I don't see any drawback for now, I think it's safe to add a retry mechanism on only 429 errors 👍

@Wauplin Wauplin requested a review from hanouticelina April 16, 2025 17:00
@Wauplin
Copy link
Contributor Author

Wauplin commented Apr 16, 2025

After some thoughts, let's do it :) Retrying on 429 only is a safe bet. Worth case, we adapt later. @hanouticelina mind reviewing? (and merging if I'm off 🤗)

Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thank you!
let's merge it!

@hanouticelina hanouticelina merged commit ddec2ab into main Apr 17, 2025
24 checks passed
@hanouticelina hanouticelina deleted the retry-on-rate-limit-download branch April 17, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants