Description
Checklist
- I added a descriptive title
- I searched open reports and couldn't find a duplicate
What happened?
We're running into a troubling issue on the pytorch feedstock, which is holding up the rollout of / migration for pytorch 2.7. After merging conda-forge/pytorch-cpu-feedstock#383, the Windows build failed to upload libtorch 9(!) times in a row due to
http.client.RemoteDisconnected: Remote end closed connection without response
after uploading 100% of the 474MB artefact each time.
Some comments from that PR
@danpetry: [...] I've raised an incident report to the anaconda.org team internally
Is it because the packages already exist? What happens if you delete existing packages?
2025-05-03T21:27:13.0788175Z Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already exists on cf-staging. Waiting another 15 seconds to try uploading again.
2025-05-03T21:27:28.8755178Z Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already exists on cf-staging. Waiting another 30 seconds to try uploading again.
2025-05-03T21:27:59.5863085Z Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already exists on cf-staging. Waiting another 45 seconds to try uploading again.
2025-05-03T21:28:45.1693795Z Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already exists on cf-staging. Waiting another 60 seconds to try uploading again.
2025-05-03T21:29:45.8196131Z Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already exists on cf-staging. Waiting another 75 seconds to try uploading again.
2025-05-03T21:29:45.8197501Z WARNING: Distribution C:\bld\win-64\pytorch-2.7.0-cuda126_mkl_py39_h7d88c2c_300.conda already existed in cf-staging for a while. Deleting and re-uploading.
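For reference, the waits in that log grow by a fixed 15 seconds per attempt (15, 30, 45, 60, 75). A minimal sketch of such a linear schedule, inferred only from the log messages above; `wait_schedule` is a hypothetical helper and not the actual upload script:

```python
def wait_schedule(step=15, attempts=5):
    """Return the wait (in seconds) announced before each retry.

    Assumption: the wait grows linearly by `step` per attempt, as the
    log messages suggest. This is not taken from the real upload code.
    """
    return [step * attempt for attempt in range(1, attempts + 1)]


if __name__ == "__main__":
    waits = wait_schedule()
    print(waits, "total:", sum(waits), "seconds")
```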
@jezdez: This smells like a race condition during the upload or some other state issue, I'd also delete and try again uploading those Windows builds [...]
We (or at least I) don't have access to cf-staging to delete stuff, but this has never been an issue. More importantly, the time passed between 0.0% (of the upload of libtorch) and the error seems to be consistently just a few seconds above 5 minutes:
The error message makes me think it's perhaps somehow taking too long, and running into a timeout on the connection (on the server side). If you look at the error logs [below], the error is just a few seconds more than 5min after the start of the upload every time.
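To check the "just over 5 minutes" pattern across runs, something like the following could be used on the timestamped CI log lines. It is only a sketch: `parse_ts`, `elapsed_seconds` and `looks_like_timeout` are hypothetical helpers, the 300 second limit is the suspected (unconfirmed) server-side timeout, and the 30 second slack is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta, timezone

# Log lines start with a UTC timestamp such as 2025-05-03T21:27:13.0788175Z
# (7 fractional digits, one more than datetime stores).
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_ts(line):
    """Parse the leading timestamp of a log line, truncated to microseconds."""
    match = _TS.match(line)
    if match is None:
        raise ValueError(f"no leading timestamp in: {line!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    frac = (match.group(2) or "")[:6].ljust(6, "0")
    return base + timedelta(microseconds=int(frac))


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


def looks_like_timeout(elapsed, limit=300.0, slack=30.0):
    """True if `elapsed` lands just above the assumed timeout `limit`."""
    return limit <= elapsed <= limit + slack
```

Feeding it the line that reports 0.0% and the line with the `RemoteDisconnected` error for each failed run would show whether every failure falls in the same narrow window.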
[moved logs to comment below due to length restriction]
Additional Context
No response