#231 rebased on main for PyTorch 2.5.1 #302

Closed

Tobias-Fischer
Contributor

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Very hacky for now, without proper commit messages etc.; just for testing.

@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

@conda-forge-admin
Contributor

conda-forge-admin commented Dec 11, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12349671136. Examine the logs at this URL for more detail.

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub Actions workflow below for more details. You can also ping conda-forge/core (using the @ notation) for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

  • Is the recipe/{meta.yaml,recipe.yaml} file valid?
  • If there is a recipe/conda-build-config.yaml file in the feedstock make sure that it is compatible with the current global pinnings.
  • Is the fork used for this PR on an organization or user GitHub account? Automated rerendering via the webservices admin bot only works for user GitHub accounts.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12267059292. Examine the logs at this URL for more detail.

@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub Actions workflow below for more details. You can also ping conda-forge/core (using the @ notation) for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

  • Is the recipe/{meta.yaml,recipe.yaml} file valid?
  • If there is a recipe/conda-build-config.yaml file in the feedstock make sure that it is compatible with the current global pinnings.
  • Is the fork used for this PR on an organization or user GitHub account? Automated rerendering via the webservices admin bot only works for user GitHub accounts.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12267131872. Examine the logs at this URL for more detail.

@Tobias-Fischer
Contributor Author

FYI @jeongseok-meta @baszalmstra - I created this PR for playing around; I'll see if I have time to push this further. Do we have a test server that we can play with easily? Unfortunately the conda-forge Windows machine only has 20GB of storage left, which won't be nearly enough for this.

@baszalmstra
Member

I see you already found how to get access to the windows runners! 👍

@Tobias-Fischer
Contributor Author

Hmm I’m not sure why the CI isn’t running

@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12285335923. Examine the logs at this URL for more detail.

conda-forge-webservices[bot] and others added 2 commits December 15, 2024 07:26
@baszalmstra
Member

I ran into the same issue in my PR. The only reliable way I found to fix this was to add cmake --build build --target clean after each output build and rely on sccache for caching instead (see my PR for details). Far from ideal.

@Tobias-Fischer
Contributor Author

Hmm this directly builds on top of your PR and already includes the clean: https://github.com/Tobias-Fischer/pytorch-cpu-feedstock/blob/4bbb512e60f2ddb643c1a57c174df94106abf957/recipe/bld.bat#L139

@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

conda-forge-webservices[bot] and others added 2 commits December 15, 2024 19:20
Co-authored-by: Mark Harfouche
@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

@Tobias-Fischer
Contributor Author

Tobias-Fischer commented Dec 15, 2024

mkl builds fail with:

2024-12-15T21:49:51.9936405Z import: 'torch'
2024-12-15T21:49:52.0741907Z Traceback (most recent call last):
2024-12-15T21:49:52.0742455Z   File "C:\bld\libtorch_1734290771319\test_tmp\run_test.py", line 2, in <module>
2024-12-15T21:49:52.0744309Z     import torch
2024-12-15T21:49:52.0744746Z   File "C:\bld\libtorch_1734290771319\_test_env\lib\site-packages\torch\__init__.py", line 262, in <module>
2024-12-15T21:49:52.0745261Z     _load_dll_libraries()
2024-12-15T21:49:52.0745855Z   File "C:\bld\libtorch_1734290771319\_test_env\lib\site-packages\torch\__init__.py", line 245, in _load_dll_libraries
2024-12-15T21:49:52.0746632Z     raise err
2024-12-15T21:49:52.0747344Z OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\bld\libtorch_1734290771319\_test_env\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.
2024-12-15T21:49:52.5360364Z WARNING: Tests failed for pytorch-2.5.1-cpu_mkl_py310h279fb04_106.conda - moving package to C:\bld\broken

non-mkl builds fail with:

2024-12-15T21:49:55.5505504Z import: 'torch'
2024-12-15T21:49:56.1187098Z WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
2024-12-15T21:49:56.1187940Z E0000 00:00:1734299396.118431    2620 descriptor_database.cc:633] File already exists in database: onnx/onnx_onnx_torch-ml.proto
2024-12-15T21:49:56.1189034Z F0000 00:00:1734299396.118613    2620 descriptor.cc:2236] Check failed: GeneratedDatabase()->Add(encoded_file_descriptor, size) 

Will try and fix the non-mkl builds by setting ONNX_USE_PROTOBUF_SHARED_LIBS=ON. Not sure about the mkl builds.

conda-forge-webservices[bot] and others added 2 commits December 15, 2024 21:58
@Tobias-Fischer
Contributor Author

I have a feeling that the mkl issue is due to conda-forge/conda-forge.github.io#1597 but I don't understand enough of it.

In the logs we can see this warning:

2024-12-15T21:48:30.6185793Z ClobberWarning: This transaction has incompatible packages due to a shared path.
2024-12-15T21:48:30.6186765Z   packages: conda-forge/win-64::intel-openmp-2024.2.1-h57928b3_1083, conda-forge/win-64::openmp-5.0.0-vc14_1
2024-12-15T21:48:30.6187424Z   path: 'library/bin/libiomp5md.dll'

We also know the mkl error is:

2024-12-15T21:49:52.0747344Z OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\bld\libtorch_1734290771319\_test_env\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.

We also know that one of the fbgemm.dll dependencies is libiomp5md.dll, as the build log reports: (libtorch,Lib/site-packages/torch/lib/fbgemm.dll): Needed DSO Library/bin/libiomp5md.dll found in ['conda-forge/win-64::intel-openmp==2024.2.1=h57928b3_1083'].
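
For anyone who wants to check other build logs for the same kind of collision, here is a minimal sketch that pulls ClobberWarning entries out of a log and groups the packages by the path they share. The helper name parse_clobber_warnings and the timestamp pattern are assumptions based on the log excerpt quoted in this comment; neither is part of conda or conda-build.

```python
import re
from collections import defaultdict

# Assumed format of the CI timestamp prefix, e.g. "2024-12-15T21:48:30.6186765Z ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z\s?")


def parse_clobber_warnings(log_text):
    """Hypothetical helper: map each clobbered path to the packages that ship it."""
    clobbers = defaultdict(list)
    pending = None  # packages from the most recent "packages:" line
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if line.startswith("packages:"):
            pending = [p.strip() for p in line[len("packages:"):].split(",")]
        elif line.startswith("path:") and pending is not None:
            path = line[len("path:"):].strip().strip("'\"")
            clobbers[path].extend(pending)
            pending = None
    return dict(clobbers)


# Log excerpt copied from this PR's CI output.
LOG = """\
2024-12-15T21:48:30.6185793Z ClobberWarning: This transaction has incompatible packages due to a shared path.
2024-12-15T21:48:30.6186765Z   packages: conda-forge/win-64::intel-openmp-2024.2.1-h57928b3_1083, conda-forge/win-64::openmp-5.0.0-vc14_1
2024-12-15T21:48:30.6187424Z   path: 'library/bin/libiomp5md.dll'
"""

clobbers = parse_clobber_warnings(LOG)
for path, packages in clobbers.items():
    print(path, "<-", packages)
```

Run against the excerpt, this shows intel-openmp and openmp both providing library/bin/libiomp5md.dll. That only establishes the collision; it does not by itself prove which copy ends up in the test environment.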

Not sure how to proceed from here. @isuruf mentioned in conda-forge/conda-forge.github.io#1597 that:

Of course**. But isn't the underlying issue here that conda-forge's mkl depends on intel-openmp?

That's because of pytorch. If we have a pytorch windows build, then we can make mkl depend on llvm-openmp. pytorch upstream seems to be linking to the vcomp interface in libiomp5md.dll which is not present in llvm-openmp, but we can avoid that if we have a build on windows.

@Tobias-Fischer
Contributor Author

It seems like USE_LITE_PROTO fixed both mkl and generic builds :).

What are the best steps to get a proper review going? We need:

  • General cleanup, removal of old comments, etc.
  • CUDA builds
  • Proper rebuild with commit history
  • ?

I’ll kick off some CUDA builds - let’s see how we go.

@Tobias-Fischer
Contributor Author

@conda-forge-admin please rerender

conda-forge-webservices[bot] and others added 3 commits December 16, 2024 09:14
@Tobias-Fischer
Contributor Author

Closing - all work has been rebased on top of #231

4 participants