-
Notifications
You must be signed in to change notification settings - Fork 606
fix: grab PyTorch ABI version from libtorch_python.so #4345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: raayandhar <[email protected]>
Signed-off-by: raayandhar <[email protected]>
Managed to repro locally and do some digging into what's happening - this manages to build now locally, hopefully it will work with CI as well. |
So I'm not sure if it is sufficient to grep for the CXX abi version in the shared object file. E.g., if I install the nightly build we have pinned at 8/20, the
We might need to figure out a different approach. |
However, I do find |
If I remember correctly, I had tried looking yesterday and did not find any |
Yeah, honestly, I have no idea what the right fix is. I've tried modifying a few things, e.g., updating pybind to 3.0.1 to match pytorch and removing explicit CXX_ABI flags. No matter what, if it compiles, then it fails tests due to an opaque error about types (likely meaning something went wrong with the abi compatibility). I'm really not sure if it is tenable to fix this in a short amount of time, and considering this is blocking all work on this repo, I'm going to work on pulling out the e2e testing from |
Signed-off-by: raayandhar <[email protected]>
The recent error in CI, with these failures:
I can't reproduce - I wasn't able to reproduce on stable before, then tried updating the nightly commit to something newer, and I pass all of these tests as well (when they were previously failing, just the last CI had these same errors on nightly side)... |
Really not understanding how the error in CI is being caused in the Python regression tests. It's obviously related to the compiled C++ bindings, but cannot reproduce locally on stable or nightly (with the nightly version I'm using here). Might be related to cache stuff, no idea. |
In theory this should work to fix #4343. But I can't reproduce locally yet so not sure. Working on repro-ing locally first.