-
-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing library alias on Mac #134
Missing library alias on Mac #134
Comments
It's in |
I'm trying to track down a build failure. Perhaps you have insight into what the real problem is? My library uses PyTorch and links to libtorch.dylib. It does not directly use OpenBLAS, but apparently PyTorch does at some level, and I get an error trying to configure with CMake, telling me that libopenblas.dylib doesn't exist. Installing the PyTorch package installs the libopenblas package but not the openblas package. So what is the real problem?
One way or another, the current behavior is that if you install the PyTorch conda package and then use CMake to build something that links to libtorch, you get an error. What is the correct place to fix this? |
What's the error exactly? |
|
Does anyone have insight on what the correct fix is? I don't know whether the change should be in PyTorch, OpenBLAS, or CMake, but something needs to be changed somewhere. |
See #134 (comment) |
See #134 (comment) above. If a user simply installs PyTorch and tries to compile against it, they get a compilation error. The package includes libtorch, which contains the C++ API and is designed to be a library users link their own code to. Using the package in the intended way leads to errors. The fact that it uses OpenBLAS is an internal implementation detail of PyTorch, exactly the sort of thing conda is supposed to take care of for you. Is there any good reason not to include the symlink in libopenblas? Omitting it creates problems for users. The version of the package on the defaults channel includes it. |
Can anyone else comment on this? The problem is still not resolved. I believe the solution I originally proposed (including the alias in |
Yes, to prevent downstream packages from accidentally linking to them. The symlink and headers are in the development package just like debian, fedora and countless other package managers. Are you suggesting that debian should package the symlink and the headers in the non dev package? |
For eg: see the difference between |
Again, I'm happy to consider other options. But you still haven't suggested any solution at all to the problem. Let me repeat one more time exactly what the steps are.
Notice that the user has not installed OpenBLAS. Until they hit this error message, they didn't even know that PyTorch used OpenBLAS. They certainly don't know there are two different packages related to OpenBLAS. Further, this error only happens with recent versions of PyTorch. With older versions the above steps work perfectly, because they require an older version of OpenBLAS than is available on CondaForge. Instead conda installs a package from the defaults channel which does include the alias.
What does this have to do with Debian? This is an error on Mac. |
Ask pytorch-cpu-feedstock maintainers to patch the cmake file. It should not be necessary to link to libopenblas.dylib.
It's not unique to Mac. This doesn't manifest in Linux x86, because pytorch uses MKL in Linux x86. Anyways, this is a pytorch-cpu-feedstock issue and not this feedstock. |
This has nothing to do with a PyTorch CMake file. PyTorch simply calls |
I think the issue is that pytorch has a host-dependency on I think this setup is actually correct from the openblas side, but it sounds like the pytorch feedstock should add a runtime-dependency on CC @conda-forge/pytorch-cpu |
Can any of the PyTorch package maintainers comment on this? |
I think @h-vetinari is correct. PRs welcome. |
No, @h-vetinari is not correct. The correct way is to patch pytorch's cmake files to remove the dependence on |
Or make pytorch link to the blas library as |
Why do you keep saying that? PyTorch's CMake files do not refer to libopenblas. All it does is call |
Yes, but it links to BLAS as a public dependency instead of a private dependency. See https://github.com/pytorch/pytorch/blob/e5bf87963d65d6273a6ec3505a50c07229643606/cmake/Dependencies.cmake#L194. I suggest you start reading https://cmake.org/cmake/help/latest/command/target_link_libraries.html Please read up on how CMake target files work and have a look at |
This is clearly wrong. Have a look at |
@hmaarrfk what do you think is the right solution? I'm happy to make a PR. I just need to know what it should change. |
I think isuruf is right (he has very good instincts on this) . I think you have to hunt down a cmake file that has the word PUBLIC in it where it shouldn't. Not very easy to find. |
When it comes to this kind of hunting, |
Don't we have the package you are looking for https://github.com/conda-forge/openmm-feedstock Can you follow that recipe? |
Peter maintains that feedstock IIUC. The issue is PyTorch (upstream) has something wonky in its CMake files AIUI |
I've been looking into implementing this. The more I investigate it, the less comfortable I am with the proposed solution. Here is the relevant PyTorch CMake script. It builds two lists called I could create a patch to the script so that
Instead we could trivially and robustly fix the problem by just adding one alias to the OpenBLAS package which is already present in other distributions of OpenBLAS, including the corresponding package on the conda defaults channel. |
Download the package and see for yourself. I'm not talking about what happens while building the PyTorch package. I'm talking about what happens when you install PyTorch to use its C++ API, and try to compile your own code that links to it:
You're somehow supposed to figure out on your own that there exists a package called
Isuru has not offered a single reason why it would be harmful to add the alias, nor addressed the fact that the alias is present in other comparable packages (such as MKL, or the defaults channel version of |
Unfortunately, i really don't have an Mac M1. I am unable to download and see for myself. A log file showing your error would go a long way. Generally speaking: a one line patch to pytorch is not "hacking" and would likely get accepted upstream. |
We have also worked alot with different upstream They are, generally speaking, really interested in getting people to use their software. They just don't really test for "conda" at first. Edit: package provider -> software authors |
I just don't understand where you're coming from. Can you give one single disadvantage to including the alias? Is there any possible use case where it would cause problems? If not, then why is there any disagreement? And let me emphasize again, this package is the outlier that's violating the conventions of the rest of the ecosystem. If you install the same package from the defaults channel, it includes the alias. If you install a different BLAS from conda-forge, such as MKL, it includes all necessary aliases. But this one doesn't, and that omission causes major problems for users.
Just download https://anaconda.org/conda-forge/libopenblas/0.3.20/download/osx-64/libopenblas-0.3.20-openmp_hb3cd9ec_0.tar.bz2. Expand it and see for yourself that it contains a library called Now download https://anaconda.org/anaconda/libopenblas/0.3.20/download/osx-64/libopenblas-0.3.20-h9a5756b_1.tar.bz2, which is the exact same version of the same package from the defaults channel, and see that it does include the alias.
See #134 (comment) above. |
You probably missed #134 (comment) |
I saw it. It just doesn't make any sense. No downstream package will link to it unless it's specifically trying to link to it. Linking to a library is not something that happens "accidentally". Every library you link to has to be specified by name. And let me emphasize yet again, this is how every other comparable package works. Including the same package from the default channel. Including other BLAS implementations from conda-forge. |
Let me try to recap in a constructive way. You asked for help, from volunteers, without providing any logs claiming that something was broken. Instead you pointed to how "defaults" does it. There are reasons why you are choosing conda-forge. Some of the design decisions behind how conda-forge is built is the reason why you find this attractive. Now, you had 3 core members of conda-forge respond to you in a consistent manner. In fact, two of those are actively involved in building large packages like pytorch. In #134 (comment) and #134 (comment) isuru pointed you exactly to the line of code that you would have to suggest changing upstream, and for conda-forge that would address your issues. I, a maintainer of the pytorch feedstock, would personally accept such a patch, and help you justify it upstream. We aren't saying we are infallible, but in this case, you got some pretty direct feedback about how to solve this particular issue. Please try to make an attempt to resolve it. |
@hmaarrfk I've been trying very hard to be constructive through this whole discussion, but I'm also finding it incredibly frustrating. It began with Isuru basically dismissing my suggestions with a vague reference to Debian, an argument that made no sense and displayed a lack of understanding of the deep differences between the dependency models in Linux and Python. I nonetheless made a sincere effort to implement the solution he suggested, but quickly concluded it just didn't make sense. I wrote up a technical discussion of the problems with it, which was met with complete silence. It took over three weeks and multiple further posts to get anyone to reply, and when they did, I was basically told, "Isuru doesn't like it so it isn't going to happen." Meanwhile, Isuru has not replied to a single thing I've posted for two months, which makes a productive discussion entirely impossible. I am 100% willing to do the work to fix the problem. But no one seems interested in what is obviously the correct fix, instead referring back to the earlier suggestion which has serious problems that I already explained in detail. Every time I try to discuss the technical reasons for preferring it, I am met with silence. Looking back over the discussion, I count no less than four times when that happened.
I did post it, over two months ago. And I provided another link back to it just yesterday. I am trying really, really hard to fix a serious problem that is impacting my work. I am willing to do the work to fix it. If you say you're willing to accept a PR adding the missing alias, I'll do it immediately and we can close this issue as resolved. But instead I feel I'm just getting ignored, and it's very upsetting. |
I'm not really sure if your assessment of the timeline is correct. isuruf replied in a very timely manner at the beginning of the issue. isuruf likely has the most complete understanding of compilers, conda, and conda-forge of those i've interacted with.
The internet can be dry at times. The balance between "brief" answers and "basically dismissing" is thin. However, he promptly replied to questions.
Let me justify why I didn't reply. When asked for additional information, you provided 3 lines of a log. Often it is really important how you setup your information. When asked for the "exact" problem, often it is really valuable to provide full CMake Logs. We could have been more explicit about this request, but the level of experience compiling things yourself often indicates that you are comfortable communicating programming with fully reproducible logs.
Again. many of us trust isuru's judgement on these things.
Again, nobody has to reply. if somebody feels like they cannot add anything constructive to the discussion, they often refrain from typing.
I don't think any solution obvious. We often work collaboratively asking others for input and feedback.
These logs are seriously incomplete. For an example on how to ask for external help, please see google/highway#726 where I included as an attachment, log information going all the way back to early cmake commands and the installation of dependencies. These are hard to make without CIs, but you can likely start at when you type
and provide us all that information.
As much as we want to help, we often won't be fast enough for a mission critical work. Its really up to you, as an engineer, to decide on the risk factors you want to choose and how you want to build your software dependencies. The infrastructure at conda-forge is flexible enough to allow you to build and upload your own packages. This REALLY helps decouple problems.
I don't think I can accept a PR to openblas-feedstock since those involved in maintaining it expressed that they do not agree with the assessment that any files are missing from the released packages.
You've been pointed in the right direction in #134 (comment) For what its worth, isuru's comments lead me directly to a small patch: which I've asked for feedback from upstream developers too In summary:
As a maintainer of the pytorch feedstack, and having studied the suggestions made by isuru, i agree that it is a bug in pytorch, and thus, in the pytorch feedstock. I'm working with other maintainers to address it. And thus I'm going to close this issue. It may take a few weeks to finish the rebuild given how difficult it is to build pytorch on CIs. |
What I provided is exactly what I was asked for. Isuru's exact request was, "What's the error exactly?" I provided the exact error. At no point in the entire discussion did anyone ever say, "Your error message was insufficient, please post more details."
I posted the above on May 3. More than two weeks later, on May 18, still no one had replied. I therefore posted again asking for any insight--keeping in mind I had provided all the information I had been asked for up to that point. He replied with a two word message, "See #134", linking back to an earlier one sentence message from before he asked me for the error and before I provided it.
Tell me precisely what information you want, and I'll be glad to provide it.
If he were participating in this discussion that might be a valid point. However, he has not said a single thing in two months, since before I posted a detailed analysis of why his suggested solution was not correct. That was nearly a month ago, and still not a single person has responded to a single one of the technical issues I raised. Not one. There is no discussion going on here. I keep trying to discuss technical issues, and all my attempts are completely ignored. If Isuru wants to join in the technical discussion, he's welcome to. If he doesn't want to that's his choice too. But his choice to ignore the discussion does not mean anything has been resolved.
And I replied with #134 (comment) explaining why that was not, in fact, the right direction. Do you agree with the technical points I raised? Disagree? If you disagree, please say so and explain why. If you agree, please say that.
I strongly recommend against that patch. It is not the right solution. It is based on a misunderstanding of the problem. I've already explained the reasons why. I could keep repeating them, or expand on them further if you want to understand the actual issues here. What you're doing there is a band-aid to paper over the fact that the OpenBLAS package is broken and violates the conventions of the Python ecosystem. In the process, it changes the binary interface for libtorch which risks breaking downstream software. |
Could you explain this binary interface part to me? (As in, explaining how it would change it and thus risk breaking software using it) -- And, if you want an even more trivial solution, you could follow the very first response in the issue:
|
That's the problem. it's not the solution!
Consider a downstream C++ code that links to libtorch, and also invokes BLAS routines directly. PyTorch supports many different BLAS implementations, and ensures that an appropriate one will be installed. Its build scripts contain lots of logic to locate different implementations and link to them correctly. You could repeat all that work in your own build system, but why do that when PyTorch has already done it for you? When you link to libtorch, that's guaranteed to bring in an appropriate BLAS that you can invoke with no further effort. The proposed change would break that. It changes BLAS from public to private. Any package that expects it to be public would no longer compile. |
But we purposely aren't relying on PyTorch's logic in the feedstock, or at least, we are forcing it in a certain direction --- essentially, all MKL except OpenBLAS for osx-arm64. If I am not mistaken, we do want to set up a matrix at some point to let the user select whichever they want, but there is no point for now given the superiority of MKL and the preference of upstream for it. Anyway, even though I understand your concerns, I am not really sure I see it as a serious problem, in part because we have introduced a global pytorch pin, so packages will have to be rebuilt with each version...
Yes, that's fine, no? Then downstream packages will fix their builds and adjust according to the proposed change. |
I also want to add one more thing as someone who's only recently joined this community. I think it is better to keep a friendly, less adversarial, tone. I understand, and sympathize with, your frustration, but just remember everyone here is a volunteer (as far as I understand). There is a higher likelihood of change (albeit incremental and slow) if we support the collective decision-making. Not every single solution will fit neatly and perfectly with what you (or I) think is best, but as long as it is acceptable, we should all get behind it and move the ecosystem forward. We could always revisit and correct course later --- no need to stop everything here for this very single alias. |
That's a different problem than what I'm talking about. I'm not talking about adding a new feature. I'm referring to existing packages that are out there right now, that link against libtorch. Those packages expect it to publicly link BLAS, because that's what the official PyTorch distribution does. Many of those packages may not even use conda as their primary distribution mechanism. Pip is more popular, and even among conda users, the
Why would you break their packages? What is the benefit of doing that? It will disrupt users' workflows and create extra work for conda-forge volunteers. I certainly would not call that fine!
Believe me, any adversarial tone is not intended on my part! On the contrary, I have been consistently trying to keep the discussion entirely focused on technical issues. I'm happy to participate in any collaborative process so long as that process is actually collaborative (i.e. not simply ignoring all attempts to raise technical issues). |
I believe you. I myself unintentionally sometimes cause a scene trying to help... but I end up being counterproductive... Not saying that you are. Anyway, how about we try to move forward? Let's not consider your specific proposal here for now, and let's see how we can fix things in the pytorch-feedstock so that your project can more smoothly and productively move forward. If we end up realizing that there is no way around this particular fix, we will come back to it in a new issue. Good starting point is the PR you commented on, but perhaps you should start a new issue there to lay out your case again (if you'd like). For now, I think the pytorch-feedstock issues/PRs are the better place to discuss issues surrounding this. |
Let me summarize what I see as the choice we're making here. Two possibilities have been proposed.
Do you agree with that analysis? Are there any important issues you feel it misses? |
I think your analysis is fine, but it should be "libopenblas" not openblas; the latter does have the missing alias you are after. The bigger issue here is thus: Either way, we are trying to accommodate a specific fix for you. The maintainers of openblas think it is best this fix comes from pytorch. A maintainer from pytorch seems to agree that a fix in the pytorch feedstock is doable and the better route. So why not join the effort fixing this in pytorch and see how it goes? If you're not satisfied with the solution afterwards, we can discuss again. Edit: and this should go under the dangers of the second solution: #134 (comment) |
It looks like we both posted at the same moment. All I can do is give the best advice I can based on the my understanding of the problem. It's not my decision to make. I just wish someone could articulate a clear technical argument for why that decision is being made.
In that case, the PyTorch package should list
On the contrary, it's the exact opposite. I can work around the problem. It's a bit annoying, but manageable. But I'd much rather fix the root problem so other users don't have to deal with it in the future. That's my goal here. I'm not demanding a fix just for me. I'm trying to improve the health of the ecosystem and eliminate future problems for other people. |
We then agree on wanting a healthier ecosystem :) The best technical argument is the issue with packages accidentally depending on it (though you disagreed with that earlier). Unfortunately, I am not the right person to assess this or weigh on it. But, let's take this opportunity to have you be more involved in the pytorch feedstock going forward --- how about that? 😃 |
You complain about the amount of time with no responses, let me add another ingredient in the mix that I suspect is affecting this discussion: you are arguing quite forcefully that you're right and everyone else is wrong. That means there are strongly diminishing returns to spend energy engaging with you, and consciously or subconsciously, people will end up engaging less. Despite that, certain people have made an effort (thanks a lot @hmaarrfk!), but your messages show no reflection on what is being said, e.g.
We agree with the patch and disagree with you, and we've also already explained the reasons why. But to give you one more explanation for this: building an ecosystem worth of packages leads to different kind of concerns and trade-offs. Adding this alias has the potential to lead to many packages silently and wrongly linking to openblas, rather than the very carefully built blas setup in conda-forge that allows switching out blas implementations in the environment.
I think you're exaggerating the impact here. Pytorch should not re-expose blas symbols, as they don't own those in any meaningful way. If packages were relying on that, they should specifically link against a blas implementation. And the patch is neither crazy intrusive nor unmaintainable, and will hopefully even be accepted upstream. |
Could you please quote a single place in this thread where I said everyone else is wrong? I realize that may itself sound aggressive, but I feel I am being unjustly attacked, and your statement was itself extremely aggressive. I have literally being struggling for the past three months just to get anyone to reply. My complaint has never been that everyone else was wrong. At any time I would have been happy to hear a reply, even if it was one I disagreed with! If you read every post from when I opened the issue on March 3 all the way to May 31, you will see the only person I ever disagreed with was Isuru, and that's because he's the only person who ever posted anything concrete it was possible to agree or disagree with! This discussion has gone way off the rails, and perhaps it would be best to just let it die and forget the whole thing. I believe, however, that there's a much deeper and much more serious issue: this community is neither welcoming nor respectful toward outsiders. I have witnessed it repeatedly, and heard the same complaint from many other people. Please go back to the start of this thread and just read from the beginning with an open mind! What you will see is a lot of me trying to be collaborative and open to suggestions. You will also see that I repeatedly tried to discuss technical issues and asked for guidance on the right fix, that these attempts were ignored and dismissed, and that at times they were met with personal attacks like the one quoted above. I think it would be really useful for all of us to read through the CondaForge code of conduct. To be very blunt, this community does not live up to it. I can easily point you to other threads where you will see lots more of the same sort of behavior, but I don't think that would really be productive. I'm not interesting in who to blame for anything that's past. But can we all commit ourselves to turning this into the kind of open, welcoming, collaborative community that I think we all want it to be? |
I provided an example right in my post. After all the preceding discussion,
implies we are all misunderstanding the problem, but you understand it correctly. Similar statements are further upthread (e.g. "I saw it. It just doesn't make any sense.").
I apologise you feel attacked. My point was to try to explain to you my (subjective) point of view is why you're not getting more replies. Intentionally or not, implying repeatedly that others do not understand what they are explaining to you is just not a very... effective way of convincing anybody.
You're painting with a very broad brush here. Everyone of the regular contributors that I know in conda-forge is trying hard to accommodate requests and newcomers alike. That's not a claim to perfection (certainly not for myself), but we're all volunteers, and the more demanding the request, the more those ideals may suffer. We don't get to choose how we come across online - I'm trying to tell you (and ngam was pointing you in the same direction re:adversarial) that the way you communicate on this topic makes it unappealing to keep engaging. I realise that this is frustrating; similarly on the other side of the fence (and presumably the main reason why Isuru hasn't commented again, though I should clarify that this is only my conjecture). Again, my message was meant as clarification, not as an attack - I'm sorry if it came across like that. With that I'm going to take my leave from this thread. |
We've some users report that they are getting symbol collisions when linking to blas. I don't see a need to re-export the blas library symbols. I figured I would share here for other packagers to be able to benefit too. xref: conda-forge/pytorch-cpu-feedstock#116 xref: conda-forge/openblas-feedstock#134 Pull Request resolved: #78883 Approved by: https://github.com/ezyang
Summary: We've some users report that they are getting symbol collisions when linking to blas. I don't see a need to re-export the blas library symbols. I figured I would share here for other packagers to be able to benefit too. xref: conda-forge/pytorch-cpu-feedstock#116 xref: conda-forge/openblas-feedstock#134 Pull Request resolved: #78883 Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/221755cc7151fa59514ec4c2e52b1ab991fc07a0 Reviewed By: osalpekar Differential Revision: D37059163 Pulled By: osalpekar fbshipit-source-id: e1b7bc6972b707ea712b1901efadfc76806c72f9
Solution to issue cannot be found in the documentation.
Issue
The Mac packages distributed through the default channel include an alias with the name
libopenblas.dylib
pointing to the library. That allows it to be referenced with standard naming. The conda-forge packages are missing that alias. Instead you have to reference it with the namelibopenblas.0.dylib
. This causes problems for software that expects the standard name to work. See, for example, numpy/numpy#14165 and https://gitlab.kitware.com/cmake/cmake/-/issues/23291.Would it be possible to add the missing alias?
Installed packages
Environment info
The text was updated successfully, but these errors were encountered: