Develop a CRM script to handle import name mismatches #218
base: main
…r into 181-develop-a-crm-script-to-scan-feedstocks-for-dependencyimport-name-mis-matches
…ept in favor of caching the import name data supplied by conda-forge
I'm unclear on the licensing implications of this change. I wouldn't consider the data derived from Conda Forge here to be "source code", but I'm not sure how the legal interpretation of "source code" differs from my engineering one. That being said, I'm guessing we want to give some kind of attribution. I imagine this goes one of two routes:
I'm guessing we could end in
I imagine this file is very out of date and there have been many contributors post
You should read through conda/grayskull#564 before you do much more on this PR. We are trying to clean up the mapping situation, and there are multiple existing solutions already. It'd be a real shame to introduce yet another variant into the mix instead of developing a single solution. cc @maresb
A more direct comment related to this PR.
You should depend directly on conda-forge-metadata to get these mappings.
See the APIs we supply here: https://github.com/conda-forge/conda-forge-metadata/blob/main/conda_forge_metadata/autotick_bot/pypi_to_conda.py#L33
conda-forge-metadata is available on PyPI and, of course, conda-forge.
If there is another API you want/need, we should add it in conda-forge-metadata.
Hold on, before we continue, let's be 100% clear on what data I'm pulling, because I (and others I've seen) have already been tripped up by this nuance. I'm interested in They are not the same. In other words, I need to know that the As far as I know, this section in If there is another way, let's talk. But I've already spent a few days trying to hunt down which data source to use. As far as I know, none of this is documented or explained anywhere. It is incredibly frustrating when everyone you talk to has only a partial picture of the problem.
Ah, cool. That's all fine. In any case, the URL you are using is not a supported API and could break. We should add an API to conda-forge-metadata to pull the file you need if it is not already there.
Yep, this is a big hole. We've not been able to close it yet, and that is definitely largely my fault. The hope is that as we add more to conda-forge-metadata, we will add documentation and this will be the place to look. Thank you for your patience!
Even if it is in an API, I'd still like to cache the results locally. Maybe it comes from my personal bias from working in the IoT and firmware space, but network calls have costs. I'd rather not incur API hits for data that appears to change pretty infrequently. So even if this ends up on I'm not sure how long/if I have the time to add this to At the very least, I'm going to make a note at the top of the script indicating that the
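A minimal sketch of the local-caching idea above, assuming the mapping is a flat JSON object of import name to package name. The fetch callable is a hypothetical stand-in for whichever upstream source ends up supplying the data (for example, a future conda-forge-metadata API), and the one-week freshness window is an arbitrary, illustrative choice rather than anything this PR specifies.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict

# Hypothetical freshness window: one week, since the data changes infrequently.
DEFAULT_MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def load_mapping(
    cache_path: Path,
    fetch: Callable[[], Dict[str, str]],
    max_age_seconds: float = DEFAULT_MAX_AGE_SECONDS,
) -> Dict[str, str]:
    """Return the import-name mapping, calling fetch() only when the cache is stale.

    fetch is a placeholder (hypothetical) for the real upstream data source.
    """
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_seconds:
            # Cache is fresh: serve from disk without any network call.
            return json.loads(cache_path.read_text())
    # Cache is missing or stale: pay for one fetch, then persist the result.
    mapping = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(mapping, sort_keys=True))
    return mapping
```

Repeated calls within the window read the file on disk, so the upstream source is hit at most once per window.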
OK, so it is at least partially implemented:

def map_import_to_package(import_name: str) -> str:

But obviously I'd have to query every single string. The table this PR generates for the cache is only ~300 KB, but I'd imagine we'd want some pagination scheme to future-proof it if we allow a full query of the mapping.
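One possible shape for the pagination scheme mentioned above is keyset (cursor) pagination over the sorted import names. This is a hypothetical, in-process sketch: get_page and iter_mapping are illustrative names, not an existing conda-forge-metadata API, and a real service would expose get_page over the network instead of calling it directly.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# A page of (import_name, package_name) pairs plus the cursor for the next page.
Page = Tuple[List[Tuple[str, str]], Optional[str]]


def get_page(
    mapping: Dict[str, str], sorted_keys: List[str], after: Optional[str], limit: int
) -> Page:
    """Return up to `limit` entries whose keys sort strictly after `after`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Start just past the cursor key (or at the beginning for the first page).
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    page_keys = sorted_keys[start:start + limit]
    items = [(k, mapping[k]) for k in page_keys]
    has_more = start + limit < len(sorted_keys)
    # The last key on this page becomes the cursor; None signals the final page.
    next_cursor = page_keys[-1] if has_more else None
    return items, next_cursor


def iter_mapping(mapping: Dict[str, str], page_size: int = 100) -> Iterator[Tuple[str, str]]:
    """Client side: walk every page until the server reports no more data."""
    sorted_keys = sorted(mapping)
    cursor: Optional[str] = None
    while True:
        items, cursor = get_page(mapping, sorted_keys, cursor, page_size)
        yield from items
        if cursor is None:
            return
```

A cursor keyed on the last import name (rather than a numeric offset) keeps pages stable if entries are added between requests.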
I collected my thoughts and opened this against
We can add an API to the file
I concur with what @beckermr said above: it would be easiest to follow what conda-forge/conda-forge-metadata#55 is proposing, not just for legal reasons but also for API stability. Once the data is exposed there, spelling out licensing information for it becomes moot, since the license of conda-forge-metadata is encoded in the package metadata, and importing the data via Python is a legally safe thing to do. Of course, if you add any file from any permissively licensed repo, it's easiest to copy the full
Of course, the usual caveat applies: I'm not a lawyer or licensing expert, and I'd encourage you to reach out to Anaconda legal or NumFOCUS legal for confirmation.
Resolves #181
The original ticket concept has changed a few times as others have pointed me to the cf-graph-countyfair data project. We now cache a modified version of Conda Forge's mapping that attempts to map irregular cases where the "import name" does not match the package name.
In theory, this will make our Python dependency calculation efforts more accurate.
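To illustrate how a cache of only the irregular cases could feed the dependency calculation, here is a minimal sketch. It assumes the cached table is a plain dict of top-level import name to conda package name, with any import not in the table assumed to share its package's name. The three sample entries are well-known real mismatches, but the table format, IRREGULAR_IMPORTS, and both function names are hypothetical, not the actual cache format produced by this PR.

```python
from typing import Dict, Iterable, Set

# Hypothetical excerpt of the cached table: only irregular cases are stored.
IRREGULAR_IMPORTS: Dict[str, str] = {
    "yaml": "pyyaml",
    "sklearn": "scikit-learn",
    "PIL": "pillow",
}


def import_to_package(import_name: str, table: Dict[str, str] = IRREGULAR_IMPORTS) -> str:
    """Map an import (e.g. "PIL.Image") to the package that provides it."""
    # Only the top-level module determines the providing package.
    top_level = import_name.split(".", 1)[0]
    # Fall back to the import name itself for the common, regular case.
    return table.get(top_level, top_level)


def packages_for_imports(
    imports: Iterable[str], table: Dict[str, str] = IRREGULAR_IMPORTS
) -> Set[str]:
    """Collapse a script's imports into the set of packages it depends on."""
    return {import_to_package(name, table) for name in imports}
```

Storing only the mismatches keeps the cache small; the trade-off is that any missing entry silently falls back to the identity mapping.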