Feat/build deno package second try #419255
Conversation
In general we want to do as little processing as possible in FOD derivations, so that we have little reason to change them. The fundamental thing only a FOD can do is download resources from the internet. It's possible to adapt things to whatever layout a new version of Deno is expecting in the non-FOD build, but it's not possible to change existing FODs. It is of course acceptable to change the format for new lock file versions, since there are never any existing FODs using those.

So my suggestion would be to do the bare minimum of computing the downloads that need to be done and placing them in the output directory without any further processing unless absolutely necessary. I'm not sure exactly what the format looks like for JSR packages, but for NPM maybe we could adapt the existing fetcher codebase so that there's a lower layer that can take the relevant information for a single package directly rather than requiring it to be in

As long as the downloads are unprocessed and placed in a naming scheme that we don't expect to need to change (e.g. one consideration that has been required with Cargo is ensuring that we can support multiple versions of the same package name in a dependency tree), it should be pretty safe.
Deno maintainer dsherret pointed me to some Rust crates which the Deno CLI uses, like https://github.com/denoland/deno_cache_dir and https://crates.io/crates/deno_npm_cache

Here comes the idea: What do you think about that? Am I missing something?
If we don't want to break FODs, we would have to keep the fixed Deno version and all its dependencies building forever, likely multiple versions as we add more to support newer lock file formats, and run into the fact that the versions will inevitably become EOL and develop security vulnerabilities (that may even be relevant to FOD fetching, e.g. TLS issues). Breaking FODs loudly is an option, but a pretty frustrating one for downstream users, depending on how often it happens. I am not sure that
And that is not the case for a custom fetcher, because we can bump the dependencies of the custom fetcher (like the compiler) without changing the hashes of the FODs it produces? Anyway, the perk of breaking hashes far less often by decoupling the fetcher and transformer convinced me of this solution. Thank you for your patience; I needed a little bit of time to fully understand the implications. I'm going to write the custom fetcher in Go, since I'm not really acquainted with Rust yet. I also looked into a way to have an equivalent to

Edit: should be possible
Is there a good reason to have the option to build the dependencies with a hash, or would it be sufficient to just have a fetcher that imports all hashes from the lockfile? And another question: in the case of deno and jsr, it seems that individual files are downloaded and hashed, instead of whole tarballs. So if I implement an
So my plan currently is to just build an

And it works like this:

NOTE: since there are three kinds of dependencies with Deno (npm, jsr and direct URL), there are usually three cases to consider.
With preprocessing and postprocessing, the FODs won't change often, if ever, since they are decoupled from any breaking changes that Deno could introduce. Still, it's possible that something in step 2 will have to be changed, so I am unsure whether 2.3. is a good idea, since it introduces a dependency on the

You said
But I'm not sold on that yet either. There are multiple individual Rust crates that contain the logic for the things we need to do. For deno.lock: https://crates.io/crates/deno_lockfile, and some more. From what I can tell, they are not really doing what we need. On paper it seems like a good idea to use them, but somehow they don't really fit into the architecture I outlined. Ideas? Objections?
Update: implementing the jsr side of things like this went fine. The URL dependencies proved to be more convoluted than expected, since they require custom transformation logic per domain, and I haven't fully understood the details yet.
Update: I've hit a snag with the importLock implementation. Deno needs to know about some response headers of fetched files. While you can use the curl flag

This seems to be inevitable. It's conceivable to work around it with a hack, by hard-coding those headers in certain situations, since they are only required in rare cases, but there will probably be problems with that. So I think it's not possible to implement importLock cleanly, which means I'm going to stick to the approach that requires an explicit hash in the Nix derivation.
@emilazy while talking about internals with Deno maintainers on their Discord, a Flatpak contributor told me they also built a Deno build helper for Flatpak recently. I looked at the code and it got me thinking. Flatpak has a standardized interface for the whole custom-fetcher problem. I thought it would be cool if nixpkgs could also have a standardized interface. Surely it won't work in every case, but unifying most of the language build helpers could improve maintainability and at the same time encourage doing it the right way (the way you told me to do it now): decoupling the FODs from any upstream changes.

Single FOD custom fetcher

A non-importLock custom fetcher could look something like this:
```nix
{
  hash = "";
  packages = {
    "<unique package or file name>" = {
      url = "<url>";
      path = null; # in step 2, the URL is transformed into a unique path within the derivation, maybe by hashing the URL
      curlOpts = ""; # per-package curl opts
      meta = { }; # attribute set of arbitrary shape that is passed through
    };
    # ...
  };
  curlOpts = ""; # global curl opts
  derivation = null; # filled in in step 2
}
```
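For illustration, a filled-in instance of this proposed interface might look like the sketch below. Everything here is hypothetical: this is not an existing nixpkgs API, and the URL and naming scheme are made up for the example.

```nix
# Hypothetical filled-in instance of the proposed single-FOD interface.
{
  hash = ""; # the one fixed-output hash over the whole fetched tree
  packages = {
    "deno.land_std_mod.ts" = {
      url = "https://deno.land/std/mod.ts";
      path = null; # computed by the generic layer in step 2
      curlOpts = ""; # e.g. extra headers for this single file
      meta = { kind = "https"; }; # passed through untouched
    };
  };
  curlOpts = "--retry 3"; # applied to every download
  derivation = null; # filled in by the generic layer in step 2
}
```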
importLock custom fetcher

An importLock interface could look something like this:
```nix
{
  packages = {
    "<unique package or file name>" = {
      hash = "<hash>";
      url = "<url>";
      derivation = null; # filled in in step 2
      fetchurlArgs = { }; # per-package args sent to fetchurl
      meta = { }; # attribute set of arbitrary shape that is passed through
    };
    # ...
  };
  fetchurlArgs = { }; # global args sent to fetchurl
}
```
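Assuming the shape above, step 2 of such an importLock fetcher could be a generic layer that maps every entry to a `fetchurl` call. This is only a sketch; `lib.mapAttrs` and `fetchurl` exist in nixpkgs, but the interface itself is the hypothetical one proposed here.

```nix
# Hedged sketch: map each importLock entry to its own fetchurl FOD.
{ lib, fetchurl }:
lockData:
lockData // {
  packages = lib.mapAttrs
    (name: pkg: pkg // {
      # one fixed-output derivation per package, keyed by the per-package hash
      derivation = fetchurl ({
        inherit (pkg) url hash;
      } // lockData.fetchurlArgs // pkg.fetchurlArgs);
    })
    lockData.packages;
}
```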
Probably there could be some useful helper functions for problems that commonly occur. What do you think about this?
If we come up with a generic fetcher interface for npm and cdn/jsr dependencies, there is still the gap of the custom files and metadata both package managers expect. Did you manage to figure out some way of doing that? From my understanding, that's equally complex. As an idea: I imagine Deno could have a CLI command to pre-populate its cache with offline packages; we would then run deno install or similar and it would produce its own structure as needed (in a non-FOD). Since you seem to be quite active with the Deno maintainers, it could be worth asking if they offer something like that or endorse the idea. I imagine Flatpak could need something similar.
I think you misunderstood me there. I proposed a generic interface for all custom fetchers in nixpkgs, but the only thing that is generalizable is the fetching itself. So there would be three steps to creating a custom fetcher:
So this is somewhat similar to writing two adapters. The point of this would be to encourage a common architecture across custom fetchers that is endorsed by the nixpkgs maintainers. The format I have in mind has enough room for things like metadata. I iterated on it last week, and its use will become apparent in the code and documentation when I push it here this week.
I'm not sure I understand your idea. The problem we have to solve is that we have to do the fetching with our custom code, and then we need an interface provided by Deno where we can feed in our Nix store paths with the fetched data and it will produce the directory structure it needs. Last week I finally managed to do exactly that for the jsr and http packages using Deno's provided library. It took a while to find it and figure out how to use it, though.
Yes, I assumed this.

That's what I meant with the original comment. How do you think of solving this? My proposal was to try using Deno itself, to avoid complexity; I'm unsure if Deno has some interfaces to support it.
```diff
@@ -204,7 +204,7 @@ rustPlatform.buildRustPackage (finalAttrs: {
   postInstall = ''
     # Remove non-essential binaries like denort and test_server
-    find $out/bin/* -not -name "deno" -delete
+    find $out/bin/* -not -name "deno" -not -name "denort" -delete
```
Deno already builds the denort binary but it's removed. The correct solution for us would probably have been to add an extra output & keep denort.
@06kellyjac is this what you meant in the previous PR?
As I said in my comment before:
This interface only works for https and jsr imports, though. Yesterday @rorosen helped me write a small Rust wrapper around that interface, which you can look at here: https://github.com/NixOS/nixpkgs/pull/419255/files#diff-657a96d5ebec4ff9ca7cf304c40ec07d4eb6f05b12b910577aa4719377f14e10 I had a wrapper written in Deno before, but it proved difficult to package that wrapper cleanly with Nix without this very PR (a chicken-and-egg problem). Currently, I handle npm imports manually, which means there is also only limited support. I outlined all the features that I am aware of, both missing and implemented, in the README I wrote for the fetcher.
Hi, I haven’t had time to read the whole thread (sorry), but just to belatedly reply to
We unfortunately can't import locks from fetched sources because that would be IFD (import-from-derivation, a misleading term for "evaluation depending on the contents of derivation outputs"), and in general people aren't too happy about having to vendor lock files in Nixpkgs to work around this (#327064). So I'm not sure what the current state of things is, but I would suggest prioritizing a hash-based fetcher, as it's likely what we'd want to use for most packages that have lock files checked in upstream.
@emilazy thanks for checking in
Hm. I'm not sure I understand. What are
Does this not inevitably happen? I need to parse the lockfile at least for the list of packages; depending on that, I construct FODs. Why does it matter whether I use the hashes from the lockfile or not?
Update on the progress: the build for the fresh-init-cli works now, which is certainly cool, since it is a large project with many dependencies. Currently binary builds for packages with imports from

When building the fresh-init-cli, I noticed that the way this builder currently works is extremely slow, since it creates separate derivations per file of a jsr dependency. This means a lot of disk I/O for many imports of large jsr packages.
Right. That's not something we can do in Nixpkgs and not something the Rust or NPM fetchers do. They create a single FOD that does all the downloads based on the
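A minimal sketch of that single-FOD pattern, under the assumption of a hypothetical `fetch-deno-deps` helper that parses the lock file and downloads everything it lists; only the hash over the combined output tree is pinned.

```nix
# One fixed-output derivation downloads every dependency in the lock file.
# `fetch-deno-deps` is a hypothetical helper, not an existing tool.
{ stdenvNoCC, fetch-deno-deps }:
{ lockFile, hash }:
stdenvNoCC.mkDerivation {
  name = "deno-deps";
  outputHashMode = "recursive"; # hash the whole output tree, not a single file
  outputHashAlgo = "sha256";
  outputHash = hash; # network access is allowed because this is a FOD
  nativeBuildInputs = [ fetch-deno-deps ];
  buildCommand = ''
    mkdir -p $out
    fetch-deno-deps ${lockFile} $out
  '';
}
```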
But there is the

Can you explain to me why it's bad, or point me to some resources?
That functionality is primarily there for out-of-tree users. In Nixpkgs, it requires vendoring lock files, which we prefer not to do unless necessary. Outside of Nixpkgs, it's convenient to use in a development environment for a package, because the lock file is right there, and you can also use it with external sources if you want, since IFD is not forbidden by default outside of Nixpkgs. IFD is banned in Nixpkgs because the build blocks the entire single-threaded evaluation, which as you can imagine would scale very poorly on the distributed Hydra cluster that handles six digits of packages. It also makes the derivation graph dynamic: you do not know what derivations a derivation depends on without a potentially unbounded amount of building and fetching.
Thank you 🙏 Alright. So having the
Edit: We will see about that.
Never mind, this https://nix.dev/manual/nix/2.30/language/import-from-derivation made it clear. So the thing I cannot do is any read-file operation in Nix on the result of a derivation, which especially means I can't read and analyse the lockfile in Nix. Instead I have to pass the lockfile into the FOD, where a script analyses it. This is unfortunate. So the two builds, the single-FOD build and the "import from lock" build, are quite orthogonal to each other: one needs to do the whole analysis inside a derivation, the other one outside the derivation, in Nix. I wish I had understood this sooner 😄
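The forbidden pattern described here can be sketched as follows. The field names are illustrative, not the real `deno.lock` schema; only `lib.importJSON` and `fetchurl` are real nixpkgs functions.

```nix
# Sketch of the forbidden pattern (IFD). If `src` is a derivation, importJSON
# must build it before evaluation can continue, so evaluation blocks on a build.
{ src, fetchurl, lib }:
let
  lock = lib.importJSON "${src}/deno.lock"; # forces a build at eval time
in
# construct one FOD per entry, based on contents only known after the build
map (dep: fetchurl { inherit (dep) url hash; }) lock.remoteDependencies
```

Passing the lock file in as a plain source file (not a derivation output) avoids this, which is why the hash-based single-FOD variant works in Nixpkgs.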
I made this graph to convey the architecture for the fetcher I have in mind now. Does this make sense to you? Do you see any issues with it?
Yes, sorry, it’s a bit arcane :) Basically, without IFD, everything is a strictly‐separated two‐stage process: evaluation happens and produces a graph of
Some thoughts:
I would personally suggest just trying to handle the Deno case for now. Changing the architecture of all fetchers is a really big task and sadly the architectures of various package managers are not as uniform as we might like, so we often need per‐package‐manager logic. More uniformity in this regard would be a good thing, but I worry that trying to tackle it would bog down the process here due to the need to get many ecosystems on board; I already regret that we had to revert the first time around here and think it would be good to be able to ship the results of your work as soon as we can. Keeping the design specific will make it much easier to get merged into Nixpkgs and common elements with other fetchers can be teased out later. |
I guess my graph is a little misleading there. The single-FOD fetcher just produces a directory of all the fetched files, as well as the annotated JSON file. So the hash of the FOD will also be over the annotated lockfile, but that should not be a problem from what I understand.
This is a very interesting point :D Good lord, this is so much more complex than I ever hoped for. When you say
Does it mean this: Instead of making the directory structure transformer an extra derivation,
Oh no 😄 So what you're saying is that there also needs to be a lockfile-transformer equivalent in Nix code. With the architecture above, I wanted to reduce the duplication of the per-language logic as much as possible. This duplication is ten times worse for the Deno fetcher, because there are so many edge cases and variations. So the way I see things now, I will only write the whole derivation build-time pipeline. The Nix eval-time pipeline pretty much exists now, but there are some remaining issues, and the way I wrote it, it is not yet compatible with what is planned now. So I think I'm going to cut it for now. Does the adapted graph look better to you?
This is a reopen of the reverted PR #407434
Discussion
We need to discuss specifics.
@emanueljg
@06kellyjac
@emilazy
@hsjobeki
Previous Problem
@emilazy brought up concerns about the previous PR.
Namely, it's an issue that FODs are built using the Deno CLI as a dependency.
Since the Deno CLI considers the format of the dependency cache an implementation detail, when we bump the Deno CLI version in nixpkgs, the Deno CLI could suddenly produce a FOD with a different hash. However, Nix caches FODs, and until somebody manually bumps the output hash of the FOD, the cached version will be used, even if a rebuild would produce a different result. This can produce a mass breakage with a large delay, which can be very hard to debug.
The issue is fundamental to Nix's FODs: the inputs (in this case the Deno CLI) are intentionally not checked when it comes to cache invalidation.
This is why build helpers in nixpkgs implement custom fetchers. This way nixpkgs has full control over when the FOD will change. If the dependency cache format expected by the Deno CLI changed, we would still produce the same FOD; however, the Deno CLI would tell us that the format is invalid and throw an error. Then the custom fetcher would have to be adapted.
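To make the cache-invalidation hazard concrete, a hedged sketch (`mkDenoDeps` and the attribute names are hypothetical): Nix identifies a fixed-output derivation's output by its name and output hash settings, not by its inputs, so bumping the Deno CLI alone does not produce a new store path.

```nix
# Both attributes resolve to the same store path: a FOD's output path is
# derived from (name, outputHash, hash mode/algo), so the Deno bump never
# triggers a rebuild on machines that already have the old output.
{
  before = mkDenoDeps {
    deno = deno_2_2;
    outputHash = "sha256-AAAA...";
  };
  after = mkDenoDeps {
    deno = deno_2_3; # input changed, store path did not
    outputHash = "sha256-AAAA...";
  };
}
```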
Dependency cache format viewed as an interface
I see three parts to this interface:

1. The lock file has a version, currently `version: 5`. So the lock file is "versioned".
2. The format of the directory tree for the dependencies that Deno expects is not stable or documented.
3. On top of that, Deno injects many stateful files into that directory tree to cache work.
   3.1. Some files can simply be deleted and will be regenerated by the Deno CLI.
   3.2. Some files (`registry.json` and `meta.json`) can't be deleted, since Deno won't recognize the dependencies if they are missing.

Statement from Deno maintainers
I asked the maintainers in the Deno discord about the (in)stability of the interface.
Breaking changes survey
I surveyed previous versions of Deno and compared the interface pairwise:
Versions:
deno 1.46.3 ab7b6889ae9d484eed2876868209e33eb262511d 2024-09-26
deno 2.1.1 882842d2a908700540d206baa79efb922ac1c33d 2024-11-25
deno 2.2.4 c5dd43934613ae0f8ff37c59f61c507c2e8f980d 2025-03-25
deno 2.2.12 4684fd6b0c01e4b7d99027a34c93c2e09ecafee2 2025-05-24
deno 2.3.6 3078b9a 2025-06-23
Diffs:
The changes were all not monumental, so it could be feasible to keep a custom fetcher up to date.
Before v2.0.0, the Deno CLI worked quite differently. For example, `deno install` would not generate a lock file. This highlights an aspect that I didn't think about before: when using the Deno CLI in the FOD, the CLI itself is also something that could experience breaking changes.

Building a custom fetcher
It would be possible to circumvent building the custom fetcher if we make the Deno version part of the `denoDeps` build's `name`. That way, when the Deno CLI version changes, the FOD has to be rebuilt and we don't run into the cache issue. However, we would have to do more rebuilds of the `denoDeps` FODs than necessary, since it appears that not every version bump breaks the interface.

Implementing the fetcher could prove difficult due to the complexity of Deno's package system. Deno supports multiple package sources:
- `jsr:` from the jsr registry
- `npm:` from the npm registry

Each of these sources has its own distinct format.
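The version-in-name workaround described above can be sketched as follows. Attribute names are hypothetical, and the exact way the Deno CLI would populate the cache is an assumption, not working code.

```nix
# Embed the Deno version in the FOD name so that every Deno bump yields a new
# store path and forces a refetch, at the cost of some unnecessary rebuilds.
stdenvNoCC.mkDerivation {
  name = "deno-deps-${deno.version}"; # version bump => new store path
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = hash;
  nativeBuildInputs = [ deno ];
  buildCommand = ''
    # assumption: let the Deno CLI itself populate its cache into $out
    DENO_DIR=$out deno install
  '';
}
```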
NPM has a special role here, since NPM packages can generally execute scripts after being installed to do some post-install work, like downloading a big asset. This also has to happen inside the FOD.
Unfortunately, we can't just reuse the `buildNpmPackage` build helper for that part without extra effort, since it expects a `package-lock.json` file and we only have the `deno.lock` file, and conversion with transitive dependencies could prove difficult from what I have glimpsed so far.

What do you think about this?
Should I implement the custom fetcher or is it not worth it?
The way I see it, the maintenance burden is probably going to be larger with the custom fetcher, since it will rely on more implementation details from Deno, and the upfront cost to implement it could be large.