
Feat/build deno package second try #419255


Draft: aMOPel wants to merge 22 commits into master from feat/buildDenoPackage-second

Conversation

aMOPel commented Jun 23, 2025

This is a reopening of the reverted PR #407434.

Discussion

We need to discuss specifics.

@emanueljg
@06kellyjac
@emilazy
@hsjobeki

Previous Problem

@emilazy brought up concerns about the previous PR.

Namely, it is a problem that the FODs are built using the Deno CLI as a dependency.
Since the Deno CLI treats the format of its dependency cache as an implementation detail, bumping the Deno CLI version in nixpkgs could suddenly make it produce a FOD with a different hash. However, Nix caches FODs: until somebody manually bumps the output hash of the FOD, the cached version is used, even if a rebuild would produce a different result. This can cause mass breakage with a large delay, which can be very hard to debug.

The issue is fundamental to Nix's FODs: the inputs (in this case the Deno CLI) are intentionally not considered for cache invalidation.

This is why build helpers in nixpkgs implement custom fetchers: that way, nixpkgs has full control over when the FOD changes. If the dependency cache format that the Deno CLI expects changed, we would still produce the same FOD; the Deno CLI would then report that the format is invalid and throw an error, and the custom fetcher would have to be adapted.
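To make the caching behavior concrete, here is a minimal FOD sketch (placeholder values, not the actual builder from this PR):

# Only outputHash identifies a FOD in the Nix store and in binary caches.
# The Deno CLI below is a build input, but it is NOT part of the cache key:
# bumping pkgs.deno does not invalidate an already-built output.
stdenv.mkDerivation {
  name = "example-deno-deps";
  nativeBuildInputs = [ deno ];
  buildCommand = ''
    # ...use the Deno CLI here to populate $out with the dependency cache...
  '';
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = "sha256-..."; # the only thing Nix checks before reusing the cached path
}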

Dependency cache format viewed as an interface

I see three parts to this interface:

  1. The lock file has a version, currently version: 5, so the lock file is "versioned".

  2. The layout of the dependency directory tree that Deno expects is neither stable nor documented.

  3. On top of that, Deno injects many stateful files into that directory tree to cache work.

    3.1. Some files can simply be deleted and will be regenerated by the Deno CLI.
    3.2. Some files (the registry.json and meta.json files) can't be deleted, since Deno won't recognize the dependencies if they are missing.

Statement from Deno maintainers

I asked the maintainers in the Deno discord about the (in)stability of the interface.

That's correct, it's a moving target and an implementation detail, we don't give any stability guarantees around it, because we need to be able to change that installation structure if needed.
--bartlomieju

there's no documentation. It's somewhat stable but kind of unstable. I'd recommend looking into using the Rust crates like https://github.com/denoland/deno_cache_dir and https://crates.io/crates/deno_npm_cache
--dsherret

Breaking changes survey

I surveyed previous versions of Deno and compared the interface pairwise:

Versions: v1.46.3, v2.1.1, v2.2.4, v2.2.12, v2.3.6

Diffs:

  • v1.46.3_v2.1.1: point 1. changed, point 2. changed
  • v2.1.1_v2.2.4: no changes
  • v2.2.4_v2.2.12: no changes
  • v2.2.12_v2.3.6: point 1. changed, point 3.2 changed

None of the changes were monumental, so it could be feasible to keep a custom fetcher up to date.

Before v2.0.0, the Deno CLI worked quite differently; for example, deno install would not generate a lock file. This highlights an aspect I hadn't considered before: when the Deno CLI is used in the FOD, the CLI itself is also something that can undergo breaking changes.

Building a custom fetcher

We could avoid building a custom fetcher by making the Deno version part of the denoDeps build's name. That way, when the Deno CLI version changes, the FOD has to be rebuilt, and we don't run into the cache issue. However, we would then rebuild the denoDeps FODs more often than necessary, since it appears that not every version bump breaks the interface.
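For illustration, the pinning could look roughly like this (a sketch; the attribute names are hypothetical):

denoDeps = stdenv.mkDerivation {
  # embedding the CLI version in the name yields a new store path on every
  # Deno bump, forcing a refetch instead of silently reusing a stale FOD
  name = "myproject-deno-deps-${deno.version}";
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = "sha256-..."; # still needs a manual bump when the output changes
  # ...fetching logic using the Deno CLI...
};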

Implementing the fetcher could prove difficult due to the complexity of Deno's package system. Deno supports multiple package sources:

  1. jsr: from the jsr registry
  2. npm: from the npm registry
  3. url: fetching directly from a URL, e.g. GitHub or a CDN.

Each of these sources has its own distinct format.
npm has a special role here, since npm packages can generally execute scripts after being installed to do post-install work, like downloading some big asset. This also has to happen inside the FOD.
Unfortunately, we can't just reuse the buildNpmPackage build helper for that part without extra effort: it expects a package-lock.json file, while we only have the deno.lock file, and converting it with transitive dependencies could prove difficult from what I have glimpsed so far.

What do you think about this?

Should I implement the custom fetcher or is it not worth it?

The way I see it, the maintenance burden is probably gonna be larger with the custom fetcher, since it will rely on more implementation details from Deno. And the upfront cost to implement the custom fetcher could be large.

@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Jun 23, 2025
emilazy commented Jun 23, 2025

In general we want to do as little processing as possible in FOD derivations, so that we have little reason to change them. The fundamental thing only a FOD can do is download resources from the internet. It’s possible to adapt things to whatever layout a new version of Deno is expecting in the non‐FOD build, but it’s not possible to change existing FODs. It’s of course acceptable to change the format for new lock file versions, since there’s never any existing FODs using those.

So, for instance, fetchCargoVendor just downloads tarballs straight from crates.io without unpacking, and uses nix-prefetch-git for Git dependencies. Unpacking things and assembling the actual Cargo vendor directory is postponed to a later stage which isn’t a FOD and can be adapted as needed.

My suggestion would be to do the bare minimum of computing the downloads that need to be done and placing them in the output directory without any further processing unless absolutely necessary. I’m not sure exactly what the format looks like for JSR packages, but for NPM maybe we could adapt the existing fetcher codebase so that there’s a lower layer that can take the relevant information for a single package directly rather than requiring it to be in package-lock.json format?

As long as the downloads are unprocessed and placed in a naming scheme that we don’t expect to need to change (e.g. one consideration that has been required with Cargo is ensuring that we can support multiple versions of the same package name in a dependency tree), it should be pretty safe.
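A rough sketch of that split (all names here are hypothetical):

# stage 1: FOD that only downloads, into a layout we control and never change
denoDepsDownloads = stdenv.mkDerivation {
  name = "example-deno-deps-downloads";
  buildCommand = ''
    # fetch every artifact unprocessed into $out, e.g. $out/npm/<name>-<version>.tgz
  '';
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = "sha256-...";
};

# stage 2: ordinary derivation that arranges stage 1 into whatever the current
# Deno version expects; it can change freely without invalidating the FOD
denoVendorDir = runCommand "example-deno-vendor" { } ''
  mkdir -p $out
  # unpack and assemble ${denoDepsDownloads} into Deno's expected layout
'';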

aMOPel commented Jun 24, 2025

Deno maintainer dsherret pointed me to some Rust crates that the Deno CLI uses, like https://github.com/denoland/deno_cache_dir and https://crates.io/crates/deno_npm_cache
So I thought it would be very reasonable to just write the fetcher in Rust and use those crates, to get the logic straight from the source. Then I thought that, since the Deno CLI uses those crates, I might as well look into the Deno codebase and copy the relevant code for the fetcher. At that point I would just be pointlessly duplicating the Deno CLI, though.

Here comes the idea:
What if our custom fetcher is the Deno CLI, but not the version pinned in nixpkgs: a different (older) one, used only for the fetcher?
This way we have decoupled the Deno versions, and the problem you were worried about, @emilazy, could not occur anymore.
When pkgs.deno breaks compatibility with the current FODs' dependency cache format, we have to update the hashes.
We could enforce updating the hashes by making the fetcher's Deno version part of the FOD's name. Either way, there will be an error, since the old incompatible FODs were created with the old incompatible Deno CLI.

What do you think about that? Am I missing something?

emilazy commented Jun 24, 2025

If we don’t want to break FODs, we would have to keep the fixed Deno version and all its dependencies building forever, likely multiple versions as we add more to support newer lock file formats, and run into the fact that the versions will inevitably become EOL and develop security vulnerabilities (that may even be relevant to FOD fetching – e.g. TLS issues). Breaking FODs loudly is an option, but a pretty frustrating one for downstream users depending on how often it happens.

I am not sure that deno_cache_dir would be necessary for a custom fetcher, though? As I said, we don’t have to match Deno’s format in the FOD output. We only need to download things. Processing it into the format expected by Deno can happen in a non‐FOD derivation without the same compatibility constraints.

aMOPel commented Jun 24, 2025

we would have to keep the fixed Deno version and all its dependencies building forever, likely multiple versions as we add more to support newer lock file formats, and run into the fact that the versions will inevitably become EOL and develop security vulnerabilities (that may even be relevant to FOD fetching – e.g. TLS issues)

And that is not the case for a custom fetcher, because we can bump the dependencies of the custom fetcher (like the compiler) without changing the hashes of the FODs it produces?


Anyway, the perk of breaking hashes far less often by decoupling the fetcher and the transformer convinced me of this solution.

Thank you for your patience, I needed a little bit of time to fully understand the implications.

I'm gonna write the custom fetcher in Go, since I'm not really acquainted with Rust yet.


I also looked into a way to have an equivalent to importNpmLock, but so far it looks like it's not possible due to how jsr and Deno work together. Will keep you updated.

Edit: should be possible

aMOPel commented Jun 25, 2025

@emilazy

Is there a good reason to have the option to build the dependencies with a hash, or would it be sufficient to just have a fetcher that imports all hashes from the lockfile?

And another question: in the case of Deno and jsr, it seems that individual files are downloaded and hashed, instead of whole tarballs. So if I implement an importLock functionality, it would create a lot of derivations, one for each file of a dependency. Is that a problem?

aMOPel commented Jun 25, 2025

So my plan currently is to just build an importLock functionality, no custom fetcher using a hash.

And it works like this:

NOTE: since there are 3 kinds of dependencies with Deno (npm, jsr, and direct url), there are usually 3 cases to consider

  1. parse the lockfile into a common format (preprocessing step)
    1. get a deno.lock file as an argument
    2. there is a parser (written in Nix) for the lockfile that
      a. reads the version of the lockfile
      b. chooses the appropriate converter for the version
      c. converts the package information from the lockfile into our custom common lockfile format
    3. there is an npm lockfile converter that converts the npm section of the common lockfile format into a format that npm accepts
  2. fetch deps (FOD step)
    1. there is a custom fetcher for jsr dependencies. it receives the common lockfile format as an argument and
      a. fetches the files (using nixpkgs fetchurl), where each file needs its own derivation, since that is what we have the hashes for
      b. the derivations are added to the lockfile attrset
    2. there is a custom fetcher for url dependencies. it receives the common lockfile format as an argument and
      a. fetches the files (using nixpkgs fetchurl), where each file needs its own derivation, since that is what we have the hashes for
      b. the derivations are added to the lockfile attrset
    3. the generated npm lock file is fed to the fetchNpmDeps fetcher, and the resulting node_modules folder is our common npm deps folder structure
  3. convert folder structure and metadata (postprocessing step)
    1. there are 3 converters; they run inside a mkDerivation buildPhase
      a. a common jsr deps folder structure converter
      b. a common url deps folder structure converter
      c. a common npm deps folder structure converter
    2. they convert our common formats into the formats that the current Deno version expects, and they also create any necessary metadata files, etc.
      This could be more involved logic using information from the lockfile, which might be difficult to do well in just shell script, so it should probably be done in some stronger language.
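Put together, the pipeline could compose roughly like this (a sketch; apart from fetchurl and fetchNpmDeps, every name is hypothetical):

let
  # 1. preprocessing (pure Nix): deno.lock -> common lockfile format
  commonLock = parseDenoLock ./deno.lock;
in {
  # 2. FOD step: jsr and url files fetched individually, npm via the generated lock
  jsrDeps = fetchJsrDeps { lock = commonLock; };
  urlDeps = fetchUrlDeps { lock = commonLock; };
  npmDeps = fetchNpmDeps { src = commonLock.generatedNpmLockDir; hash = "sha256-..."; };

  # 3. postprocessing (non-FOD): convert into the layout the pinned Deno expects
  denoDir = convertDenoDeps { inherit jsrDeps urlDeps npmDeps; lock = commonLock; };
}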

With preprocessing and postprocessing, the FODs won't change often, if ever, since they are decoupled from any breaking changes that Deno could introduce.
If the lockfile changes, we write a new preprocessing converter.
If the folder structure changes, we write a new postprocessing converter.

Still, it's possible that something in step 2 will have to be changed, so the version of that fetcher should be part of the name of the FODs.


I am unsure about whether 2.3 is a good idea, since it introduces a dependency on the fetchNpmDeps helper.

You said

but for NPM maybe we could adapt the existing fetcher codebase so that there’s a lower layer that can take the relevant information for a single package directly rather than requiring it to be in package-lock.json format

But I'm not sold on that yet either.


There are multiple individual rust crates that contain the logic for the things we need to do

For deno.lock: https://crates.io/crates/deno_lockfile
For npm cache: https://crates.io/crates/deno_npm_installer https://crates.io/crates/deno_npm_cache

and some more.

From what I can tell, they are not really doing what we need.

On paper it seems like a good idea to use them, but they don't really fit into the architecture I outlined above.


Ideas? Objections?

aMOPel commented Jun 26, 2025

Update:

Implementing the jsr side of things this way went fine.

The url dependencies proved to be more convoluted than expected, since they require custom transformation logic per domain, and I haven't fully understood the details yet.

aMOPel commented Jun 30, 2025

Update:

I've hit a snag with the importLock implementation.

Deno needs to know some of the response headers of the fetched files.

While you can use curl's -D flag to dump the headers, accessing them later means adding them to $out, at which point the hash of the FOD changes and the hashes from the lockfile can't be used anymore. This defeats the whole point of the importLock implementation: after all, it is supposed to build everything using only the hashes in the lockfile, with no need to specify a hash in the Nix derivation.

This seems to be inevitable. One could imagine a hack that hard-codes those headers in certain situations, since they are only required in rare cases, but there would probably be problems with that.

So I think it's not possible to do the importLock cleanly, which means I'm gonna stick to the approach that requires an explicit hash in the Nix derivation.

aMOPel commented Jun 30, 2025

@emilazy While talking about internals with Deno maintainers on their Discord, a Flatpak contributor told me that they also recently built a Deno build helper for Flatpak. I looked at the code, and it got me thinking.

Flatpak has a standardized interface for the whole custom-fetcher problem.

I thought it would be cool if nixpkgs could also have a standardized interface. Surely it won't work in every case, but unifying most of the language build helpers could improve maintainability and at the same time encourage doing it the right way, the way you told me to do it now: decoupling the FODs from any upstream changes.

Single FOD custom fetcher

A non-importLock custom fetcher could look something like this:

  1. The custom fetcher needs to parse the lock and transform it into our standard interface:

{
  hash = "";
  packages = {
    "<unique package or file name>" = {
      url = "<url>";
      path = null; # in step 2, the url is transformed into a unique path within the derivation, e.g. by hashing the url
      curlOpts = ""; # per-package curl opts
      meta = { }; # attrset of arbitrary shape that is passed through
    };
    ...
  };
  curlOpts = ""; # global curl opts
  derivation = null; # filled in in step 2
}

  2. That data is given to a nixpkgs lib function which creates a FOD, using curl to download the packages from the urls and put them where the paths point. All those paths are within $out.
  3. The custom fetcher then restructures the folder structure as necessary in a new derivation, using the FOD from step 2 and the paths and the metadata.
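For concreteness, a hypothetical call site (fetchGenericDeps and denoLockToGenericFormat are made-up names):

denoDeps = fetchGenericDeps (denoLockToGenericFormat ./deno.lock // {
  # the single hash covering everything the FOD downloads in step 2
  hash = "sha256-...";
});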

importLock custom fetcher

An importLock interface could look something like this:

  1. The build helper needs to parse the lock and transform it into our standard interface:

{
  packages = {
    "<unique package or file name>" = {
      hash = "<hash>";
      url = "<url>";
      derivation = null; # filled in in step 2
      fetchurlArgs = { }; # per-package args passed to fetchurl
      meta = { }; # attrset of arbitrary shape that is passed through
    };
    ...
  };
  fetchurlArgs = { }; # global args passed to fetchurl
}

  2. That data is given to a nixpkgs lib function which realizes all those FODs individually using fetchurl and fills in the derivations.
  3. The custom fetcher then assembles the desired folder structure in a new derivation, using all those derivations and the metadata.
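Step 2 could be realized with something like the following sketch (only builtins and fetchurl are real functions here):

fillDerivations = lock: lock // {
  packages = builtins.mapAttrs
    (name: pkg: pkg // {
      # one fixed-output fetch per package, hash taken from the lock data
      derivation = fetchurl ({ inherit (pkg) url hash; } // lock.fetchurlArgs // pkg.fetchurlArgs);
    })
    lock.packages;
};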

Probably there could be some useful helper functions for problems that commonly occur.

What do you think about this?

hsjobeki commented Jul 4, 2025

If we come up with a generic fetcher interface for npm and cdn/jsr dependencies, there is still the gap of the custom files and metadata both package managers expect. Did you manage to figure out some way of doing that? Because from my understanding that's equally complex.

As an idea: I imagine Deno could have a CLI command to pre-populate its cache with offline packages; one would then run deno install or similar, and it would produce its own structure as needed (in a non-FOD).

Since you seem to be quite active with the Deno maintainers, it could be worth asking whether they offer something like that or endorse the idea. I imagine Flatpak could need something similar.

aMOPel commented Jul 7, 2025

If we come up with a generic fetcher interface for npm and cdn/jsr dependencies, there is still the gap of the custom files and metadata both package managers expect. Did you manage to figure out some way of doing that? Because from my understanding that's equally complex.

I think you misunderstood me there.

I proposed a generic interface for all custom fetchers in nixpkgs. But the only thing that is generalizable is the fetching itself.

So there would be 3 steps to creating a custom fetcher:

  1. write custom logic to transform a package lock file into the generalized format
  2. feed the data in the generalized format into the fetcher, which produces a generalized output format
  3. write custom logic to transform the file paths in the generalized output format into the file structure required by the language's package manager.

So this is somewhat similar to writing 2 adapters. The point of this would be to encourage a common architecture across custom fetchers, which is endorsed by the nixpkgs maintainers.

The format I have in mind has enough room for things like metadata. I iterated on it last week, and its use will become apparent in the code and documentation when I push it here this week.

I imagine deno could have a cli command to pre-populate its cache with offline packages, then run deno install or similar and it would produce its own structure as needed (In a non fod) as an idea.

I'm not sure I understand your idea. The problem we have to solve is that we have to do the fetching with our custom code; then we need an interface provided by Deno where we can feed in our Nix store paths with the fetched data, and it will produce the directory structure it needs.

Last week I finally managed to do exactly that for the jsr and http packages using Deno's provided library. It took a while to find it and figure out how to use it, though.

@aMOPel aMOPel force-pushed the feat/buildDenoPackage-second branch from 8516e5b to 983c0a6 Compare July 10, 2025 10:59
@nixpkgs-ci nixpkgs-ci bot added 8.has: changelog This PR adds or changes release notes 6.topic: nodejs Node.js is a free, open-source, cross-platform JavaScript runtime environment 8.has: documentation This PR adds or changes documentation 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Jul 10, 2025
hsjobeki commented:
  • write custom logic to transform a package lock file into the generalized format

  • feed the data in the generalized format into the fetcher, which produces a generalized output format

  • write custom logic to transform the file paths in the generalized output format into the file structure required by the language's package manager.

Yes, I assumed this.

need an interface provided by Deno where we can feed in our Nix store paths with the fetched data, and it will produce the directory structure it needs

That's what I meant with the original comment. How do you plan to solve this? My proposal was to try using Deno itself to avoid complexity; I'm unsure whether Deno has interfaces to support it.

@@ -204,7 +204,7 @@ rustPlatform.buildRustPackage (finalAttrs: {

  postInstall = ''
    # Remove non-essential binaries like denort and test_server
-   find $out/bin/* -not -name "deno" -delete
+   find $out/bin/* -not -name "deno" -not -name "denort" -delete
aMOPel commented on the diff, Jul 10, 2025

Deno already builds the denort binary but it's removed. The correct solution for us would probably have been to add an extra output & keep denort.

@06kellyjac is this what you meant in the previous PR?

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. and removed 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Jul 10, 2025
aMOPel commented Jul 11, 2025

@hsjobeki

That's what I meant with the original comment. How do you plan to solve this? My proposal was to try using Deno itself to avoid complexity; I'm unsure whether Deno has interfaces to support it.

As I said in my previous comment:

Last week I finally managed to do exactly that for the jsr and http packages using Deno's provided library. It took a while to find it and figure out how to use it, though.

This interface only works for https and jsr imports, though. Yesterday @rorosen helped me write a small Rust wrapper around that interface, which you can look at here: https://github.com/NixOS/nixpkgs/pull/419255/files#diff-657a96d5ebec4ff9ca7cf304c40ec07d4eb6f05b12b910577aa4719377f14e10

I had a wrapper written in Deno before, but packaging that wrapper cleanly with Nix proved difficult without this very PR (a chicken-and-egg problem).

Currently I handle npm imports manually, which means support there is also limited.

I outlined all the features I am aware of, both missing and implemented, in the readme I wrote for the fetcher:

https://github.com/aMOPel/nixpkgs/blob/feat/buildDenoPackage-second/pkgs/build-support/deno/fetch-deno-deps/readme.md

emilazy commented Jul 11, 2025

Hi, I haven’t had time to read the whole thread (sorry), but just to belatedly reply to

So my plan currently is to just build an importLock functionality, no custom fetcher using a hash.

We unfortunately can’t import locks from fetched sources because that would be IFD (import‐from‐derivation – a misleading term for “evaluation depending on the contents of derivation outputs”), and in general people aren’t too happy about having to vendor lock files in Nixpkgs to work around this (#327064). So I’m not sure what the current state of things is, but I would suggest prioritizing a hash‐based fetcher, as it’s likely what we’d want to use for most packages that have lock files checked in upstream.

aMOPel commented Jul 11, 2025

@emilazy thanks for checking in

We unfortunately can’t import locks from fetched sources because that would be IFD (import‐from‐derivation – a misleading term for “evaluation depending on the contents of derivation outputs”), and in general people aren’t too happy about having to vendor lock files in Nixpkgs to work around this (#327064). So I’m not sure what the current state of things is, but I would suggest prioritizing a hash‐based fetcher, as it’s likely what we’d want to use for most packages that have lock files checked in upstream.

Hm. I'm not sure I understand. What are buildRustPackage and buildNpmPackage doing then, if not importing locks from fetched sources?

“evaluation depending on the contents of derivation outputs”

Does this not inevitably happen? I need to parse the lockfile, at least for the list of packages; "depending" on that, I construct FODs. Why does it matter whether I use the hashes from the lockfile or not?

aMOPel commented Jul 11, 2025

Update on the progress:

The build for the fresh-init-cli works now, which is certainly cool, since it is a large project with many dependencies.

Currently binary builds for packages with imports from esm.sh don't work. I'm planning to fix that.

When building the fresh-init-cli, I noticed that the way this builder currently works is extremely slow, since it creates separate derivations per file of a jsr dependency. This means a lot of disk IO for many imports of large jsr packages.
I chose this method until now since it allows completely inferring the hashes from the deno.lock file for jsr packages. It also makes the caching extremely granular, which results in less refetching. But the price might be too steep. Also, the way emilazy makes it sound, importing the hashes from the lockfile does not seem to be a viable solution anyway.

emilazy commented Jul 11, 2025

Does this not inevitably happen? I need to parse the lockfile, at least for the list of packages; "depending" on that, I construct FODs. Why does it matter whether I use the hashes from the lockfile or not?

Right. That’s not something we can do in Nixpkgs and not something the Rust or NPM fetchers do. They create a single FOD that does all the downloads based on the Cargo.lock file – that means that the dependency on the lock file is only at build time, not eval time, which avoids IFD. The hash then covers that single FOD.

aMOPel commented Jul 11, 2025

But there is the import-from-lock functionality for both, where one doesn't need to specify any hash. It does exist. But you are saying it is bad practice?

Can you explain to me why it's bad or point me to some resources?

emilazy commented Jul 11, 2025

That functionality is primarily there for out‐of‐tree users. In Nixpkgs, it requires vendoring lock files, which we prefer not to do unless necessary. Outside of Nixpkgs, it’s convenient to use in a development environment for a package, because the lock file is right there, and you can also use it with external sources if you want, since IFD is not forbidden by default outside of Nixpkgs.

IFD is banned in Nixpkgs because the build blocks the entire single‐threaded evaluation, which as you can imagine would scale very poorly on the distributed Hydra cluster that handles six digits of packages. It also makes the derivation graph dynamic: you do not know what derivations a derivation depends on without a potentially unbounded amount of building and fetching.

aMOPel commented Jul 12, 2025

Thank you 🙏

Alright. So the import-from-lockfile functionality is for devs who want to use this build helper for their own projects, and the build with one hash is for nixpkgs.

Then I'm gonna provide both.

Edit: We will see about that.

aMOPel commented Jul 12, 2025

@emilazy

For further clarification:

I can analyse the lockfile in Nix and change the properties of the derivation, as long as it stays a single derivation (in this case a FOD).

The problem arises when the number of derivations is dynamic, depending on some Nix evaluation.

Never mind; this made it clear: https://nix.dev/manual/nix/2.30/language/import-from-derivation

So the thing I cannot do is any readFile operation in Nix on the result of a derivation, which in particular means I can't read and analyse a fetched lockfile in Nix. Instead, I have to pass the lockfile into the FOD, where a script analyses it.

This is unfortunate. So the two builds, the single-FOD build and the "import from lock" build, are quite orthogonal to each other: one needs to do the whole analysis inside a derivation, the other outside the derivation, in Nix.
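Concretely (an illustration of the two patterns, not code from this PR):

# NOT allowed in nixpkgs (IFD): evaluation depends on a derivation's output
lockAttrs = builtins.fromJSON (builtins.readFile "${fetchedSrc}/deno.lock");

# allowed: the lockfile only enters at build time, inside the single FOD
denoDeps = stdenv.mkDerivation {
  name = "example-deno-deps";
  lockFile = "${fetchedSrc}/deno.lock"; # a script in the builder parses this
  # ...outputHash etc....
};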

I wish I understood this sooner 😄

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 13, 2025
aMOPel commented Jul 14, 2025

[Image: fetcher-architecture.drawio]

@emilazy

I made this graph to convey the architecture for the fetcher I have in mind now. Does this make sense to you? Do you see any issues with it?

emilazy commented Jul 14, 2025

I wish I understood this sooner 😄

Yes, sorry, it’s a bit arcane :) Basically, without IFD, everything is a strictly‐separated two‐stage process: evaluation happens and produces a graph of .drvs, and then those are built to produce outputs. Since we use neither built‐in fetchers nor IFD in Nixpkgs, that means that nothing about the .drvs can depend on anything outside of Nixpkgs, and that therefore we can’t use any non‐vendored lock files. You can use the allow-import-from-derivation Nix setting to ensure your design works with it.

I made this graph to convey the architecture for the fetcher I have in mind now. Does this make sense to you? Do you see any issues with it?

Some thoughts:

  1. If I understand your graph right, I don’t think you can have a derivation that produces one FOD output and one non‐FOD output. So annotating the JSON would have to be a separate derivation.

  2. There may be Deno packages that have no lock file checked in upstream. Ideally we convince upstreams to fix this, but sometimes they’re unwilling. We would still like to be able to package these in Nixpkgs. Usually, what we do there is accept generating and vendoring a lock file in Nixpkgs for that case and do importCargoLock or equivalent. I think that your current design would not permit this, because the conversion derivation in the middle would be IFD.

  3. We probably don’t want the cache conversion to be a separate derivation from the actual build of the package, because it will bloat cache.nixos.org. The transitive closure of everything always gets pushed out to the cache, so in general a split of “FOD that analyses the lock file, does all the fetching, and processes as little as possible” + “language‐specific builder that consumes the FOD output, converts it as appropriate, and does the build” wastes the minimum amount of duplicated space. IIRC with Rust we have another derivation in the middle that may not be ideal in terms of cache usage, I forget whether @TomaSajt regrets this or not :)

I would personally suggest just trying to handle the Deno case for now. Changing the architecture of all fetchers is a really big task and sadly the architectures of various package managers are not as uniform as we might like, so we often need per‐package‐manager logic. More uniformity in this regard would be a good thing, but I worry that trying to tackle it would bog down the process here due to the need to get many ecosystems on board; I already regret that we had to revert the first time around here and think it would be good to be able to ship the results of your work as soon as we can. Keeping the design specific will make it much easier to get merged into Nixpkgs and common elements with other fetchers can be teased out later.

aMOPel commented Jul 14, 2025

If I understand your graph right, I don’t think you can have a derivation that produces one FOD output and one non‐FOD output. So annotating the JSON would have to be a separate derivation.

I guess my graph is a little misleading there. The single-FOD fetcher just produces a directory of all the fetched files, as well as the annotated JSON file. So the hash of the FOD will also cover the annotated lockfile, but from what I understand that should not be a problem.

We probably don’t want the cache conversion to be a separate derivation from the actual build of the package, because it will bloat cache.nixos.org

This is a very interesting point :D Good lord, this is so much more complex than I ever hoped for.

When you say

“FOD that analyses the lock file, does all the fetching, and processes as little as possible” + “language‐specific builder that consumes the FOD output, converts it as appropriate, and does the build”

Does it mean this:

Instead of making the directory structure transformer an extra derivation, it is just a build step when building the package?
The graph above is just about the fetcher; the result is fed into the package build.
So I can also just feed the results of either of the two fetcher variants into the package build itself, and have the directory structure transformer be a script that is executed sometime before the buildPhase of the package build.
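I.e. something like this (a sketch; the transformer script and names are hypothetical):

buildDenoPackage = args: stdenv.mkDerivation (args // {
  # the fetcher output is an ordinary input; the layout conversion happens
  # as a build step here instead of in a derivation of its own
  preBuild = ''
    deno-deps-transformer ${args.denoDepsFOD} "$DENO_DIR"
  '' + (args.preBuild or "");
});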

There may be Deno packages that have no lock file checked in upstream. Ideally we convince upstreams to fix this, but sometimes they’re unwilling. We would still like to be able to package these in Nixpkgs. Usually, what we do there is accept generating and vendoring a lock file in Nixpkgs for that case and do importCargoLock or equivalent. I think that your current design would not permit this, because the conversion derivation in the middle would be IFD.

Oh no 😄 So what you're saying is that there also needs to be an equivalent lockfile transformer in Nix code.

With the architecture above, I wanted to reduce the duplication of the per-language logic as much as possible.
After all, the single-FOD fetcher and the "import from lock" fetcher have basically the same logic, just in different languages and with slightly different outputs.

This duplication is ten times worse for the Deno fetcher, because there are so many edge cases and variations.
Writing and maintaining that code twice does not sound like fun.

So the way I see things now, I will only write the derivation build-time pipeline. The Nix eval-time pipeline pretty much exists now, but there are some remaining issues, and the way I wrote it, it is not yet compatible with what is planned now. So I think I'm gonna cut it for now.

Does the adapted graph look better to you?

[Image: builder-architecture.drawio]
