This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Entropic #64

Open
andrew opened this issue Jun 4, 2019 · 2 comments

andrew commented Jun 4, 2019

Only just announced this weekend at JSConf, Entropic is a new package manager registry and client intended to be a federated replacement for npm.

Video: https://www.youtube.com/watch?v=MO8hZlgK5zc
slides: https://speakerdeck.com/ceejbot/the-economics-of-package-management

Very exciting. I can see a world in which each large company publishes its modules to its own Entropic instance and takes on the cost of hosting it, [email protected]/redux for example.

Along with the GitHub Package Registry, we're going to see people start to depend on more than one registry to resolve their dependency trees, which brings a number of new challenges to the many communities that have traditionally relied on a single source of packages.

There's little documentation yet on how syncing between instances will work, so I'll keep an eye out for that. As I outlined in #52, the more registries that need to be available, the higher the likelihood of availability issues (downtime, deleted packages, etc.), so having each instance keep a copy of all the dependencies it needs will be important.
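A rough worked example of that availability point. It assumes, hypothetically, that registries fail independently and all have the same uptime; real outages are often correlated, so the numbers are only illustrative.

```python
def all_available(uptime: float, registries: int) -> float:
    """Probability that every required registry is reachable at once.

    Assumes independent failures and identical uptime per registry,
    which is a simplification made for this example.
    """
    return uptime ** registries


if __name__ == "__main__":
    # The chance of a fully resolvable tree shrinks as more registries are needed.
    for n in (1, 3, 10):
        print(n, round(all_available(0.999, n), 4))
```

The uptime figure of 0.999 is made up for the example; the shape of the curve is the point, not the values.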

Some initial observations, although please bear in mind that the project is only a month old and may change considerably over time.

Federation

Entropic package names include the domain name of the registry and a namespace from that instance as well: [email protected]/ds

Currently, if a declared package doesn't conform to that format, it falls back to proxying the package from npmjs.org, which is then saved locally within that instance under a legacy namespace: [email protected]/user-home
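A minimal sketch of that naming rule, assuming the shape is namespace@host/name. The regular expression, the literal namespace `legacy`, and the example hosts are my assumptions for illustration, not taken from the Entropic source.

```python
import re
from typing import NamedTuple


class PackageSpec(NamedTuple):
    namespace: str
    host: str
    name: str


# Assumed shape: namespace@host/name. The allowed characters are a guess.
_SPEC = re.compile(r"^(?P<namespace>[^@/\s]+)@(?P<host>[^@/\s]+)/(?P<name>[^@\s]+)$")

# Assumed literal name for the legacy namespace described above.
LEGACY_NAMESPACE = "legacy"


def parse_spec(raw: str, instance_host: str) -> PackageSpec:
    """Parse a full spec, or treat anything else as a legacy (npm) package."""
    match = _SPEC.match(raw)
    if match:
        return PackageSpec(match["namespace"], match["host"], match["name"])
    # Non-conforming names are assumed to be proxied from npmjs.org.
    return PackageSpec(LEGACY_NAMESPACE, instance_host, raw)
```

For example, `parse_spec("user-home", "registry.example.com")` ends up in the legacy namespace on the local instance, while a fully qualified name keeps its own host.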

n.b. namespaces and package names are case sensitive at the moment, which could cause some typosquatting problems on popular, open instances

Usernames are currently based on GitHub OAuth, but team names that share the same namespace can be anything, so namespaces may be similar across instances but can't be guaranteed to be globally unique, which is why the domain name is required.

There's currently no sync between instances; that's left to the client at the moment.

It doesn't look like an Entropic instance attempts to eagerly cache the dependencies of packages; instead it lazily caches packages as they are requested. In other words, the server doesn't do any dependency resolution; that's left entirely to the client.
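To make the lazy behaviour concrete, here is a small sketch of cache-on-request. The class and the `fetch_upstream` callable are hypothetical names for illustration; the real registry is a Node.js service and is structured differently.

```python
from typing import Callable, Dict


class LazyCache:
    """Caches a package only when it is first requested; never walks dependencies."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream  # hypothetical upstream fetcher
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, name: str) -> bytes:
        if name not in self._store:
            # First request for this package: fetch it and keep a local copy.
            self.upstream_calls += 1
            self._store[name] = self._fetch_upstream(name)
        return self._store[name]
```

Nothing here resolves a dependency tree: the client asks for each package it needs, and the cache fills up one request at a time.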

All packages published to Entropic are public. Access control needs to be done in front of the instance, i.e. put something in front of it to make it private (firewall, nginx, etc.).

Content addressable storage

Internally, both the registry and the client use content addressing for storing version and file data, although the content hashes aren't exposed to the user. Once lockfile support is added to the client, I'd expect to see hashes in there.

Object storage uses hashes (sha512 by default) for individual files: https://github.com/entropic-dev/entropic/blob/master/registry/lib/object-storage.js

When a package is proxied from npmjs.org, the tarball is unpacked and each individual file is written into the object store for each version. There doesn't appear to be any compression happening when storing contents.

n.b. possibly vulnerable to zip bomb attacks?
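A rough sketch of the unpack-and-store step with one possible guard against oversized archives. This is not Entropic's code: the function name, the in-memory `store` dict, and the size limit are assumptions for illustration.

```python
import base64
import hashlib
import io
import tarfile
from typing import Dict

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # hypothetical cap on unpacked size


def store_tarball(tarball: bytes, store: Dict[str, bytes],
                  limit: int = MAX_TOTAL_BYTES) -> Dict[str, str]:
    """Write each regular file into `store` keyed by its hash; return path -> hash."""
    files: Dict[str, str] = {}
    total = 0
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and special files
            total += member.size
            if total > limit:
                # Refuse before reading the file body into memory.
                raise ValueError("unpacked size exceeds limit")
            data = tar.extractfile(member).read()
            digest = base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")
            key = "sha512-" + digest
            store[key] = data
            files[member.name] = key
    return files
```

The check only bounds how much is held in memory and stored; decompressing a hostile archive still costs CPU time, so a real deployment would probably want further limits.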

Multihashes aren't used, but the hashing algorithm is stored. Hashes are passed to the client in the Subresource Integrity format: sha512-nDoyy...ekwZ3lc4g==
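For reference, a Subresource Integrity style string is the algorithm name, a hyphen, and the base64-encoded digest. A minimal sketch with Python's standard library (the function names are mine):

```python
import base64
import hashlib


def sri(data: bytes, algorithm: str = "sha512") -> str:
    """Return '<algorithm>-<base64 digest>' for the given bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify(data: bytes, integrity: str) -> bool:
    """Recompute the digest with the algorithm named in `integrity` and compare."""
    algorithm, _, _expected = integrity.partition("-")
    return sri(data, algorithm) == integrity
```

Because the algorithm travels with the digest, a store could switch algorithms later without making old hashes ambiguous.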

The JSON API has some similarities with IPLD, for example listing available packages on an instance:

{
  "objects": [
    {
       "name":"[email protected]/user-home",
       "yanked":false,
       "created":"2019-05-21T05:21:42.958Z",
       "modified":"2019-05-21T05:21:43.145Z",
       "require_tfa":false,
       "versions":{
          "1.0.0":"sha512-zbIl1aIaw2+6yUXcly4iEPt97bGhD9YVQQf0ZNee7/+pLSOleoeJIiplhPE+6uyS3jcxN476AIf2JZCllJx6XQ==",
          "1.1.0":"sha512-czg4esUBkpWKacQcJPFap3OUpw9wPbVjk9XhrF5kzX7l1wwKHH0NA+nQQSeAblr0AZGYgsQkypuPuyfZ2fXbxg==",
          "1.1.1":"sha512-WgvO0R0R60jA5T2Nm8xSj+bFdvthiTssED0wHOBpvmkz3pKdL75hilOgyhxbkZBShQ/JDKfLNt834E6n2OMn6w==",
          "2.0.0":"sha512-CQylFuv8JL9uQw3WIQq0TPmqxBENU2cvKNqZgnaxI4h7X5sJWerVLJgWtYThEYZmgkvvXQEUf8W/WNDQCwIh4Q=="
       },
       "tags":{
          "latest":"2.0.0"
       }
    },  
    ...
  ]
}

Where each version number points to a hash, which is then looked up from the object API:

https://registry.entropic.dev/v1/objects/object/sha512/zbIl1aIaw2...lJx6XQ==
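A small sketch of turning a version hash into an object URL of that shape. The path layout is copied from the URL above; how the registry expects `/` and `+` in the base64 digest to be escaped is an assumption on my part, so treat the encoding step as a guess.

```python
from urllib.parse import quote


def object_url(base: str, integrity: str) -> str:
    """Build <base>/v1/objects/object/<algorithm>/<digest> from 'algorithm-digest'."""
    algorithm, sep, digest = integrity.partition("-")
    if not sep or not digest:
        raise ValueError(f"not an integrity string: {integrity!r}")
    # Assumption: percent-encode '/' and '+', leave the '=' padding as shown above.
    return f"{base.rstrip('/')}/v1/objects/object/{algorithm}/{quote(digest, safe='=')}"
```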

The object behind each version hash is itself JSON, with a list of file paths and the hashes of their contents, plus various groups of dependencies:

{
   "files":{
      "./package/cli.js":"sha512-1vFyxvWjQO6bMKMv9vDIWYIp+84mM8AfaUcDwLLW/qHl3iq570fyq6b4ZDudQz5gwAW34ADG31mOgTIvEWYKZQ==",
      "./package/index.js":"sha512-aI5vpJpjK546ZqSUALtPhgF2jGH1Xtjfy3rorUFai4K5LDPWKQM/yN7KETtEAtrtXDXDwyiX/jPNA6LWFDeRIQ==",
      "./package/readme.md":"sha512-66276uUrV7mcMvJcEI4dQHFLP+/DqbyfUai32IMTYYPU1uIqcEr9j7jCE4z/BAbwHEcOj/nhAqbHW+xDA6r91Q==",
      "./package/package.json":"sha512-RCHhLML+GWS9AfhkrZVnbpNclD+jo8lvalLkLnUAg8/Rz/PsUxThES2P+xPEZF+89770k92CxQrpcY5ysuLq0w=="
   },
   "dependencies":{

   },
   "devDependencies":{
      "ava":"0.0.3"
   },
   "peerDependencies":{

   },
   "optionalDependencies":{

   },
   "bundledDependencies":{

   },
   "signatures":{

   }
}

n.b. there are fields in the database for storing signatures, but no support for them in the client yet
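Given a version object like the one above, a client could check downloaded files against the listed hashes. A minimal sketch, assuming the `files` map has the shape shown and that file contents are already in memory (the function name is hypothetical):

```python
import base64
import hashlib
from typing import Dict, List


def mismatched_files(version: dict, contents: Dict[str, bytes]) -> List[str]:
    """Return paths that are missing or whose contents don't match the listed hash."""
    bad: List[str] = []
    for path, integrity in version["files"].items():
        algorithm, _, expected = integrity.partition("-")
        data = contents.get(path)
        if data is None:
            bad.append(path)
            continue
        actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
        if actual != expected:
            bad.append(path)
    return bad
```

An empty list means every file listed for the version was present and matched its hash.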

Client

The client, ds, introduces new manifest formats and doesn't support npm's package.json or package-lock.json, instead going for TOML files called Package.toml and Package.lock (although the latter isn't implemented yet), which end up looking a lot like Cargo.toml from Rust.

Package.toml example:

name = "[email protected]/ds"
version = "0.0.0-beta"

[dependencies]
"[email protected]/ipfs-npm-republish" = "^1.0.8"

Can this work with IPFS?

There's actually already an issue open to discuss this, although the performance problems highlighted in @achingbrain's patch to pacote don't look great.

Talking of pacote, it's used by entropic internally for proxying data and packages from npmjs.org, so if that PR ever got merged, entropic would have some IPFS support but not in a meaningful way.

It looks like there are plans to extract the object store so different adapters could be used, S3 for example, but that will likely be a straight key-value store. I believe the only way to use IPFS as a key-value store is with an MFS directory, but then the client would need to be kept up to date on the root of that MFS directory, which would be changing very rapidly (every time a new file is written).

n.b. the docs page for MFS is really bad: https://docs.ipfs.io/guides/concepts/mfs/

To really take advantage of IPFS, instead of storing plain SHA-512 hashes, using IPFS multihashes to hash and store the content would enable:

  • different instances to sync content between each other seamlessly
  • clients to pull content from any instance that has it, as well as from other clients connected to the DHT
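For anyone unfamiliar with multihashes, here is a minimal sketch of the wrapping: one byte for the hash function, one byte for the digest length, then the digest. It only shows the multihash layer. A real `ipfs add` hashes chunked UnixFS data rather than the raw bytes, so the strings produced here will not match the CIDs IPFS would assign to the same content.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Multihash function codes for the two algorithms mentioned in this thread.
CODES = {"sha256": 0x12, "sha512": 0x13}


def multihash(data: bytes, algorithm: str = "sha256") -> bytes:
    """Prefix the digest with its function code and length (both fit in one byte here)."""
    digest = hashlib.new(algorithm, data).digest()
    return bytes([CODES[algorithm], len(digest)]) + digest


def base58btc(raw: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet, keeping leading zero bytes."""
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded
```

A base58-encoded sha2-256 multihash is where the familiar `Qm...` prefix comes from.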

The source of each version hash would still be the instance's Postgres database, and requesting metadata for a package by name and host would still be done via HTTP, but the result would include IPFS CIDs instead of just the SHA-512 hash of the metadata for each version:

{
  "objects": [
    {
       "name":"[email protected]/user-home",
       "yanked":false,
       "created":"2019-05-21T05:21:42.958Z",
       "modified":"2019-05-21T05:21:43.145Z",
       "require_tfa":false,
       "versions":{
          "1.0.0":"/ipfs/Qmc8YRiLFWpyHdxHCqSFDsvBneeCbvRq4oCoPPgKccjwKP",
          "1.1.0":"/ipfs/Qmc83fVpubVNKjPvqR1uVUz7Y7cVzefkoagSBcCyRGFVog",
          "1.1.1":"/ipfs/QmPFoDTh7WXm7G2auv4hS2CMHEcUp95qLVzrU2eAvgyXpR",
          "2.0.0":"/ipfs/QmX9WpUQ4oA6VkLn2nmLnhcNcd5XzCc5Scn3Wbz4HgqTta"
       },
       "tags":{
          "latest":"2.0.0"
       }
    },  
    ...
  ]
}

n.b. this stage requires the client to be able to reach the registry, i.e. online and available over HTTP, unless it has a cached response from a previously successful request

The client can then fetch the CID for each version from the IPFS swarm of registries and clients, which would have a list of files, dependencies and signatures (looks a lot like IPLD):

{
   "files":{
      "./package/cli.js":"/ipfs/Qmc8YRiLFWpyHdxHCqSFDsvBneeCbvRq4oCoPPgKccjwKP",
      "./package/index.js":"/ipfs/Qmc8YRiLFWpyHdxHCqSFDsvBneeCbvRq4oCoPPgKccjwKP",
      "./package/readme.md":"/ipfs/QmSk3QcTaauVEcTCFMLTKfcwGGZQnqnUx5cK4iGASYSWu5",
      "./package/package.json":"/ipfs/QmYwyn1a22x2LhkFyTSJSHS7BB387ZD5Ee27QNiEHreinE"
   },
   "dependencies":{

   },
   "devDependencies":{
      "ava":"0.0.3"
   },
   "peerDependencies":{

   },
   "optionalDependencies":{

   },
   "bundledDependencies":{

   },
   "signatures":{

   }
}

Clients that are not using IPFS could also request the content via HTTP, but using the IPFS hash:

https://registry.entropic.dev/v1/objects/object/ipfs/Qmc8YRiLFWpyH...KccjwKP

That URL looks an awful lot like an IPFS gateway; they could even just run a gateway:

https://registry.entropic.dev/ipfs/Qmc8YRiLFWpyH...KccjwKP

Because the hashes aren't directly exposed to the user, various instances of Entropic could in theory use different content stores and still provide the same public-facing interface, since each instance takes care of hashing its own content and doesn't share that directly. A dat:// implementation would look very similar: instead of SHA-512 hashes, it would store dat://778f8d955175c92e4ced...0352c457943666fe639 hashes.

I definitely think it's worth trying out an IPFS backed implementation and seeing if there's a good way to extract not just the object store interface but the content hashing one as well.
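To illustrate what extracting both the object store and the content hashing might look like, here is a toy interface with two interchangeable backends. Everything in it is hypothetical: the `ContentStore` protocol, the class names, and the `example://` scheme are stand-ins, and neither class talks to IPFS or dat.

```python
import base64
import hashlib
from typing import Dict, Protocol


class ContentStore(Protocol):
    """The store decides how content is addressed; callers only see opaque strings."""

    def put(self, data: bytes) -> str: ...
    def get(self, address: str) -> bytes: ...


class Sha512Store:
    """Addresses content as 'sha512-<base64>', like the current object store output."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")
        address = "sha512-" + digest
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


class ExampleSchemeStore:
    """Stand-in for an IPFS- or dat-backed store with its own address format."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = "example://" + hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


def publish(store: ContentStore, files: Dict[str, bytes]) -> Dict[str, str]:
    """Store each file and return the path -> address map for a version object."""
    return {path: store.put(data) for path, data in files.items()}
```

The registry code calling `publish` doesn't change between backends; only the strings in the `files` map do.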


andrew commented Jun 6, 2019

I took a stab at adding IPFS support to Entropic, you can check out my branch over here: https://github.com/andrew/entropic/tree/ipfs


Major registry changes are:

  • adds content to local IPFS node and returns CID
  • retrieves content from local node via CID
  • content hashes for version metadata and file contents are stored in the database as IPFS CIDs
  • uses ipfs-http-client and the go-ipfs docker image in docker-compose.yml for easy setup

I also added IPFS support to the client, again with ipfs-http-client, so you need a local IPFS node running to use it; it doesn't fall back to HTTP or use a public gateway at the moment. Non-IPFS clients can still work with a registry that's using IPFS as a content store, requesting content via HTTP: https://registry.entropic.dev/v1/objects/object/ipfs/Qmc8YRiLFWpyH...KccjwKP

The client always needs to make initial HTTP requests to the registry to get the content hashes (so it doesn't work offline), but once it has the content hashes it requests them from the DHT, so it can pull from any registry that has the same modules cached, and from any other nearby users too.

It stores the hashes in the "Subresource Integrity" form ipfs-Qmdfsd... rather than the traditional /ipfs/Qmdfsd... to minimize the differences, but I had to comment out some code that used the ssri library, as it doesn't support ipfs as an algorithm.
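The mapping between the two forms is just a prefix swap. A minimal sketch (the function names are mine, and this is separate from anything ssri does):

```python
IPFS_PATH_PREFIX = "/ipfs/"
SRI_ALGORITHM = "ipfs"


def to_sri_form(path: str) -> str:
    """'/ipfs/<cid>' -> 'ipfs-<cid>'."""
    if not path.startswith(IPFS_PATH_PREFIX) or len(path) == len(IPFS_PATH_PREFIX):
        raise ValueError(f"not an /ipfs/ path: {path!r}")
    return f"{SRI_ALGORITHM}-{path[len(IPFS_PATH_PREFIX):]}"


def to_ipfs_path(value: str) -> str:
    """'ipfs-<cid>' -> '/ipfs/<cid>'."""
    algorithm, _, cid = value.partition("-")
    if algorithm != SRI_ALGORITHM or not cid:
        raise ValueError(f"not an ipfs-<cid> string: {value!r}")
    return IPFS_PATH_PREFIX + cid
```

Splitting on the first hyphen is safe here because base58 CIDs never contain one.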

Ideally we'd get ipfs support added directly into ssri, which would then make adding IPFS support to entropic much simpler.

There's a function to see if some content is already stored in IPFS via its CID. I couldn't find a nice API to replicate that; the current workaround seems to be ipfs refs local | grep CID, which is pretty nasty to me, so I just always return false for now.
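For completeness, the workaround could be wrapped like this. It is only a sketch of the `ipfs refs local | grep CID` approach, assuming the `ipfs` binary is on the PATH and that it prints CIDs in the same encoding as the one being looked up; it lists every local block, so it will be slow on a large repo.

```python
import subprocess


def cid_in_refs(cid: str, refs_output: str) -> bool:
    """True if `cid` appears as a whole line in the output of `ipfs refs local`."""
    return cid in {line.strip() for line in refs_output.splitlines()}


def has_local(cid: str) -> bool:
    """Shell out to `ipfs refs local` and look for an exact CID match."""
    result = subprocess.run(
        ["ipfs", "refs", "local"],
        capture_output=True,
        text=True,
        check=True,
    )
    return cid_in_refs(cid, result.stdout)
```

Matching whole lines avoids the false positives that a plain substring grep could give.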

Things I noted whilst implementing:

  • the ipfs-http-client node module has a lot of dependencies inc native ones, that doesn't make for an attractive pull request for what could be achieved with the built in http library (RFC: perf: use peer-id-lite js-ipfs-http-client#1007 may help reduce the size once merged).
  • IPFS has a lot of different options for running it, it's overwhelming to add support for every different approach (embedded js-ipfs, http client library to js/go-ipfs) and supporting all the different ways it can error feels unknowable at the moment, would be good to have a "golden path" that's recommended, well documented with examples and supported long term.
  • Entropic with IPFS support won’t share CIDs with npm-on-ipfs as it stores individual files on IPFS not tarballs


momack2 commented Jun 14, 2019 via email
