Entropic #64
Comments
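I took a stab at adding IPFS support to Entropic, you can check out my branch over here: https://github.com/andrew/entropic/tree/ipfs
Major registry changes are:
- adds content to local IPFS node and returns CID
- retrieves content from local node via CID
- content hashes for version metadata and file contents are stored in the database as IPFS CIDs
- uses ipfs-http-client and the go-ipfs docker image in docker-compose.yml for easy setup
I also added IPFS support to the client, again with ipfs-http-client, so you need a local IPFS node running to use it; it doesn't fall back to http or use a public gateway at the moment. Non-IPFS clients can still work with a registry that's using IPFS as a content store, requesting content via http: https://registry.entropic.dev/v1/objects/object/ipfs/Qmc8YRiLFWpyH...KccjwKP
The client always needs to make initial http requests to the registry to get the content hashes (doesn't work offline), but once it has the content hashes it requests them from the DHT, so it can pull from any registry that has the same modules cached, and from any other nearby users too.
It stores the hashes in the "Subresource Integrity" form ipfs-Qmdfsd.. rather than the traditional /ipfs/Qmdfsd... to minimize the differences, but I had to comment out some code that used the ssri library (https://github.com/zkat/ssri) as it doesn't have support for ipfs as an algorithm.
Ideally we'd get ipfs support added directly into ssri, which would then make adding IPFS support to entropic much simpler.
There's a function to see if some content is already stored in IPFS via its CID. I couldn't find a nice API to replicate that; the current work-around seems to be ipfs refs local | grep CID, which is pretty nasty to me, so I just always return false for now.
Things I noted whilst implementing:
- the ipfs-http-client node module (https://github.com/ipfs/js-ipfs-http-client) has a lot of dependencies, including native ones. That doesn't make for an attractive pull request for what could be achieved with the built-in http library (ipfs-inactive/js-ipfs-http-client#1007 may help reduce the size once merged).
- IPFS has a lot of different options for running it. It's overwhelming to add support for every different approach (embedded js-ipfs, http client library to js/go-ipfs), and supporting all the different ways it can error feels unknowable at the moment. It would be good to have a "golden path" that's recommended, well documented with examples and supported long term.
- Entropic with IPFS support won't share CIDs with npm-on-ipfs as it stores individual files on IPFS, not tarballs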
Thanks Andrew! I think it's worth writing up specific issues for ssri support
and http client dependencies.
I believe the HTTP API to a go-ipfs node is supposed to be our golden path,
but agree it still needs lots of documentation help and further complexity
reduction within that option (aka smart default config options).
Your expertise designing the optimal path through this for most package
manager users would be awesome!
…On Thu, Jun 6, 2019 at 7:30 AM Andrew Nesbitt ***@***.***> wrote:
I took a stab at adding IPFS support
<andrew/entropic@8469b5e>
to Entropic, you can check out my branch over here:
https://github.com/andrew/entropic/tree/ipfs
[image: Screenshot 2019-06-06 at 13 26 20]
<https://user-images.githubusercontent.com/1060/59039472-d3a28600-886c-11e9-942c-c19356986314.png>
Major registry changes are:
- adds content to local IPFS node and returns CID
- retrieves content from local node via CID
- content hashes for version metadata and file contents are stored in
the database as IPFS CIDs
- uses ipfs-http-client and the go-ipfs docker image in
docker-compose.yml for easy setup
I also added IPFS support to the client, again with ipfs-http-client so
you need a local ipfs node running to use it, doesn't fall back to http or
use a public gateway at the moment. Non-ipfs clients can still work with a
registry that's using IPFS as a content store, requesting content via http:
https://registry.entropic.dev/v1/objects/object/ipfs/Qmc8YRiLFWpyH...KccjwKP
The client always needs to make initial http requests to the registry to
get the content hashes (doesn't work offline), but once it has the content
hashes it requests them from the DHT so can pull from any registry that has
the same modules cached, and any other nearby users too.
It stores the hashes in the "Subresource Integrity" form ipfs-Qmdfsd..
rather than the traditional /ipfs/Qmdfsd... to minimize the differences
but I had to comment out some code that used the ssri
<https://github.com/zkat/ssri> library as it doesn't have support for ipfs
as an algorithm.
Ideally we'd get ipfs support added directly into ssri, which would then
make adding IPFS support to entropic much simpler.
There's a function to see if some content is already stored in IPFS via
its CID. I couldn't find a nice API to replicate that; the current
work-around seems to be ipfs refs local | grep CID, which is pretty nasty
to me, so I just always return false for now.
Things I noted whilst implementing:
- the ipfs-http-client <https://github.com/ipfs/js-ipfs-http-client>
node module has a lot of dependencies, including native ones. That doesn't
make for an attractive pull request for what could be achieved with the
built-in http library (ipfs-inactive/js-ipfs-http-client#1007 may help
reduce the size once merged).
- IPFS has a lot of different options for running it. It's
overwhelming to add support for every different approach (embedded js-ipfs,
http client library to js/go-ipfs), and supporting all the different ways
it can error feels unknowable at the moment. It would be good to have a
"golden path" that's recommended, well documented with examples and
supported long term.
- Entropic with IPFS support won’t share CIDs with npm-on-ipfs as it
stores individual files on IPFS not tarballs
Only just announced this weekend at JSConf, Entropic is a new package manager registry and client intended to be a federated replacement for npm.
Video: https://www.youtube.com/watch?v=MO8hZlgK5zc
slides: https://speakerdeck.com/ceejbot/the-economics-of-package-management
Very exciting. I can see a world in which each large company publishes its modules to its own Entropic instance and takes on the cost of hosting it, [email protected]/redux for example.
Along with the GitHub Package Registry, we're going to see people start to depend on more than just one registry to resolve their dependency trees, which brings a number of new challenges to the many communities that have traditionally relied on a single source of packages.
There's little documentation on how the syncing between instances will work yet, so I'll keep an eye out for that. As I outlined in #52, as the number of registries required to be available increases, so does the likelihood of availability issues (downtime, deleted packages etc.), so having each instance keep a copy of all the dependencies it needs will be important.
Some initial observations, although please bear in mind that the project is only a month old and may change considerably over time.
Federation
Entropic package names include the domain name of the registry and a namespace from that instance as well:
[email protected]/ds
Currently, if a declared package doesn't conform to that format, it falls back to proxying packages from npmjs.org, which are then saved locally within that instance under a legacy namespace: [email protected]/user-home
n.b. namespaces and package names are case sensitive at the moment, which could cause some typosquatting problems on popular, open instances
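To make the naming rules above concrete, here is a minimal sketch in Python (Entropic itself is JavaScript, so this is not its actual code). It assumes the spec format is `namespace@host/name`; the hosts, the `PackageSpec` type and both function names are hypothetical. Anything that doesn't match falls into the legacy namespace, and `collides` flags two names that differ only by letter case, which is the typosquatting concern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageSpec:
    namespace: str
    host: str
    name: str
    legacy: bool  # True when the name fell back to the npmjs.org proxy


def parse_spec(spec: str, local_host: str = "registry.example.com") -> PackageSpec:
    """Parse 'namespace@host/name'; anything else is treated as a legacy npm name."""
    namespace, at, rest = spec.partition("@")
    host, slash, name = rest.partition("/")
    if namespace and at and host and slash and name:
        return PackageSpec(namespace, host, name, legacy=False)
    # Not in the Entropic format: file it under the instance's legacy namespace.
    return PackageSpec("legacy", local_host, spec, legacy=True)


def collides(a: PackageSpec, b: PackageSpec) -> bool:
    """True when two distinct specs differ only by letter case."""
    def key(s: PackageSpec):
        return (s.namespace.lower(), s.host.lower(), s.name.lower())
    return a != b and key(a) == key(b)
```

A registry could run a check like `collides` at publish time to reject look-alike names, without changing how existing names are stored.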
Usernames are currently based on GitHub OAuth, but team names that share the same namespace can be anything, so namespaces may be similar across instances but can't be guaranteed to be globally unique, which is why the domain name is required.
There's currently no sync between instances; that's pushed to the client at the moment.
It doesn't look like an Entropic instance will attempt to eagerly cache dependencies of packages; instead it lazily caches packages as they are requested. That is to say, the server does not attempt any dependency resolution, which is left entirely to the client.
All packages published to Entropic are public. Access control needs to be done in front of the instance, i.e. put something in front of it to make it private (firewall, nginx etc.)
Content addressable storage
Internally both the registry and the client use content addressing for storing version and file data, although the content hashes aren't exposed to the user. Once lockfile support is added to the client I'd expect to see hashes in there.
Object storage uses hashes (sha512 by default) for individual files: https://github.com/entropic-dev/entropic/blob/master/registry/lib/object-storage.js
When a package is proxied from npmjs.org, the tarball is unwrapped and each individual file is written into the object store for each version. There doesn't appear to be any compression happening when storing contents.
n.b. possibly vulnerable to zip bomb attacks?
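One way to picture the unpacking step, and a possible guard against the zip bomb concern, is to cap sizes while writing files into a content-addressed store. This is a rough sketch in Python rather than Entropic's own JavaScript; the in-memory `store` dict, the function name and both limits are assumptions made for illustration, and keys are hex digests here purely for readability.

```python
import hashlib
import io
import tarfile


def store_tarball(tarball: bytes, store: dict,
                  max_file_bytes: int = 10 * 1024 * 1024,
                  max_total_bytes: int = 100 * 1024 * 1024) -> dict:
    """Unpack a gzipped tarball into `store`, keyed by sha512 hex digest.

    Returns a mapping of file path -> digest for this package version.
    """
    files = {}
    total = 0
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and device nodes
            # Check the declared size before reading any file content.
            if member.size > max_file_bytes:
                raise ValueError(f"{member.name} exceeds the per-file limit")
            total += member.size
            if total > max_total_bytes:
                raise ValueError("package exceeds the total size limit")
            data = tar.extractfile(member).read()
            digest = hashlib.sha512(data).hexdigest()
            store[digest] = data  # identical content is stored only once
            files[member.name] = digest
    return files
```

Because the key is derived from the content, two packages (or two versions) that ship the same file share a single stored object.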
Multihashes aren't used, but the hashing algorithm is stored. Hashes are passed to the client following the Subresource Integrity format:
sha512-nDoyy...ekwZ3lc4g==
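For reference, a string in that format is just the algorithm name, a hyphen, and the base64-encoded digest. A minimal Python sketch of producing and checking one follows; the function names are mine, not from Entropic or the ssri library.

```python
import base64
import hashlib
import hmac
from typing import Tuple


def integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Return an SRI-style string: '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def parse_integrity(value: str) -> Tuple[str, str]:
    """Split an SRI-style string into (algorithm, encoded digest)."""
    algorithm, sep, encoded = value.partition("-")
    if not sep or not algorithm or not encoded:
        raise ValueError(f"not an SRI-style string: {value!r}")
    return algorithm, encoded


def verify(data: bytes, expected: str) -> bool:
    """Recompute the digest with the stated algorithm and compare."""
    algorithm, _ = parse_integrity(expected)
    return hmac.compare_digest(integrity(data, algorithm), expected)
```

A sha512 digest is 64 bytes, so its base64 form is 88 characters ending in `==`, which matches the shape of the example above.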
The JSON API has some similarities with IPLD, for example listing available packages on an instance:
Where each version number points to a hash, which is then looked up from the object API:
https://registry.entropic.dev/v1/objects/object/sha512/zbIl1aIaw2...lJx6XQ==
The object behind each version hash is itself JSON, with a list of file paths and the hashes of their contents, plus various groups of dependencies:
n.b. there are fields in the database for storing signatures but no support for them in the client yet
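The chain described here (version number, then version metadata object, then file objects) can be sketched as below. This is a sketch under assumptions: the `files` and `dependencies` field names, the in-memory `store` and the sample package are all hypothetical, not Entropic's real schema. Each fetched blob is re-hashed, which is what lets a client trust content from any source.

```python
import base64
import hashlib
import json
from typing import Callable, Dict


def sri(data: bytes) -> str:
    """SRI-style key for a blob: 'sha512-<base64 digest>'."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def put(store: Dict[str, bytes], data: bytes) -> str:
    """Store a blob under its own hash and return that hash."""
    key = sri(data)
    store[key] = data
    return key


def fetch_verified(fetch: Callable[[str], bytes], key: str) -> bytes:
    """Fetch a blob and check it really hashes to the key it was requested by."""
    data = fetch(key)
    if sri(data) != key:
        raise ValueError(f"content does not match {key}")
    return data


def resolve_files(versions: Dict[str, str], version: str,
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Follow version number -> version metadata object -> file objects."""
    metadata = json.loads(fetch_verified(fetch, versions[version]))
    return {path: fetch_verified(fetch, key)
            for path, key in metadata["files"].items()}


# Hypothetical data: one package version containing a single file.
store: Dict[str, bytes] = {}
index_js = put(store, b"module.exports = 1\n")
metadata = json.dumps({"files": {"index.js": index_js}, "dependencies": {}})
versions = {"1.0.0": put(store, metadata.encode("utf-8"))}
files = resolve_files(versions, "1.0.0", store.__getitem__)
```

Here `fetch` is any function from hash to bytes, so the same resolution logic would work whether the bytes come from the registry over http or from somewhere else.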
Client
The client, ds, introduces new manifest formats and doesn't support npm's package.json or package-lock.json, instead going for TOML files called Package.toml and Package.lock (although the latter isn't implemented yet), which ends up looking a lot like Cargo.toml from Rust.
Package.toml example:
Can this work with IPFS?
There's actually already an issue open to discuss this, although the performance problems highlighted in @achingbrain's patch to pacote don't look great.
Talking of pacote, it's used by Entropic internally for proxying data and packages from npmjs.org, so if that PR ever got merged, Entropic would have some IPFS support, but not in a meaningful way.
It looks like there are plans to extract the object store so different adapters could be used, S3 for example, but that will likely be a straight key-value store. I believe the only way to use IPFS as a key-value store is with an MFS directory, but then the client would need to be kept up to date on the root of that MFS directory, which would be changing very rapidly (every time a new file is written).
n.b. the docs page for MFS is really bad: https://docs.ipfs.io/guides/concepts/mfs/
To really take advantage of IPFS, instead of storing straight SHA256 hashes, using multihashes in IPFS to hash and store the content would enable:
The source of each version hash would still originate from the instance's Postgres database. Requesting metadata for a package by name and host would still be done via http, but the result would include IPFS CIDs instead of just the SHA256 of the metadata for each version:
n.b. this stage requires the client to be able to reach the registry, i.e. online and available over http, unless it has a cache of the response from a previously successful request
The client can then fetch the CID for each version from the IPFS swarm of registries and clients, which would have a list of files, dependencies and signatures (looks a lot like IPLD):
Clients that are not using IPFS could also request the content via http but using the IPFS hash:
https://registry.entropic.dev/v1/objects/object/ipfs/Qmc8YRiLFWpyH...KccjwKP
That url looks an awful lot like an IPFS gateway, they could even just run a gateway:
https://registry.entropic.dev/ipfs/Qmc8YRiLFWpyH...KccjwKP
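Both URL shapes can be derived from a stored SRI-style hash such as `sha512-...` or `ipfs-Qm...`. A small Python sketch follows, with some assumptions flagged: the function names and the example host are hypothetical, and percent-encoding the digest is my guess at what a registry would accept for base64 values containing `/`, `+` or `=`.

```python
from urllib.parse import quote


def split_integrity(integrity: str):
    """Split '<algorithm>-<digest>' into its two parts."""
    algorithm, sep, digest = integrity.partition("-")
    if not sep or not algorithm or not digest:
        raise ValueError(f"not an SRI-style string: {integrity!r}")
    return algorithm, digest


def object_url(registry: str, integrity: str) -> str:
    """Build the registry object URL: /v1/objects/object/<algorithm>/<digest>."""
    algorithm, digest = split_integrity(integrity)
    # Assumption: digests are percent-encoded so base64 '/' cannot split the path.
    return (f"{registry.rstrip('/')}/v1/objects/object/"
            f"{quote(algorithm, safe='')}/{quote(digest, safe='')}")


def gateway_url(registry: str, integrity: str) -> str:
    """Build a gateway-style URL, which only makes sense for ipfs hashes."""
    algorithm, digest = split_integrity(integrity)
    if algorithm != "ipfs":
        raise ValueError("gateway URLs need an ipfs-<CID> hash")
    return f"{registry.rstrip('/')}/ipfs/{quote(digest, safe='')}"
```

For example, `object_url("https://registry.example.com", "ipfs-QmExampleCid")` gives `https://registry.example.com/v1/objects/object/ipfs/QmExampleCid`, so a non-IPFS client only needs the hash and a registry host.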
Because the hashes aren't directly exposed to the user, in theory various instances of Entropic could use different forms of content store and still provide the same public-facing interface, because each instance takes care of hashing its own content and doesn't share that directly. A dat:// implementation would look very similar: instead of SHA256, it would store dat://778f8d955175c92e4ced...0352c457943666fe639 hashes.
I definitely think it's worth trying out an IPFS-backed implementation and seeing if there's a good way to extract not just the object store interface but the content hashing one as well.