scaling to indefinite and changing groups of files #58
Comments
Well let's see... The wire protocol exchanges hypercore blocks, not files. As a result, peers don't explain what files they're trying to fetch when they communicate with each other. They only explain which data-chunks they want. That doesn't help.

One option might be to update the hyperdrive data structure so that it can have "placeholder" entries. These would be empty files with an "is_placeholder" flag set. A wire-protocol extension message could ask the owning peer to "hydrate" the file, in which case it would overwrite the placeholder entry with a real entry. Then replication could happen as usual.

Another option might be to leverage one-way mounts, once they're implemented. In that scenario, the owning peer would be sharding their file-set into a tree of dats. They would optimistically create and share the dats, but not populate them -- when a peer first connects, they would then populate the dats.

Let me step through the one-way mount solution. Let's say we have the following file tree:
The top level has three folders:
At initialization, a top-level dat would be created, and then 3 more dats: one for each folder. They would remain empty. The top-level dat would mount each of them.
All 4 dats are swarmed. Let's suppose a reading client wants to pull down the 'bob' folder. It would request a block from the 'bob' hyperdrive. When this happens, the owning peer would immediately mint 2 more dats, the 'bob-fun' dat and the 'bob-work' dat, and mount them to the bob dat:
These 2 new dats would then also be swarmed. This continues recursively. Basically, we're using entire dats as placeholders, and using the request of any block as a trigger to "hydrate" it. The good news is that one-way mounts will not require additional hits to the DHT. The replication connection for the top-level dat can multiplex all of the child mounts, making this design relatively efficient over the network. It also would not require any unplanned changes to the protocol.
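For concreteness, here is a minimal sketch of setting up that placeholder structure in JavaScript. The `mount` call stands in for the planned one-way-mounts API (its exact signature is an assumption), and the folder names and storage paths are made up for illustration:

```js
// Sketch only: build a top-level dat that mounts one empty "placeholder" dat
// per folder. Hydration (populating a child) happens later, on demand.
const hyperdrive = require('hyperdrive')

const top = hyperdrive('./store/top')
const folders = ['alice', 'bob', 'carol'] // illustrative folder names

top.ready(() => {
  folders.forEach(name => {
    const child = hyperdrive('./store/' + name) // stays empty for now
    child.ready(() => {
      // `mount` is the planned one-way-mounts API, not something shipping today.
      top.mount('/' + name, child.key, err => { if (err) throw err })
    })
  })
})
```

All four drives would then be announced on the swarm as described above.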
This is now theory land: another way to go about this would be more "low-level". Every DAT consists of 2 hypercores, metadata & content. What if it were to contain X hypercores? One per folder and one per file. For a file, the folder's hypercore might contain an entry like

{ "op": "add", "file": "Readme.md", "time": 12312313, "hash": "YI8Qx5b/Tpbu5Hiw72RoSduT3qL3ZgYWZEMx5QwTPZ4=", "type": "file" }

... which gives the hash of the latest version of that file. For a folder, it might contain an entry:

{ "op": "add", "file": "css", "time": 12312313, "hash": "85FydCC60+APHJsgWHJBOISXnITmtx9dwRUC4wi+pak=", "type": "dir" }

... which means that for the history of that folder you would need to ask the server for the corresponding hypercore. The server would need to look up all files in a folder and just generate the hashes, and upon request of a hypercore it would start creating those hypercores (like a stream placeholder). The crux of the issue, and the reason why the web archive can not really do what they mean to, is that you can't quite know how big the entire archive is (something that is prominently displayed in DAT right now).
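As an illustration of what appending such operations could look like, here is a small sketch using a hypercore with JSON value encoding; the storage path is made up, and the entry fields simply mirror the examples above rather than any agreed format:

```js
const hypercore = require('hypercore')

// One hypercore per folder; each entry describes an operation on a child.
const folderLog = hypercore('./store/root-folder', { valueEncoding: 'json' })

// A file entry: the hash identifies the latest contents of Readme.md.
folderLog.append({
  op: 'add',
  file: 'Readme.md',
  time: Date.now(),
  hash: 'YI8Qx5b/Tpbu5Hiw72RoSduT3qL3ZgYWZEMx5QwTPZ4=',
  type: 'file'
})

// A folder entry: the hash points at the sub-folder's own hypercore.
folderLog.append({
  op: 'add',
  file: 'css',
  time: Date.now(),
  hash: '85FydCC60+APHJsgWHJBOISXnITmtx9dwRUC4wi+pak=',
  type: 'dir'
}, err => {
  if (err) throw err
})
```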
@martinheidegger Correct me if I'm wrong, but I think, for the most part, that's how my one-way-mounts solution works.
@pfrazee It is similar, but on a lower level. Hyperdrive → Hypercore. Also it uses hashes instead of direct file lookups for deletions.
@martinheidegger What's the upside of introducing a different hypercore data structure than hyperdrive? If you stick with hyperdrive, you avoid having to create a niche solution.
Codename for this thing:
Thanks for the thinking... I'm hoping we can use this exercise as an example of how DAT might work with a large and changing site like the Archive. IMHO (and from prior experience with how the Web built on legacy Gopher and FTP data), how well a protocol gets adopted depends in part on how well it integrates with legacy data systems, and it's great to be able to use the Archive to test out some of these ideas.

I could be wrong, but I don't think either of those scenarios works for large data sets with an unknown (and changeable) set of items. For example, the obvious structure to me would be dat://// ; in @martinheidegger's suggestion this has two issues.

Of course, we could do this in a different way - similar to how we work around IPFS's inherent weaknesses - where we created a DAT for each item the first time a user requested it, and then shared the address of the DAT out of band (via GUN or HTTP). But with that solution, Wort, GUN or WebTorrent are always going to work better than DAT.
@martinheidegger You're free to build any data structure you want on hypercore, but you're going to have to get that structure adopted widely if you want it to solve the problem. @mitra42 Sharding the dataset into multiple mounted dats is going to give you a couple of benefits:
Are you certain that sharding the dataset to one-dat-per-folder, and only updating folder-dats that have peers, is not enough for you to scale?
I wonder how we might use something like kappa-db/multifeed for this 'hyperfs' implementation... any thoughts @noffle?
@pfrazee That issue persists with both a hyperdrive and a hypercore solution. Pulling it one abstraction level higher doesn't make it quicker to be used. @mitra42 I am seconding @pfrazee here, but I have some issues comprehending all you wrote (some formatting issues here). The only bottleneck that I see with my example is that
@pfrazee I disagree that it has to be adopted widely for the archive's case, because most of the peers would be served from client-side WebRTC (watching the same video at the same time, for example).
@Karissa you're just losing wider ecosystem compatibility. If the situation really is niche, then you can do that, but I think you're losing some of the benefits of Dat. In the case of Cabal the custom data structure makes total sense, but in this case hyperdrive + the upcoming one-way-mounts feature has a good chance of solving the issue.
Yeah, this is why I was sort of hoping it could be something at the hypercore level -- being able to expose the filename (if known, optionally) in the request
or rather, any payload for extra request content needed
@Karissa You could probably do that via a wire-protocol extension message. It's got some extra overhead but if it was only turned on for dats doing this style of on-demand hydration, it's not super crazy. You'll just need the peers to support the extension.
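A hedged sketch of what the requesting side of such an extension could look like, using the registerExtension-style API that later hypercore versions expose; the extension name 'hydrate' and the message shape are invented for illustration:

```js
// Reader side: ask the owning peer to hydrate a path before downloading it.
const ext = drive.metadata.registerExtension('hydrate', {
  encoding: 'json',
  onmessage (message, peer) {
    // Hypothetical acknowledgement from the owner, e.g. { path, status: 'ready' }.
    if (message.status === 'ready') drive.download(message.path)
  }
})

// Broadcast the request to connected peers; only the owning peer acts on it.
ext.broadcast({ path: '/bob/fun', want: true })
```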
@pfrazee I don't think we can come up with a solution that doesn't require the entire ecosystem to update. I believe a case could be made that the next update (*cough* multiwriter *cough*) could be that thing. But also, once more, these solutions are not backwards compatible! One important aspect of DAT is that we know the entire size of it at a given version. No matter if it's mounts or different cores: we wouldn't be able to tell how big the data space is.
@martinheidegger The one-way mounts are a planned update for the entire ecosystem, though.
Planned doesn't mean done ;)
@pfrazee asked "Are you certain that sharding the dataset to one-dat-per-folder, and only updating folder-dats that have peers, is not enough for you to scale?" Sure - this can be done, but then you have the issue that you can't do this on 50m directories ahead of time, so you have to do this creation of one dat-per-folder between the time the user navigates to the page and the time the metadata is returned. Since you might have to rebuild a DAT with a few GB of data (the raw version of the file), it's very slow on first access to an item, and in fact slows it down for all users, not just those who will use DAT to retrieve the files, so in practice it won't happen. (For IPFS we only put the actual files requested into IPFS because this step is so slow for IPFS.)

@martinheidegger: You are definitely going to lose the "size of the entire DAT" for any large changing data set; accumulating size back up the chain isn't practical. Updating the top-level "/" DAT is possible but much harder since a) I can't crawl 50m items and b) even if I could, doing that update at item creation time requires hacking a large code-base with substantial (and reasonable) institutional resistance to change, while anything I can do on top of the existing data doesn't face that barrier and is more open to experimentation.

I truly believe that name lookup on large (number of items) changing data sets is a crucial bottleneck; it's a massive weakness in IPFS, and a big advantage in GUN and Wort. Both GUN and Wort allow for "hijacking" during that name process, i.e. mapping a name from their ecosystem to an HTTP request. I think it would be great for DAT if it could think this through. For the Archive we can use GUN/Wort/HTTP to discover the DAT archive id, but that wouldn't be as useful for testing DAT on its own scaling.
@mitra42 If that's your read then I'd agree that you should try the wire-protocol extension.
That is not what we are suggesting here. In pfrazee's approach you'd only serve a DAT once you have it, but link it ahead of time. My approach would do a similar thing at the hypercore-protocol level (if the protocol asks for a key, a hook would be called to create this item - or return 0 if it's not replicated/existing).
This sounds to me like a more important issue than you might think. The size lets the client predict how much needs to be downloaded and gives an indication of how much would need to be shared (also, you gain some optimization options by preserving this). @mitra42 Asking a different question here: why one data set, and not split it up and serve a lot of smaller DATs? A DAT - particularly in the next version - can easily hold a few gigabytes of small files and perform well. Without breaking any infrastructure: if a central indexing DAT contains a list of DATs for sections of the web, then on clicking a link (in Beaker Browser) the service could index the content of that section.
I haven't seen the DEP for this yet. What was the consensus on the "lose the size of the entire DAT" issue (which persists with mounts, from a user's perspective)? Thinking about a protocol-level approach here: in the protocol there is no distinction between files and other data. The mirror-folder system indexes the file system and adds files to the DAT one by one. There is currently no metadata that tells the client if a mirror-folder process is running on a folder (just as there is no information about file streaming). If this information were given, a reader could let the writer know "please prioritize this folder", and the writer could simply tell the mirror-folder process to prioritize that folder first. This seems like a hack of a system though - very fragile. Note: I have other tasks to look after from now until tomorrow. Will answer/give ideas with delay.
@martinheidegger - there are two possible approaches here - one DAT for the whole of the IA, or one DAT per item with a root-level DAT as an index; both of them appear to have (solvable or unsolvable) scaling problems.

For the one-DAT-per-item approach maybe I'm missing something - where do I find the link for a specific DAT, given that I can't run a process on 50m items to create the links for 50m (not yet created) DATs ahead of time in order to put these into the index? The scaling problem has just been pushed to a different (maybe better) point. If we could calculate the pointers for the DATs without processing the content (e.g. by a hash function on the name) then it would be easier, and wouldn't create a slowdown at the point we are requesting Archive (not DAT) metadata, though it's still non-trivial to generate a list of all item ids, and even more non-trivial to map those DAT ids back to not-yet-created DAT archives.

If it's one DAT per item, then certainly we can get sizes on each DAT, but obviously not on the index, which will change size approximately once per second as items are created... how in DAT are you now handling data sets which get appended to and so don't know the size ahead of time?
I second the idea of using a protocol extension, but I don't think you need to use mounts. Think of it like this:
This way you don't need to index ahead of time, and content is lazy-loaded into the archive. And once the file is loaded, it'll be spread across the P2P network as needed. Some concerns:
I believe this method would be similar to what the Internet Archive is doing for other protocols and requires a minimal amount of change to the existing hyperdrive code.
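A rough sketch of the owning peer's side, under the same assumptions as the earlier reader-side sketch (a hypothetical 'hydrate' extension, and archive files sitting on a local file system whose layout matches the drive paths):

```js
const fs = require('fs')
const path = require('path')

const STORAGE_ROOT = '/ia/storage' // hypothetical path to the backing storage

// Super-peer side: on a hydrate request, copy the file from backing storage
// into the hyperdrive so that normal block replication can serve it.
const ext = drive.metadata.registerExtension('hydrate', {
  encoding: 'json',
  onmessage (message, peer) {
    drive.access(message.path, err => {
      if (!err) return ext.send({ path: message.path, status: 'ready' }, peer)
      fs.createReadStream(path.join(STORAGE_ROOT, message.path))
        .pipe(drive.createWriteStream(message.path))
        .on('finish', () => ext.send({ path: message.path, status: 'ready' }, peer))
    })
  }
})
```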
I think this is the right kind of approach. A couple of detail points/questions:

Changes: Wouldn't this work the same as DAT does now? I.e. if a change happens after a file is added, then presumably the same mechanism (and I don't know how DAT does this) to handle an update would be used.

Connectivity: Could some variant of the mechanism (again, I don't know how DAT does this) that is used to find which peers have the metadata/file now be used to forward the query (with the file name extension), with the peer connected to the file system eventually getting the request and loading it?

Http: Instead of connecting to a file system, the peer should map the DAT to a URL request, e.g. DAT/1234// maps to http://dweb.me/arc/archive.org/download//. (For a filesystem this would just be a URL file:/something/something.)

Even better: Note that this could (in some cases) also work at multiple peers not physically at the Archive, where any peer could fetch a missing file via HTTP. This is how WebTorrent does it now - there is an HTTP field in the magnet link that allows any peer to get missing files direct from the Archive and start seeding them itself. (This is wonderfully fast in WebTorrent, which is why it is currently our preferred decentralized video player.)
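The HTTP variant of that idea could look roughly like this: map the requested drive path to a download URL and stream the response into the drive, webseed-style. The URL mapping below is an assumption for illustration, not the Archive's actual scheme:

```js
const https = require('https')

// Hypothetical mapping from a drive path to an HTTP download URL.
function toArchiveUrl (drivePath) {
  return 'https://archive.org/download' + drivePath
}

// Fetch a missing file over HTTP and write it into the drive.
function hydrateFromHttp (drive, drivePath, cb) {
  https.get(toArchiveUrl(drivePath), res => {
    if (res.statusCode !== 200) return cb(new Error('HTTP ' + res.statusCode))
    res.pipe(drive.createWriteStream(drivePath))
      .on('finish', () => cb(null))
      .on('error', cb)
  }).on('error', cb)
}
```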
Note that for the anti-censorship case, you wouldn't want to rely on the HTTP fallback direct from each peer, as ideally, if you are behind a censor-wall, you want to be connected to some peer that is able to connect via HTTP.
Changes: It depends on how the raw data is stored. If possible you could hook into FS events, check if the file is in the metadata feed, and add the change.

Connectivity: Kind of. Basically, you discover peers for the archive and open up a

HTTP: Not sure what you mean - is the data not going to be stored directly in the archive?

Even better: Yeah, I get what you mean. However, peers are unable to add data into a Dat archive; only the Internet Archive could add files to the archive due to the way security works in Dat. Now that I think of it, I don't think it'd be very easy, because Dat doesn't use content addressing the way WebTorrent and IPFS do.
Changes: Might have to do that later, probably from the other direction, having something periodically crawl the archive comparing SHAs against those in the current metadata.
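A minimal sketch of that crawl-and-compare direction, assuming the stored metadata carries a base64 content hash as in the earlier entry examples (the hashing scheme and paths are illustrative):

```js
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Compare the backing file against the hash recorded in the drive's metadata
// and re-import the file when it has changed.
function checkForChange (drive, srcRoot, drivePath, storedHash, cb) {
  const srcFile = path.join(srcRoot, drivePath)
  const hash = crypto.createHash('sha256')
  fs.createReadStream(srcFile)
    .on('data', chunk => hash.update(chunk))
    .on('error', cb)
    .on('end', () => {
      if (hash.digest('base64') === storedHash) return cb(null, false) // unchanged
      fs.createReadStream(srcFile)
        .pipe(drive.createWriteStream(drivePath))
        .on('finish', () => cb(null, true)) // drive now has a new version
    })
}
```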
In the hypercore protocol you need to specify ranges of interest. If the range doesn't have a clearly defined end, the client will always receive the newest available data (append-only protocol).
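For reference, this is what an open-ended range of interest looks like against a hypercore feed: a live read stream keeps delivering new blocks as they are appended (standard hypercore API; the storage path is illustrative):

```js
const hypercore = require('hypercore')

const feed = hypercore('./store/some-feed', { valueEncoding: 'json', sparse: true })

// No fixed end: the reader keeps receiving the newest blocks as the writer
// appends them (append-only log semantics).
feed.createReadStream({ live: true })
  .on('data', entry => console.log('new entry', entry))
  .on('error', err => console.error(err))
```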
Well, one thing that has been part of this discussion is "mounts":
(source: datproject/planning) This means that for IA you would have one DAT that contains the links to all other DATs, and you would add one for every folder you have. As soon as someone becomes a peer to a particular DAT (and asks for additional data) you could start/resume the mirroring of that particular DAT.
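"Start/resume the mirroring as soon as someone becomes a peer" could be sketched like this, using mirror-folder for the import and hypercore's peer-add event as the trigger; `folderDrive` and `srcDir` are stand-ins for one per-folder dat and its matching directory in IA storage:

```js
const mirror = require('mirror-folder')

// Begin mirroring a folder dat the moment it gains its first peer.
function mirrorOnDemand (folderDrive, srcDir) {
  let started = false
  folderDrive.metadata.on('peer-add', () => {
    if (started) return
    started = true
    mirror(srcDir, { name: '/', fs: folderDrive }, err => {
      if (err) console.error('mirror failed', err)
    })
  })
}
```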
Let's be precise... when you say "contains the links to all other DATs and you would add one for every folder you have", do you mean all the folders you have, or all the ones you make? This is important, because as said above you simply can't create one file with links to all the possible (50 million and changing all the time) folders/DATs.
@mitra42 It is stackable. You could have a root set of 512 keys, each containing 512 more, each with 512 in them, which would mean 3 more lookups to get to the actual folder.
@martinheidegger the size isn't the point, it's the impracticality of walking a tree this size. It's extending the same requirement that applies to files to the items (directories): there is a need to request them, so that the superpeer can only add them to DAT on first request.
@mitra42 Let's assume we have 256 characters x 2:
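A hedged sketch of the two-character prefix idea: route item identifiers through nested two-character shards so that each index dat only lists a few thousand entries instead of 50 million. The item id and shard depth below are made up for illustration:

```js
// Illustrative only: 'commute65' -> /co/mm/ut/commute65, i.e. three small
// index dats (or mounts), each keyed by a two-character prefix, then the item.
function shardPath (itemId, levels = 3) {
  const parts = []
  for (let i = 0; i < levels; i++) {
    parts.push(itemId.slice(i * 2, i * 2 + 2) || '__') // pad very short ids
  }
  return '/' + parts.join('/') + '/' + itemId
}

console.log(shardPath('commute65')) // -> /co/mm/ut/commute65
```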
Let me run through this... First, item ids are not case sensitive, and include a-z0-9 and some punctuation, so let's assume approx 64 chars, not 256, so 64^2 ≈ 4000 pairs per level.
This means for each item requested, it's going to create ~4000 + 3 peers. Which means this isn't just a hack, it's got a really nasty overhead, when what we really want to do is indicate from the client to IA precisely which item it's looking for and ONLY create a dat for that item.
Would a better hack be something like (with the protocol extension proposed somewhere near the top of this thread)? This would be far less hacky, and also not have the massive overhead that I think (can't be sure) is present in the proposal you made.
The problem/good-thing about having an IA DAT link is that you can replicate the entire IA. In order for this not to kill your hardware, you need to limit/prioritize what part of the IA archive is added, and in what order. On the other hand: the whole file-tree of IA itself is also probably too big to be kept on your average hard drive, which means that each client connecting to the IA DAT would only connect to/request data from a DAT that it wants to access. This can be implemented with 1 connection at a time, or a limit of (say) 5 connections - it depends on the client software (Beaker?) to implement this.
Sorry, I don't see how that comment relates to either the problem or the potential solution. The whole point is to limit what is added BASED ON WHAT THE CLIENTS REQUEST, and then only replicate to the client the parts it wants. Then on connections - it's not the per-client connections, it's that your proposal would - if I understood it - mean creating all these 4000 peers even when there is only 1 connection.
@mitra42 1 client wants to get the DAT for
This doesn't mean that a DAT client wouldn't be able to simply open 4000 connections to the peer for each DAT. It also doesn't mean that the same server would need to provide all those DATs; it could easily be split up into several servers to index the data.
The problem with this model is that there can (and should be, in the future) proxies for DATs. Those proxies will serve a DAT as a peer to other peers. They will never be connected to the IA server! As such the proxies would need to aggregate and relay the messages of the clients to the IA server, effectively making it a lot harder to distribute the data.
Right - but if I understood the conversation above, those messages can also be relayed. Note that again, if I understood it correctly, the messages would only get relayed to IA (as they are in GUN) when that data is not already in the DAT, which is not really any different in terms of working through proxies than your solution; i.e. data is served from a proxy if that proxy has previously seen the data, otherwise the request ends up back at the IA server.
@mitra42 Implementing a message relay is possible - in a form where all connected peers relay all the messages in the whole network - but it is an unprecedented step.
For that to work the proxies would need to become a lot more "intelligent" than they are now. Proxies do not currently need to have a concept of a "file system", which means they can be used for any kind of data in the future. For a message-relay protocol to be added atop of that, I am worried about an unpredictable amount of data being sent through the system. How would the messages need to be constructed to prevent system overload? Limit per peer / time? Limit on size? Should messages have an age limit? Message consume-ability? Will all messages just be sent? Should peers have a way to tell others that they received a message? A namespace for message types?
@Karissa can you jump in here? I thought I understood that the extended messages would essentially pass through the exact same routing as a request for a block, but they'd be asking for a file, so the load would be the same - am I missing something?
@martinheidegger those are valid concerns, but perhaps they would come up in the development process, and aren't necessarily blockers for this work. @mitra42 I will discuss the plan you and @pfrazee proposed with @mafintosh this week, and get back to you about viability.
A few more questions/complexities: How do we prevent spamming the system? How is order ensured (very hard in a distributed system)? Are the senders of messages verified? (This would make proxying harder.) It is my worry that messaging opens the gate towards tracking and user profiling, something that I would very much like to avoid. See #49. Imo, if the IA case could be done without messaging, that would be awesome. Maybe @aral could add his POV?
@martinheidegger I've mostly skimmed this (rather long thread) so apologies if I am not 100% up to date, but some thoughts/concerns: If by messaging you mean an ephemeral messaging channel, this is something that I am exploring also for Hypha based on @pfrazee's DEP-0000 for peer authorisation. You can see it in action (without encryption, right now, going to look into adding it today/tomorrow) at https://source.ind.ie/hypha/spikes/multiwriter-2. An ephemeral messaging channel is essential for usable peer/device authentication in multiwriter, so if one is being implemented at core, I'd like to see this use case covered.

My concern regarding optimising Dat for IA is this: it's like optimising Mastodon for Stephen Fry. We all love Stephen Fry but he has very different needs (tens of millions of followers, etc.) than do regular folks on Mastodon. Can Mastodon be both a great solution for someone talking one-way to millions of people and for people who want to chat among themselves in small groups?

More precisely, and I've seen this multiple times now with various technologies, starting with (I'm ashamed to say) Flash: there is a chasm between the requirements of people/individuals and enterprises. A single tech is rarely great at making both of those groups happy. And, usually, given that the money comes from enterprises, tech evolves to meet their needs at the cost of not exploring optimisations that better meet the needs of individuals.

What's special for me about the Dat ecosystem is that it is focussed on the needs of individuals. I hope we continue to keep the focus there and aren't sidetracked by optimising for enterprise, especially at the cost of optimising for individuals. (Not that I'm saying that's what's happening here, but that is what it made me think of and I'd be remiss to not share my concerns on a project that has given me so much hope and that I care so deeply about.)

These are just general concerns. Now that corporate funding is also starting to enter the picture, I feel it might be timely to consider them. Please use/expand upon them or keep them at the backs of your heads if you feel they're useful, or don't afford them another thought if you feel they are irrelevant :)

PS. The one thing I would not like to see is the emergence of any privileged, centralised nodes. If anything, any centralised nodes that are absolutely unavoidable should be less privileged. (In Hypha, for example, there is an always-on node but it doesn't hold any secret keys.)
Totally valid concerns @aral - I see this as an exploration: can DAT be used for bigger systems, and is it appropriate? I watched how the early days of Gopher were driven by being able to integrate data from legacy FTP, and then how the web's early growth was driven by being able to integrate data from Gopher and FTP, and I think an ability to integrate with larger, sparse file systems is going to be crucial to the various Dweb protocols (DAT, IPFS, etc.) being able to grow and become the norm that eventually obsoletes the legacy centralized systems.

I think the proposal we are exploring is to figure out the general case of how to bring a file into DAT that isn't already there, preferably without having to go to some other protocol to request it. That is generically useful in all kinds of situations (including just mounting existing file systems onto DAT without duplicating all the data). The Archive, in this case, is a good place to experiment, especially since we can explore, and if it doesn't work we can continue to use IPFS, GUN, WebTorrent or even HTTP.

The archive is a classic large system in that it has a massively long tail - I used to have the number handy, but it was something like 10 items that had been downloaded over a million times, and 10 million items downloaded only once (don't quote me on that, I could be an order of magnitude out, but I think you get the idea). Such a long tail creates a potential to replicate - on demand - just the parts of IA that are requested. This then becomes useful, for example, in the rural areas of developing countries, or behind censor-walls, where local decentralized replication of material pulled from the legacy web is a useful application. Sure we can do it with HTTP, but it's also a great potential use case for DAT (or IPFS, GUN & WebTorrent).
@aral I am very hesitant to support (even ephemeral) messages - as they might affect the quality of the DAT network negatively. See my questions two comments above for reference. It is my worry that the mere possibility of a two-way channel has terrible effects on the trustworthiness of a DAT source. It is to me a fundamental property of DAT that everything is rather fixed. I am looking at this from the perspective of someone who wants to transfer very private data. If that person connects to the DAT network, that person wants to be sure that all the data transferred contains as little private information as necessary. With an extendable messaging protocol attached, the clients (and servers) could exchange - over time - more and more information with each other, enabling easier tracking of the person (eventually to the point that cookie policies might be in order?). You are having a usability issue here. You want to make the authentication process smoother, which is an admirable goal, but I wonder if QR codes and cameras (as used by Google Authenticator, Signal, WhatsApp, ...) might not be a better solution for the authentication?
Even though you might be hesitant to look at my proposal: I ask you to take another look. I believe it should be easily possible to implement your use case with the current state of DAT.
@mitra42 Thanks, Mitra. I can definitely see the value in on-demand replication (and perhaps rather naïvely thought sparse replication was able to handle this). It's also a feature I need for Hypha, as huge upfront replication (HUR?) is a usability nightmare that can kill an alternative system dead at hello. (Can you imagine where the web would be if we had to download a whole site before we could get started… and yet some p2p implementations feel that this is an acceptable onboarding experience.) Anyway, I don't want to digress or fork this thread. Will continue to monitor it silently + best of luck with your efforts :)
@martinheidegger I understand your concerns; they're definitely valid. Re: the use of ephemeral messages in Hypha: my plan is currently to use them only between nodes that the same person owns, and only using symmetric encryption. I do not have any use cases beyond authentication at the moment. And I'm happy with it as currently implemented (as a protocol extension). If we want truly ephemeral messaging between people as a feature later (this is a valid social use case), we can use asymmetrically-encrypted ephemeral messages without any changes to the current protocol extension.

My goal is to match (if not exceed) the usability of centralised systems, otherwise (for Hypha) we're dead in the water from the word go. QR codes are a good possible secondary means of distributing the uncensorable read key, especially in physical environments. But the addition of any extra complexity at onboarding is a showstopper for what I want to try and achieve (which I realise is not the same as everyone else's use cases/success criteria).
Hey all,
I've been in conversations with the Internet Archive to integrate Dat into their dweb portal. One of their requirements is to serve data from Dat over the peer-to-peer network without having to scan all of the files ahead of time (upwards of 50 petabytes of archives...). Opening this on behalf of @mitra42, who is project lead on the Archive's dweb portal.
The way they have approached this with other protocols such as IPFS, Gun, and Webtorrent is to have a peer that pretends that it contains everything, and then once an archive is requested, to go fetch it from Internet Archive storage and serve it on the network.
For example, they want to be able to handle requests for the string
/foo/bar/baz.mp3
over the network. Quoting @mitra42:

Wondering if this is a use case that this WG is interested in supporting directly in the protocol (e.g., file-aware transfers vs. only block transfers)?
Thanks,