dat protocol WG (#28)

July 31 2019 via IRC (#datprotocol)

Agenda discussion

People

  • pfrazee
  • andrewosh
  • rangermauve
  • mafintosh
  • okdistribute
  • noffle
  • substack

Agenda

  • Review last meeting/action items
  • Talk about extensions and how they relate to the daemon and Beaker.
  • Talk about backwards compatibility with old hyperdrive
  • Share this protocol dev roadmap and make sure everybody's on the same page
  • adding an optional boolean ack field to HAVE messages holepunchto/hypercore#215 (comment)
  • work together to craft a governance model that we all like
  • hypercore performance when many (hundreds) of cores are being loaded and replicated. Also quickly peek at this replication management proposal to integrate with the daemon / SDK.

Summary/Review of Previous Meeting

Meeting Notes

  • Updates, what's everyone been up to:
    • rangermauve: I've been working on integrating the SDK into Dat store to see what was missing from the API and how it felt. Also, I've been trying to figure out how to get corestores to work for stuff like Cabal
    • pfrazee: I have been doing some hopefully final UI iterations on beaker, just trying to really nail the site development experience. basically iterating on that editor sidebar
    • rangermauve: How goes the beaker-daemon integration?
    • pfrazee: I haven't touched it since I got the basics working; it's a todo for me to spend a day on that this week. super amped to get that wrapped, beaker blue is so close
    • andrewosh: working on major improvements to fuse-native this week -- will make the current impl much better. we merged in the v10 branch yesterday, so that's all ready to play with (beaker will be a great test). and as for tests, currently halfway through a full wikipedia import
    • mafintosh: New sodium will be released today, so we can get that sweet electron 5 and node 12 support
  • Protocol Roadmap
    • pfrazee: we threw together a roadmap to help people get a sense of what andrewosh and mafintosh have been working on https://hackmd.io/VamhcsPjTcCvBgn4Sfr_0Q The whole "dat next" section is what's currently targeted to get done
    • rangermauve: So, is this corestore replication stuff written down somewhere?
    • andrewosh: You talking about the replication improvements in the "dat next" section? I think we're still ironing out what that will look like (no dev progress on that since the last WG meeting)
  • Corestore
    • pfrazee: rangermauve has been working on how the corestore API should work and talking to cabal and wireline devs about that, so yall should be sure to check in on that
    • rangermauve: Yeah, here's what I've got so far from talking to people: https://gist.github.com/RangerMauve/de1004077fa90a2ed219b48c8a49939b Basically, it's the current corestore API, but with extension support and removing / deleting cores. At the bottom I also sketched up some ideas I had for a general purpose replication thing for hypercores.
    • andrewosh: looks like great additions. a few of those bullet points won't be necessary once we add the cap system to the protocol too; with that in, you shouldn't have to use a default core to bootstrap the replication (define the encryption params). but let's have a bigger chat about all things corestore
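For readers skimming the gist, a minimal sketch of the corestore shape under discussion: one store hands out multiple hypercores and replicates them over a single stream. The constructor, `default`, `get`, and `replicate` calls below are assumptions loosely following the gist, not a confirmed or final API.

```js
// Illustrative sketch only: method names follow the corestore discussion
// above and may not match the actual module's API.
const Corestore = require('corestore')
const ram = require('random-access-memory')

const store = new Corestore(ram)

// Today a "default" core bootstraps replication (defines the encryption
// params); the cap system mentioned above would remove the need for this.
const main = store.default()

// Additional cores are created/retrieved through the same store and are
// replicated over the same connections.
const extra = store.get({ name: 'extra' })

main.ready(() => {
  // One replication stream covers every core managed by the store.
  const stream = store.replicate()
})
```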
  • Hypercore ACK
    • holepunchto/hypercore#215
    • rangermauve: Basically, substack is doing some cool stuff with making sure that data has been fully replicated to another peer and would like to add a new message type to the hypercore protocol to support it
    • mafintosh: It’s a super important feature we def need
    • substack: do you think adding an ack field to HAVEs would be an acceptable solution? I could also use ordinary HAVEs as in the original PR from emil, but I don't think there is a way to eliminate the false positives which could be a problem for people writing logic for bookkeeping ACKs
    • mafintosh: a flag seems like the easiest solution to me. ie, the messages are semantically HAVE messages (i.e. the remote has the data) - the have.ack = true is a hint that this is an ack message
    • substack: yep that's exactly what I was thinking too. ok I'll work on this today. hypercore-protocol/hypercore-protocol#37
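For context, a minimal sketch of what the proposed flag means on the receiving side. The message shape and the `remoteHas`/`markAcked` helpers are hypothetical stand-ins; the real change lives in hypercore-protocol's schema (see the PR linked above).

```js
// Hypothetical handler illustrating the proposed `ack` flag on HAVE messages.
// `bookkeeping` and its methods are stand-ins for an application's own state.
function onHave (have, bookkeeping) {
  // Semantically this is still a HAVE: the remote holds these blocks.
  bookkeeping.remoteHas(have.start, have.length)

  if (have.ack) {
    // The flag hints that the remote is acknowledging blocks we uploaded,
    // so "fully replicated to that peer" can be tracked without the false
    // positives that plain HAVEs would produce.
    bookkeeping.markAcked(have.start, have.length)
  }
}
```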
  • Governance changes
    • okdistribute: hey all, so this working group is great but over the next few months I'd like to establish a clear process (bylaws) for how to manage memberships. There are a lot of other technologies and protocols that take donations and make decisions, and we can learn from them. I don't really think we need to discuss specifics now, but I'd like to encourage people to look at these other orgs over the next 6 weeks and then come together mid-September in a specific meeting around this topic. I plan to step down as "director" or whatever that means from the dat non-profit entity and not appoint a new one; my timeline looks like by January. So it would be nice to have these structures in place before the end of the year. Just a seed to plant.
    • pfrazee: so this has 2 parts: moving decisions about dat.foundation funds to the WG, and setting clear rules for membership in the WG, correct?
    • okdistribute: Yeah thanks for the clarification pfrazee. I propose we come up with a process about both of these things. but really it's to plant a seed so people can do some research on this on their own and come back with ideas
    • pfrazee: so one thing I want to point out, WG meetings are open to non-members. Membership is about anything that has to be put to a vote, which includes things like DEP publishing and (soon) funds usage
    • rangermauve: What sort of scale of funds can we expect the foundation to be dealing with?
    • okdistribute: Budget in the past few years has ranged from 0 to 150k
    • noffle: is this WG's membership/process documented somewhere?
    • rangermauve: https://github.com/datprotocol/working-group/#wg-membership-process
    • rangermauve: How has that been managed so far?
    • okdistribute: Mathias, Joe, Danielle, and I make those decisions currently. It was a decision made after the org ran out of funding and max left the internet. I think it's time we move from this emergency phase to a more stable and transparent process. Thanks for the questions. We don't have to reinvent the wheel, so I urge everyone to look and learn from other protocol and technology foundations, like postgresql, or linux, or node, etc, to see how donations and company contributions are handled. We can learn what to do and what not to do :) That's all from me, we can schedule a call around this in late September if people are willing.
    • pfrazee: small aside while we're talking about org stuff, if anybody has any questions about Beaker Inc, feel free to PM or hit me up in one of the other channels later
  • Extension messages
    • rangermauve: Talking to people that have been developing on top of hypercore has shown me that they're pretty important for getting stuff done. The hard thing is that we need to register extension message types with whatever is managing the swarm replication. Right now, in the SDK, I've been slowly growing the set of all possible extensions so that every time I open a hyperdrive or hypercore with extensions, I add them to the set, and use the set for establishing new connections. I was thinking that longer term extensions should somehow be persisted with the core so that they can be loaded up when the daemon or whatever loads up.
    • noffle: sorry I'm not sure I understand the problem. wouldn't the application define the extensions? you'd need that app's logic to handle recv'ing those extensions anyways, right?
    • rangermauve: Yeah! If you're embedding all the dat stuff in your application, it's not too big a deal. But for stuff like Beaker or the daemon, applications will likely interact with them at a higher level
    • noffle: higher level how?
    • rangermauve: Higher level in that you don't specify storage or invoke .replicate()
    • rangermauve: Hmm, I guess it's not a big deal if the extensions aren't persisted actually
    • mafintosh: Not sure I follow the trickyness as well :)
    • pfrazee: the extensions have to be set prior to establishing a hypercore-protocol handshake
    • noffle: oh, like, apps that want to use a very highlevel dat-ish api but want to use extensions at, like, the app level..?
    • mafintosh: Only the upstream data structure knows which one to use no?
    • rangermauve: yes!
    • noffle: or reuse off-the-shelf extensions others have written?
    • rangermauve: I want applications to use the extension APIs themselves
    • pfrazee: and once the daemon (or beaker) get involved, the applications need a way to set the extensions, but they're not involved in setting up the handshakes so they're missing a chance to set the extensions
    • mafintosh: we can prob easily make extensions configurable per channel instead of per connection if that helps solve it
    • pfrazee: it might help but there will probably still be cases where applications get involved after a channel is established
    • mafintosh: what would an example of that be? ie it would only get opened once the data structure is loaded
    • pfrazee: an example might be that beaker automatically swarms your archives at load time. so then the user opens an app that wants to use extensions on one of your owned archives, it's already missed its chance.
    • mafintosh: i see, well we can prob make them configurable after channel open also. It’s not v complex
    • rangermauve: Yeah, it might not be too big a deal actually. If you have some background process replicating a hyperdrive or hypercore, you don't need to worry about extensions
    • noffle: is the intention to use hypercore extensions for ANY out of band communication, or for hypercore-specific bits?
    • pfrazee: if you're not looking to reuse a hypercore-protocol connection, the PeerSocket API may be better
    • RangerMauve: I think longer term we'll have some sort of Peer Sockets API, but I'm just trying to enable the current uses of extensions inside Beaker
    • rangermauve: Corestore makes life a little easier in that we can say "Hey, discover peers for this key, and here's the list of feeds that you replicate with these peers". The extensions are there mostly so that two peers can tell each other about which keys they want to replicate over their cores. So extensions are only really relevant when the application is actively sharing new feeds, by default two peers with corestores can just replicate all the feeds in those corestores
    • mafintosh: I use them v differently. I use extensions to “enhance” algorithms backed by hypercores. Extension messages are completely untrusted. For example, in hypertrie you can use an extension to prepopulate the cache. Since you should never trust an extension message, cache population is a good use case: if the other person lied, no big deal; if not, you get a hot cache
    • rangermauve: Yeah, I think wireline have some sort of authentication layer on top of extensions that make them a little more safe to use.
    • mafintosh: define safe :)
    • rangermauve: Like, less likely to be able to "lie"
    • okdistribute: heheh. But cabal also uses them right?
    • noffle: yes for negotiating which feeds to replicate and add new ones at runtime. multifeed implements this
    • rangermauve: I think in multifeed's case it also doesn't care about lying since it's a free-for-all of everyone sharing feeds with each other
    • noffle: lying can't really help anyone; the replication of the hypercore will just fail. but I did notice that if someone sends you a bad remote signature it'll shut down the whole replication channel ;)
    • mafintosh: yep depends on the threat model etc. in hypertries extension you only ever wanna trust the trie source for example
    • rangermauve: Yeah, whereas in stuff like multifeed you trust anyone that's got the initial feed key and there's no actual source. being able to negotiate extensions dynamically would be nice. What would need to change to enable that? Or, I think we could probably get far enough with experimenting with this before needing to have dynamic extensions
    • mafintosh: you can make an extension to setup dynamic extensions
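A minimal sketch of the SDK approach rangermauve describes above: accumulate every extension name any hyperdrive or hypercore has asked for, and advertise the full set on each new replication stream, since the names have to be declared before the handshake. The `replicate({ extensions })` call is an assumption standing in for however the swarm code actually builds its streams.

```js
// Sketch only: extension names must be known before the handshake, so the
// SDK keeps a growing set and hands it to every new replication stream.
const knownExtensions = new Set()

// Called whenever a hyperdrive/hypercore is opened with extensions.
function registerExtensions (names) {
  for (const name of names) knownExtensions.add(name)
}

// Called for each new connection; the `extensions` option here is an
// assumption about how the underlying hypercore-protocol stream is set up.
function createReplicationStream (core) {
  return core.replicate({ live: true, extensions: [...knownExtensions] })
}
```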
  • backwards compatibility
    • rangermauve: I sketched some ideas up and was wondering if I could get some feedback: https://github.com/RangerMauve/hyperdrive-backwards
    • pfrazee: so the trick with backwards compat is not just API surface. there are basically two issues we need to contend with: breaking changes to replication and loading the old hyperdrive stack
    • rangermauve: Yup! I've got some ideas about that. It's pretty gross though. :P
    • pfrazee: haha yeah that's usually what I run into
    • rangermauve: basically, it uses JS proxies to pretend to be a hyperdrive, while delegating all API calls until the actual hyperdrive type is determined.
    • mafintosh: sounds tricky
    • pfrazee: previously when I discussed this with mafintosh and andrewosh, their thinking tended to be: let's get the hyperdrive10 stack stable and then look at what it's going to take. mafintosh, has your thinking changed on that yet?
    • RangerMauve: The first thing that happens is it initializes the metadata hypercore and tries to load the header. And from there it can determine the hyperdrive type and pass the metadata feed to the hyperdrive constructor for the right type
    • mafintosh: Ie what about streams etc, are you gonna return a proxy when the user does drive.createReadStream also?
    • RangerMauve: yes. I'm figuring out how to do that now anyways for getting the DatArchive API to be wrapped with the hyperdrive API
    • okdistribute: what if we first encourage people to upgrade their drives and see how many complain
    • mafintosh: yea I’m a radical like karissa
    • pfrazee: yeah there's a real good chance we might have to go that route
    • okdistribute: maybe give a tool to do it nicely that throws away the old metadata and recalculates. Small script
    • mafintosh: i think our efforts are much better spent elsewhere other than making migration tools
    • pfrazee: it might be relatively simple to do. we can feel it out
    • mafintosh: it’s not
    • rangermauve: I just really don't want to throw the entire ecosystem under the bus. Especially for Beaker users. I'm already dreading having to upgrade my blog
    • pfrazee: yeah I feel the same way
    • mafintosh: until a statistical sample complains this is a moot point. migration script all the way
    • rangermauve: To that end, it might be best to not merge the discovery-swarm and hyperswarm networks
    • pfrazee: I might be able to setup a web service that does the migration. so beaker users could go there, enter the URL, it fetches the old hyperdrive data and then creates a new drive for you using the local DatArchive api
    • okdistribute: This is where the "dat 2.0" or "dat - faster better stronger" or "dat rhino" framing can be useful, which means update on both storage and networking
    • pfrazee: agree, I've been thinking this might be a 2.0 moment
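For context on the Proxy idea rangermauve sketched earlier in this section (hyperdrive-backwards): a drive object is handed out immediately, API calls are buffered, and once the metadata header reveals which hyperdrive version the data uses, the calls are replayed against the right implementation. The sketch below is a rough illustration under those assumptions, not the actual module, and it sidesteps the stream-proxying question mafintosh raised.

```js
// Rough sketch of the Proxy-based delegation idea (not the actual module).
// Calls are queued until the metadata header tells us which hyperdrive
// implementation (old or new) to construct, then replayed against it.
function createLazyDrive (loadRealDrive) {
  let real = null
  const queued = []

  // loadRealDrive(cb) reads the metadata feed header and calls back with a
  // hyperdrive instance of the correct type (hypothetical helper).
  loadRealDrive((err, drive) => {
    if (err) throw err
    real = drive
    for (const { prop, args } of queued) real[prop](...args)
  })

  return new Proxy({}, {
    get (_, prop) {
      return (...args) => {
        if (real) return real[prop](...args)
        queued.push({ prop, args }) // defer until the drive type is known
      }
    }
  })
}
```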
  • Many hypercore performance
    • noffle: the public cabal has ~320 cores right now. they take about ~3 seconds to load on my laptop. wondering about possibilities for batch-loading of cores. also, adding ALL of those cores to a hypercore-protocol takes quite a while too. I don't have good details (yet) or numbers though. just wanted to put this on the radar, since maybe not many dat apps have that many users replicating together aside from cabal? I had them loading in sequence before, which was much slower; now it's parallel
    • mafintosh: try benching how long it takes to load 1500 files in parallel on your laptop
    • noffle: adding a new replication peer is so slow that it freezes the cli app for a few seconds
    • rangermauve: It might be good to see some profiler results to see where most of the time is spent
    • noffle: more investigation is needed, but I wanted to put this out there, since eventually there will DEF be multi-user dats with hundreds/thousands of feeds
    • mafintosh: can you go into more detail on the replication peer? ie it freezes when a new peer is discovered?
    • noffle: not sure, but somewhere in the start-syncing-to-new-peer it gets heavy on the main thread
    • mafintosh: most likely bitfield compression which can be massively optimised
    • noffle: combined with the remote sig disconnect issue I mentioned, it means peers are churning & lots of reconnects + lots of main thread blocking. cabal-cli is almost unusable on my spinning disk laptop.
    • mafintosh: is the 3s bench from a spinning disk?
    • noffle: yes. 3-5s
    • mafintosh: yea def the file loading. try benching that when you have a chance
    • noffle: by 'benching' you mean just console.time'ing loading the feeds?
    • mafintosh: console.time loading 1500 files in parallel using fs.open
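A minimal sketch of the benchmark mafintosh suggests: time how long opening 1500 files in parallel takes with fs.open, to separate raw file-open cost from hypercore's own loading work. The file paths and count below are placeholders, not a real cabal layout.

```js
// Sketch of the suggested benchmark: console.time opening many files in
// parallel with fs.open, to isolate the raw file-system cost.
const fs = require('fs')

// Placeholder paths; swap in ~1500 real files (e.g. a cabal's core files).
const paths = []
for (let i = 0; i < 1500; i++) paths.push(`./cores/${i}/data`)

console.time('open 1500 files')
let pending = paths.length
for (const p of paths) {
  fs.open(p, 'r', (err, fd) => {
    if (!err) fs.close(fd, () => {})
    if (--pending === 0) console.timeEnd('open 1500 files')
  })
}
```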

ACTION ITEMS

  • We should think about figuring out how we want to govern ourselves and what ideas we like and dislike from other groups
  • We'll keep going with extensions and figure out dynamic stuff when we get to it
  • Noffle will check out performance on their laptop and we'll see if there's some low hanging fruit for perf gains for lots of hypercores
  • mafintosh and substack should get together to figure out this ACK stuff in detail
  • next meeting is at 17:00 GMT on the 14th