create the bootstrap spec #6

neonphog · 2024-11-15T22:54:20Z

Create a spec for the new bootstrap server API. All feedback welcome. Pay attention to the transition from //! to /// comments ; )

crates/boot_srv/src/lib.rs

mattyg · 2024-11-16T04:31:10Z

crates/boot_srv/src/lib.rs

+///
+/// ##### 2.4. Health check.
+///
+/// A `GET` on `/`.


Maybe we should use a prefix /health for the api and leave / available for an fancy health monitor html page?

Done: 5e4fc87

mattyg · 2024-11-16T04:54:13Z

crates/boot_srv/src/lib.rs

+/// - The server MUST reject the request if the body is > 1024 bytes.
+/// - The server MUST reject the request if `createdAt` is not within
+///   3 minutes in either direction of the server time.
+/// - The server MUST reject the request if `expiresAt` is in the past.


Must reject if expiresAt <= createdAt

Done: a2338e6

mattyg · 2024-11-16T05:00:26Z

crates/boot_srv/src/lib.rs

+/// - The server MUST reject the request if `signature` is invalid vs
+///   the `agentInfo`.
+/// - If `isTombstone` is `true`, the server MUST delete any existing
+///   info being held.


Must delete all AgentInfo records that meet all these criteria:

The 'space' field matches Base64Space from the request URL

The 'agent' field matches Base64Agent from the request URL

The 'createdAt' timestamp is older than or equal to the timestamp in the request body"

This doesn't seem right to me. A tombstone to me is a marker that something has been deleted. So I'd expect people querying the bootstrap server to see this agent as having removed themselves. If we just want ourselves gone, that should be DELETE verb?

See #6 (comment)

I know it's not working as a tombstone on the bootstrap server... but it is working as a tombstone within the kitsune peer gossip, so let's just reuse that already written functionality.

mattyg · 2024-11-16T05:04:03Z

crates/boot_srv/src/lib.rs

+///   the `agentInfo`.
+/// - If `isTombstone` is `true`, the server MUST delete any existing
+///   info being held.
+/// - If `isTombstone` is `false`, the server MAY begin storing the info.


I'd find it clearer to split create and delete into different endpoints. Then we could also drop the expiresAt field for delete requests.

The tombstone field is require for the functionality on the kitsune side. Since in the active network they cannot just delete an agent info--they might get the old one gossiped back to them again, so they need tombstone records. And having them be gossiped just the same as other infos saves complexity. Since we need that anyways, I don't want to implement a different method for the bootstrap server.

Oh, I'm asking for something similar above. I think the bootstrap API is different to what Kitsune does internally, I'd be fine with seeing a piece of code that sends a delete to the bootstrap server rather than a PUT with a delete marker

Actually, I'd be happier :)

Consider the verifiability aspect. We cannot let just anybody delete any record, so we'd have to come up with some alternate payload for the clients to sign anyways... Can we just use the existing functionality?

mattyg · 2024-11-16T05:09:20Z

crates/boot_srv/src/lib.rs

+///
+/// A `GET` on `/boot/Base64Space`.
+///
+/// - The server MUST respond with a complete list of stored infos.


Does that include expired info that hasn't been deleted?

That's up to the server implementation. In my imaginings there will be a periodic task cleaning up expired entries, and it would also adjust the space whenever a PUT happened. So there is a chance some expired entries will be gossiped, but only just, and kitsune will be tolerant to that.

mattyg · 2024-11-16T05:16:20Z

crates/boot_srv/src/lib.rs

+/// - The server MUST reject the request if the body is > 1024 bytes.
+/// - The server MUST reject the request if `createdAt` is not within
+///   3 minutes in either direction of the server time.
+/// - The server MUST reject the request if `expiresAt` is in the past.


Whats the rationale is for the client requesting an expiration time? Do we want to put some configurable constraints on the expiration timespan? Too short could be a form of spam. Too long could cause some agents to outlive others in the store.

I only added the reject if more than 30 minutes: 84799bc

The clients can already spam the server by sending a new createdAt every millisecond or whatever, so that side will have to be covered by the rate limiting.

crates/boot_srv/src/lib.rs

mattyg · 2024-11-16T05:26:53Z

crates/boot_srv/src/lib.rs

+/// and when any entries are removed, the indexes of any items after them
+/// will be decrimented.
+///
+/// - The server SHOULD provide a configurable per-space max info count.


Should it also provide a max expiration timespan?

To avoid the complexity of the server needing to tell the clients what the "configured" expiration time is so they know how often to refresh / what they can set it to... Let's start with just a hard-coded maximum of 30 minutes, and see if there is need for anything else.

crates/boot_srv/src/lib.rs

mattyg · 2024-11-16T05:57:27Z

crates/boot_srv/src/lib.rs

+///
+/// A `GET` on `/boot/Base64Space`.
+///
+/// - The server MUST respond with a complete list of stored infos.


We may want to include GET parameters to support pagination such as "count" and "offset"

Avoiding complexity like this, as well as needing to manage our server memory usage is the reason for limiting the infos per space. A max of 32k (plus a little overhead for commas and square brackets) will be easy to send in a single response.

But the infos per space is configurable, what if another server has a very high limit and sends much more in a response?

I assert that allowing so many bootstrap agents within a space that they cannot be sent in a single response is YAGNI. We could put documentation around the config explaining that the max response size will be MAX_INFOS * MAX_INFO_BYTES and to consider that before setting. Or error starting the server if that response size is beyond some threshold.

mattyg · 2024-11-16T06:04:48Z

crates/boot_srv/src/lib.rs

+//! the canonical encoding of that type is JSON, we just redefine a subset
+//! of the schema here, and only validate the parts required for bootstrapping.
+//!
+//! For additional details, please see the [spec].


Might also include a section about recommended client implementation. I.e. expiration times, retry, scheduling of updating infos

Done: d49c23e

jost-s

Some questions

crates/boot_srv/src/lib.rs

jost-s · 2024-11-18T22:11:55Z

crates/boot_srv/src/lib.rs

+/// - If the store count is < MAX_INFOS, the server MUST push the info
+///   onto the stack.
+/// - If the store count is >= MAX_INFOS, the server MUST delete the entry
+///   at MAX_INFOS / 2 and then push the info onto the stack.


MAX_INFOS / 2 is the first of the recently put agent infos?

No, it's the MAX_INFOS / 2th most recently pushed info.

Why that and not the oldest one for example?

That's described in the first paragraphs of section 3.2, basically the first half of the stack is allocated to known reliable nodes (because they've been online without expiring for the longest). And the second half of the stack is allocated to an ever changing list of the most recent nodes to publish their agent infos (to make it less possible for the older peers to collude on an eclipse attack).

The "delete the middle one" algorithm was my attempt at the simplest design of this.

crates/boot_srv/src/lib.rs

artbrock · 2024-11-18T22:36:19Z

crates/boot_srv/src/lib.rs

+/// Even when that API is implemented, however, the default strategy defined
+/// next will be used on all spaces that have not been registered with a
+/// different strategy. Therefore, we can proceed for now without consideration
+/// for any other future strategies.


Should the list of returned agent infos be segmented into two lists by trusted/authoritative and self-published?

And for now the trusted list could be empty, and the self-published list works as being described in this doc?

I think we're trying to avoid placeholders in the codebase as they add noise and friction for development.

My intention is that spaces will either use the "default" storage strategy defined here, or the "trusted" strategy to be defined in the future. Once we have the ability, if you register a space as trusted, it will then only accept trusted infos and only return those trusted infos.

artbrock · 2024-11-18T22:39:47Z

crates/boot_srv/src/lib.rs

+///   is beyond a limit.
+/// - A server MAY make this limit configurable.
+#[cfg(doc)]
+pub mod spec {}


Can we also have a /metrics path which behaves similar to the current metrics system (showing number of space and active nodes in each space (or maybe number of trusted and self-published nodes in each space)) for a block of time, except this time have the results be REVERSE chronological showing the most recent time blocks first?

Usually metrics are a server-specific implementation detail. I'm happy to include a metrics api with the server.

@artbrock - do you think a metrics api needs to be included in the spec? Indicating that any k2 bootstrap server implementation anybody made would respond with the exact same metrics?

Co-authored-by: Jost Schulte <[email protected]>

ThetaSinner · 2024-11-19T17:11:29Z

crates/boot_srv/Cargo.toml

@@ -0,0 +1,13 @@
+[package]
+name = "kitsune2_boot_srv"


boot sounds like it's for startup, I'd prefer to spell this one out :)

Shall we just go with bs? /jk

Done: edc2546

ThetaSinner · 2024-11-19T17:15:25Z

crates/boot_srv/src/lib.rs

+/// ListResponse = [ AgentInfoSigned, .. ]
+/// ```
+///
+/// - `PUT /boot/Base64Space/Base64Agent`


Can we make it clearer that these are path parameters? {Base64Space} or <Base64Space>?

Good call, done: 4b12a8b

ThetaSinner · 2024-11-19T17:20:24Z

crates/boot_srv/src/lib.rs

+/// AgentInfo = {
+///   "agent": Base64Agent,
+///   "space": Base64Space,
+///   "createdAt": I64,


Can we be more specific with this type? local time vs UTC time is so easy to get tripped up on

Done: dd48b48

ThetaSinner · 2024-11-19T17:22:35Z

crates/boot_srv/src/lib.rs

+/// - The server MUST reject the request if `signature` is invalid vs
+///   the `agentInfo`.
+/// - If `isTombstone` is `true`, the server MUST delete any existing
+///   info being held.


This doesn't seem right to me. A tombstone to me is a marker that something has been deleted. So I'd expect people querying the bootstrap server to see this agent as having removed themselves. If we just want ourselves gone, that should be DELETE verb?

ThetaSinner · 2024-11-19T17:23:54Z

crates/boot_srv/src/lib.rs

+///   the `agentInfo`.
+/// - If `isTombstone` is `true`, the server MUST delete any existing
+///   info being held.
+/// - If `isTombstone` is `false`, the server MAY begin storing the info.


Oh, I'm asking for something similar above. I think the bootstrap API is different to what Kitsune does internally, I'd be fine with seeing a piece of code that sends a delete to the bootstrap server rather than a PUT with a delete marker

Actually, I'd be happier :)

ThetaSinner · 2024-11-19T17:26:02Z

crates/boot_srv/src/lib.rs

+///
+/// A `GET` on `/boot/Base64Space`.
+///
+/// - The server MUST respond with a complete list of stored infos.


I like this, it will simplify things like hcterm and general debugging where you have to keep calling "give me some more please" and hope to build a full set for display :)

Was there some kind of security reason not to do this in the first place? Or was it just that Kitsune wasn't supposed to call it like this?

The legacy bootstrap server holds all agents within the space. We didn't ever want to send down the potentially millions of entries we thought holochain might scale to, so we had the server randomize the list and send you only a handful.

create the bootstrap spec

fcdc216

neonphog commented Nov 15, 2024

View reviewed changes