Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create the bootstrap spec #6

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

create the bootstrap spec #6

wants to merge 15 commits into from

Conversation

neonphog
Copy link
Collaborator

Create a spec for the new bootstrap server API. All feedback welcome. Pay attention to the transition from //! to /// comments ; )

crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
///
/// ##### 2.4. Health check.
///
/// A `GET` on `/`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use a prefix /health for the api and leave / available for an fancy health monitor html page?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 5e4fc87

/// - The server MUST reject the request if the body is > 1024 bytes.
/// - The server MUST reject the request if `createdAt` is not within
/// 3 minutes in either direction of the server time.
/// - The server MUST reject the request if `expiresAt` is in the past.
Copy link
Member

@mattyg mattyg Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must reject if expiresAt <= createdAt

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: a2338e6

/// - The server MUST reject the request if `signature` is invalid vs
/// the `agentInfo`.
/// - If `isTombstone` is `true`, the server MUST delete any existing
/// info being held.
Copy link
Member

@mattyg mattyg Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must delete all AgentInfo records that meet all these criteria:

  • The 'space' field matches Base64Space from the request URL
  • The 'agent' field matches Base64Agent from the request URL
  • The 'createdAt' timestamp is older than or equal to the timestamp in the request body"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. A tombstone to me is a marker that something has been deleted. So I'd expect people querying the bootstrap server to see this agent as having removed themselves. If we just want ourselves gone, that should be DELETE verb?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not working as a tombstone on the bootstrap server... but it is working as a tombstone within the kitsune peer gossip, so let's just reuse that already written functionality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// the `agentInfo`.
/// - If `isTombstone` is `true`, the server MUST delete any existing
/// info being held.
/// - If `isTombstone` is `false`, the server MAY begin storing the info.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd find it clearer to split create and delete into different endpoints. Then we could also drop the expiresAt field for delete requests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tombstone field is require for the functionality on the kitsune side. Since in the active network they cannot just delete an agent info--they might get the old one gossiped back to them again, so they need tombstone records. And having them be gossiped just the same as other infos saves complexity. Since we need that anyways, I don't want to implement a different method for the bootstrap server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm asking for something similar above. I think the bootstrap API is different to what Kitsune does internally, I'd be fine with seeing a piece of code that sends a delete to the bootstrap server rather than a PUT with a delete marker

Actually, I'd be happier :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the verifiability aspect. We cannot let just anybody delete any record, so we'd have to come up with some alternate payload for the clients to sign anyways... Can we just use the existing functionality?

///
/// A `GET` on `/boot/Base64Space`.
///
/// - The server MUST respond with a complete list of stored infos.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that include expired info that hasn't been deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's up to the server implementation. In my imaginings there will be a periodic task cleaning up expired entries, and it would also adjust the space whenever a PUT happened. So there is a chance some expired entries will be gossiped, but only just, and kitsune will be tolerant to that.

/// - The server MUST reject the request if the body is > 1024 bytes.
/// - The server MUST reject the request if `createdAt` is not within
/// 3 minutes in either direction of the server time.
/// - The server MUST reject the request if `expiresAt` is in the past.
Copy link
Member

@mattyg mattyg Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the rationale is for the client requesting an expiration time? Do we want to put some configurable constraints on the expiration timespan? Too short could be a form of spam. Too long could cause some agents to outlive others in the store.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added the reject if more than 30 minutes: 84799bc

The clients can already spam the server by sending a new createdAt every millisecond or whatever, so that side will have to be covered by the rate limiting.

crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
/// and when any entries are removed, the indexes of any items after them
/// will be decrimented.
///
/// - The server SHOULD provide a configurable per-space max info count.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it also provide a max expiration timespan?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the complexity of the server needing to tell the clients what the "configured" expiration time is so they know how often to refresh / what they can set it to... Let's start with just a hard-coded maximum of 30 minutes, and see if there is need for anything else.

crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
///
/// A `GET` on `/boot/Base64Space`.
///
/// - The server MUST respond with a complete list of stored infos.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to include GET parameters to support pagination such as "count" and "offset"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding complexity like this, as well as needing to manage our server memory usage is the reason for limiting the infos per space. A max of 32k (plus a little overhead for commas and square brackets) will be easy to send in a single response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the infos per space is configurable, what if another server has a very high limit and sends much more in a response?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assert that allowing so many bootstrap agents within a space that they cannot be sent in a single response is YAGNI. We could put documentation around the config explaining that the max response size will be MAX_INFOS * MAX_INFO_BYTES and to consider that before setting. Or error starting the server if that response size is beyond some threshold.

//! the canonical encoding of that type is JSON, we just redefine a subset
//! of the schema here, and only validate the parts required for bootstrapping.
//!
//! For additional details, please see the [spec].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also include a section about recommended client implementation. I.e. expiration times, retry, scheduling of updating infos

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: d49c23e

Copy link
Contributor

@jost-s jost-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions

crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
/// - If the store count is < MAX_INFOS, the server MUST push the info
/// onto the stack.
/// - If the store count is >= MAX_INFOS, the server MUST delete the entry
/// at MAX_INFOS / 2 and then push the info onto the stack.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAX_INFOS / 2 is the first of the recently put agent infos?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's the MAX_INFOS / 2th most recently pushed info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that and not the oldest one for example?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's described in the first paragraphs of section 3.2, basically the first half of the stack is allocated to known reliable nodes (because they've been online without expiring for the longest). And the second half of the stack is allocated to an ever changing list of the most recent nodes to publish their agent infos (to make it less possible for the older peers to collude on an eclipse attack).

The "delete the middle one" algorithm was my attempt at the simplest design of this.

crates/boot_srv/src/lib.rs Outdated Show resolved Hide resolved
/// Even when that API is implemented, however, the default strategy defined
/// next will be used on all spaces that have not been registered with a
/// different strategy. Therefore, we can proceed for now without consideration
/// for any other future strategies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the list of returned agent infos be segmented into two lists by trusted/authoritative and self-published?

And for now the trusted list could be empty, and the self-published list works as being described in this doc?

Copy link
Member

@mattyg mattyg Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're trying to avoid placeholders in the codebase as they add noise and friction for development.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention is that spaces will either use the "default" storage strategy defined here, or the "trusted" strategy to be defined in the future. Once we have the ability, if you register a space as trusted, it will then only accept trusted infos and only return those trusted infos.

/// is beyond a limit.
/// - A server MAY make this limit configurable.
#[cfg(doc)]
pub mod spec {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a /metrics path which behaves similar to the current metrics system (showing number of space and active nodes in each space (or maybe number of trusted and self-published nodes in each space)) for a block of time, except this time have the results be REVERSE chronological showing the most recent time blocks first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually metrics are a server-specific implementation detail. I'm happy to include a metrics api with the server.

@artbrock - do you think a metrics api needs to be included in the spec? Indicating that any k2 bootstrap server implementation anybody made would respond with the exact same metrics?

@@ -0,0 +1,13 @@
[package]
name = "kitsune2_boot_srv"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boot sounds like it's for startup, I'd prefer to spell this one out :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just go with bs? /jk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: edc2546

/// ListResponse = [ AgentInfoSigned, .. ]
/// ```
///
/// - `PUT /boot/Base64Space/Base64Agent`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it clearer that these are path parameters? {Base64Space} or <Base64Space>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done: 4b12a8b

/// AgentInfo = {
/// "agent": Base64Agent,
/// "space": Base64Space,
/// "createdAt": I64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific with this type? local time vs UTC time is so easy to get tripped up on

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: dd48b48

/// - The server MUST reject the request if `signature` is invalid vs
/// the `agentInfo`.
/// - If `isTombstone` is `true`, the server MUST delete any existing
/// info being held.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. A tombstone to me is a marker that something has been deleted. So I'd expect people querying the bootstrap server to see this agent as having removed themselves. If we just want ourselves gone, that should be DELETE verb?

/// the `agentInfo`.
/// - If `isTombstone` is `true`, the server MUST delete any existing
/// info being held.
/// - If `isTombstone` is `false`, the server MAY begin storing the info.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm asking for something similar above. I think the bootstrap API is different to what Kitsune does internally, I'd be fine with seeing a piece of code that sends a delete to the bootstrap server rather than a PUT with a delete marker

Actually, I'd be happier :)

///
/// A `GET` on `/boot/Base64Space`.
///
/// - The server MUST respond with a complete list of stored infos.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, it will simplify things like hcterm and general debugging where you have to keep calling "give me some more please" and hope to build a full set for display :)

Was there some kind of security reason not to do this in the first place? Or was it just that Kitsune wasn't supposed to call it like this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy bootstrap server holds all agents within the space. We didn't ever want to send down the potentially millions of entries we thought holochain might scale to, so we had the server randomize the list and send you only a handful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants