-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
create the bootstrap spec #6
base: main
Are you sure you want to change the base?
Changes from 5 commits
fcdc216
8711baa
0347f86
154138b
ba37870
dd48b48
5e4fc87
a2338e6
d584bcb
84799bc
512b421
984dc99
d49c23e
4b12a8b
edc2546
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
[workspace] | ||
members = [ | ||
"crates/api", | ||
"crates/boot_srv", | ||
] | ||
resolver = "2" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
[package] | ||
name = "kitsune2_boot_srv" | ||
version = "0.0.1-alpha" | ||
description = "p2p / dht communication framework api" | ||
license = "Apache-2.0" | ||
homepage = "https://github.com/holochain/kitsune2" | ||
documentation = "https://docs.rs/kitsune2_boot_srv" | ||
authors = ["Holochain Core Dev Team <[email protected]>"] | ||
keywords = ["holochain", "holo", "p2p", "dht", "networking"] | ||
categories = ["network-programming"] | ||
edition = "2021" | ||
|
||
[dependencies] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
#![deny(missing_docs)] | ||
//! Kitsune2 boot server is an HTTP REST server for handling bootstrapping | ||
//! discovery of peer network reachability in p2p applications. | ||
//! | ||
//! Despite being in the kitsune2 repo, `boot_srv` and `boot_cli` do not depend | ||
//! on any Kitsune2 crates. This is to ensure the bootstrapping functionality | ||
//! is well-defined, self-contained, easily testable in isolation, and | ||
//! usable for projects that don't choose to make use of Kitsune2 itself. | ||
//! | ||
//! That being said, the boot server and client are designed to transfer the | ||
//! `AgentInfoSigned` data type defined in the `kitsune2_api` crate. Since | ||
//! the canonical encoding of that type is JSON, we just redefine a subset | ||
//! of the schema here, and only validate the parts required for bootstrapping. | ||
//! | ||
//! For additional details, please see the [spec]. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might also include a section about recommended client implementation. I.e. expiration times, retry, scheduling of updating infos There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done: d49c23e |
||
|
||
/// This is a documentation module containing the kitsune2_boot spec. | ||
/// | ||
/// #### 1. Types | ||
/// | ||
/// - `Base64Agent` - base64UrlNoPad safe encoded string agent id. | ||
mattyg marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// - `Base64Space` - base64UrlNoPad safe encoded string space id. | ||
/// - `Base64Sig` - base64UrlNoPad safe encoded string crypto signature. | ||
/// - `Json` - string containing json that can be decoded. | ||
/// - `I64` - string containing an i64 number. | ||
/// | ||
/// ```text | ||
/// AgentInfoSigned = { "agentInfo": Json, "signature": Base64Sig } | ||
/// AgentInfo = { | ||
/// "agent": Base64Agent, | ||
/// "space": Base64Space, | ||
/// "createdAt": I64, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we be more specific with this type? local time vs UTC time is so easy to get tripped up on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done: dd48b48 |
||
/// "expiresAt": I64, | ||
/// "isTombstone": boolean | ||
/// } | ||
/// ``` | ||
/// | ||
/// Any other properties on these objects will be ignored and pass through. | ||
/// | ||
/// #### 2. REST API | ||
/// | ||
/// ##### 2.1. In Brief | ||
/// | ||
/// ```text | ||
/// ErrResponse = { "error": string } | ||
/// OkResponse = {} | ||
/// ListResponse = [ AgentInfoSigned, .. ] | ||
/// ``` | ||
/// | ||
/// - `PUT /boot/Base64Space/Base64Agent` | ||
mattyg marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we make it clearer that these are path parameters? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call, done: 4b12a8b |
||
/// - Request Body: `AgentInfoSigned` | ||
/// - Response Body: `OkResponse | ErrResponse` | ||
/// - `GET /boot/Base64Space` | ||
/// - Response Body: `ListResponse | ErrResponse` | ||
/// - `GET /` | ||
/// - Response Body: `OkResponse | ErrResponse` | ||
/// | ||
/// ##### 2.2. Publishing info to the boot server. | ||
/// | ||
/// A `PUT` on `/boot/Base64Space/Base64Agent` with an `AgentInfoSigned` | ||
/// json object as the request body. | ||
/// | ||
/// - The server MUST reject the request if the body is > 1024 bytes. | ||
jost-s marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// - The server MUST reject the request if `createdAt` is not within | ||
/// 3 minutes in either direction of the server time. | ||
/// - The server MUST reject the request if `expiresAt` is in the past. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Must reject if expiresAt <= createdAt There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done: a2338e6 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Whats the rationale is for the client requesting an expiration time? Do we want to put some configurable constraints on the expiration timespan? Too short could be a form of spam. Too long could cause some agents to outlive others in the store. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I only added the reject if more than 30 minutes: 84799bc The clients can already spam the server by sending a new createdAt every millisecond or whatever, so that side will have to be covered by the rate limiting. |
||
/// - The server MUST reject the request if `signature` is invalid vs | ||
/// the `agentInfo`. | ||
/// - If `isTombstone` is `true`, the server MUST delete any existing | ||
/// info being held. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Must delete all AgentInfo records that meet all these criteria:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't seem right to me. A tombstone to me is a marker that something has been deleted. So I'd expect people querying the bootstrap server to see this agent as having removed themselves. If we just want ourselves gone, that should be DELETE verb? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See #6 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know it's not working as a tombstone on the bootstrap server... but it is working as a tombstone within the kitsune peer gossip, so let's just reuse that already written functionality. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
/// - If `isTombstone` is `false`, the server MAY begin storing the info. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd find it clearer to split create and delete into different endpoints. Then we could also drop the expiresAt field for delete requests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tombstone field is require for the functionality on the kitsune side. Since in the active network they cannot just delete an agent info--they might get the old one gossiped back to them again, so they need tombstone records. And having them be gossiped just the same as other infos saves complexity. Since we need that anyways, I don't want to implement a different method for the bootstrap server. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I'm asking for something similar above. I think the bootstrap API is different to what Kitsune does internally, I'd be fine with seeing a piece of code that sends a delete to the bootstrap server rather than a PUT with a delete marker Actually, I'd be happier :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider the verifiability aspect. We cannot let just anybody delete any record, so we'd have to come up with some alternate payload for the clients to sign anyways... Can we just use the existing functionality? |
||
/// See section 3. on storage strategies below. | ||
/// | ||
/// ##### 2.3. Listing data stored on the boot server. | ||
/// | ||
/// A `GET` on `/boot/Base64Space`. | ||
/// | ||
/// - The server MUST respond with a complete list of stored infos. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does that include expired info that hasn't been deleted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's up to the server implementation. In my imaginings there will be a periodic task cleaning up expired entries, and it would also adjust the space whenever a PUT happened. So there is a chance some expired entries will be gossiped, but only just, and kitsune will be tolerant to that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We may want to include GET parameters to support pagination such as "count" and "offset" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoiding complexity like this, as well as needing to manage our server memory usage is the reason for limiting the infos per space. A max of 32k (plus a little overhead for commas and square brackets) will be easy to send in a single response. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But the infos per space is configurable, what if another server has a very high limit and sends much more in a response? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assert that allowing so many bootstrap agents within a space that they cannot be sent in a single response is YAGNI. We could put documentation around the config explaining that the max response size will be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this, it will simplify things like hcterm and general debugging where you have to keep calling "give me some more please" and hope to build a full set for display :) Was there some kind of security reason not to do this in the first place? Or was it just that Kitsune wasn't supposed to call it like this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The legacy bootstrap server holds all agents within the space. We didn't ever want to send down the potentially millions of entries we thought holochain might scale to, so we had the server randomize the list and send you only a handful. |
||
/// - If there are no infos stored at this space, the server MUST return | ||
/// an empty list (`[]`). | ||
/// - The only reason a server MAY return an `ErrResponse` for this request | ||
/// is in the case of internal server error. | ||
/// | ||
/// ##### 2.4. Health check. | ||
/// | ||
/// A `GET` on `/`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we should use a prefix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done: 5e4fc87 |
||
/// | ||
/// - The server, in general, SHOULD return `OkResponse` to this request. | ||
/// - The server MAY return `ErrResponse` for internal errors or some other | ||
/// inability to continue serving correctly. | ||
/// | ||
/// #### 3. Storage Strategies | ||
/// | ||
/// ##### 3.1. The Future | ||
/// | ||
/// It is the intention someday in the future to add a "trusted" strategy, | ||
/// that will be triggerd via a new api, perhaps `/registerTrust/Base64Space`. | ||
/// | ||
/// Even when that API is implemented, however, the default strategy defined | ||
/// next will be used on all spaces that have not been registered with a | ||
/// different strategy. Therefore, we can proceed for now without consideration | ||
/// for any other future strategies. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the list of returned agent infos be segmented into two lists by trusted/authoritative and self-published? And for now the trusted list could be empty, and the self-published list works as being described in this doc? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we're trying to avoid placeholders in the codebase as they add noise and friction for development. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My intention is that spaces will either use the "default" storage strategy defined here, or the "trusted" strategy to be defined in the future. Once we have the ability, if you register a space as trusted, it will then only accept trusted infos and only return those trusted infos. |
||
/// | ||
/// ##### 3.2. The "Default" Storage Strategy | ||
/// | ||
/// Assumptions: | ||
/// | ||
/// - We don't want to store unbounded count infos in each space. | ||
/// - We like storing long-running reliable nodes. | ||
/// - We don't want to ONLY store long-running nodes to avoid eclipse scenarios. | ||
/// | ||
/// The solution put forth here is to allocate half our storage space to | ||
/// long-running nodes, and the other half to whoever has most recently | ||
/// published an info. The strategy for accomplishing this is defined here: | ||
/// | ||
/// Consider "the store" acting as a stack. It can be implemented in any manner, | ||
jost-s marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// but the following strategy assumes new entries are added to the end, | ||
/// and when any entries are removed, the indexes of any items after them | ||
/// will be decrimented. | ||
/// | ||
/// - The server SHOULD provide a configurable per-space max info count. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should it also provide a max expiration timespan? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To avoid the complexity of the server needing to tell the clients what the "configured" expiration time is so they know how often to refresh / what they can set it to... Let's start with just a hard-coded maximum of 30 minutes, and see if there is need for anything else.
mattyg marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// For the duration of this document, that value will be called MAX_INFOS. | ||
/// It is recommended to default that value to `32`. | ||
mattyg marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// - The server SHOULD delete expired infos periodically. | ||
/// - If the server is already storing an agent, and a `PUT` with a newer | ||
jost-s marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// `createdAt` arrives, the existing entry MUST be replaced. | ||
/// - If the store count is < MAX_INFOS, the server MUST push the info | ||
/// onto the stack. | ||
/// - If the store count is >= MAX_INFOS, the server MUST delete the entry | ||
/// at MAX_INFOS / 2 and then push the info onto the stack. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MAX_INFOS / 2 is the first of the recently put agent infos? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, it's the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why that and not the oldest one for example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's described in the first paragraphs of section 3.2, basically the first half of the stack is allocated to known reliable nodes (because they've been online without expiring for the longest). And the second half of the stack is allocated to an ever changing list of the most recent nodes to publish their agent infos (to make it less possible for the older peers to collude on an eclipse attack). The "delete the middle one" algorithm was my attempt at the simplest design of this. |
||
/// | ||
/// #### 4. Rate Limiting | ||
/// | ||
/// Info count within a space is limited by the storage strategy above. But | ||
/// space count within a server is unbounded. While servers will likely want | ||
/// to limit request counts in general, we are also going to have to take some | ||
/// special care with PUT requests that result in creation of a new space that | ||
/// was not previously tracked in order to mitigate attacks. | ||
/// | ||
/// - A server SHOULD delete any spaces that no longer contain agents. | ||
/// - A server SHOULD decide a max count of spaces it intends to support | ||
/// (based on available memory or disk space for storing agents within those | ||
/// spaces), and then return error 429 beyond that. | ||
/// - A server MAY make the above max space count configurable. | ||
/// - A server SHOULD track the IP addresses of clients that make PUT requests | ||
/// which result in space creation, and error with 429 if that frequency | ||
/// is beyond a limit. | ||
/// - A server MAY make this limit configurable. | ||
#[cfg(doc)] | ||
pub mod spec {} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we also have a /metrics path which behaves similar to the current metrics system (showing number of space and active nodes in each space (or maybe number of trusted and self-published nodes in each space)) for a block of time, except this time have the results be REVERSE chronological showing the most recent time blocks first? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Usually metrics are a server-specific implementation detail. I'm happy to include a metrics api with the server. @artbrock - do you think a metrics api needs to be included in the spec? Indicating that any k2 bootstrap server implementation anybody made would respond with the exact same metrics? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boot
sounds like it's for startup, I'd prefer to spell this one out :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we just go with
bs
? /jkThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: edc2546