-
Notifications
You must be signed in to change notification settings - Fork 4
Add a subset of API-compatible endpoints to establish the feasibility of switching away from upstream BSS without losing functionality #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Added QuackStore implementation for managing groups, instances, and cluster defaults using Quack for persistence. - Introduced unit tests for QuackStore to ensure functionality and data integrity. - Updated go.mod to include new dependencies and replaced the quack module path. - Enhanced CloudConfigFile JSON marshaling and unmarshaling for better encoding handling.
…handlers - Added BootParams struct to define boot parameters including kernel, initrd, root filesystem, and cloud-init server configurations. - Implemented GenerateBootScript function to create iPXE boot scripts based on BootParams. - Created BootParamsHandler for handling HTTP requests to create, update, retrieve, and generate boot scripts. - Introduced MemoryStore for in-memory storage of boot parameters with versioning. - Added comprehensive unit tests for BootParams, boot script generation, and HTTP handlers to ensure functionality and reliability.
- Updated go.mod to include the new dependency for UUID generation. - Refactored import paths to use the new smdclient package structure. - Introduced BootParamsV1 struct for backward compatibility with the old API specification. - Enhanced boot parameters validation and merging logic. - Implemented new HTTP handlers for creating and managing boot parameters. - Added comprehensive unit tests for the new features and refactored existing tests for better coverage.
- Added QuackStore implementation for managing groups, instances, and cluster defaults using Quack for persistence. - Introduced unit tests for QuackStore to ensure functionality and data integrity. - Updated go.mod to include new dependencies and replaced the quack module path. - Enhanced CloudConfigFile JSON marshaling and unmarshaling for better encoding handling.
- Introduced a new test file for MemStore to validate its functionality. - Integrated standard test suite from cistore testing package to ensure comprehensive testing.
- Introduced a new command-line flag for selecting the storage backend (mem or quack). - Default storage backend set to memstore, with a new option for quackstore using a specified database path. - Updated the datastore initialization logic to handle both memstore and quackstore, including error handling for quackstore initialization.
- Removed base64 encoding handling from UnmarshalJSON for clarity. - Added MarshalJSON implementation to ensure proper JSON serialization of CloudConfigFile.
- Added functionality to generate a unique instance ID in the format "i-XXXXXX" using a random 6-digit hexadecimal string. - Updated the comment to clarify the instance ID generation process.
- Added QuackStore implementation for managing groups, instances, and cluster defaults using Quack for persistence. - Introduced unit tests for QuackStore to ensure functionality and data integrity. - Updated go.mod to include new dependencies and replaced the quack module path. - Enhanced CloudConfigFile JSON marshaling and unmarshaling for better encoding handling.
- Introduced new BSS API endpoints for managing boot parameters and generating boot scripts. - Added handlers for creating, updating, and retrieving boot parameters. - Implemented routing for BSS endpoints in the main application. - Updated go.mod to include new dependencies for BSS functionality. - Enhanced documentation for the new API endpoints with Swagger annotations.
|
I think that this is meant to be a solution to BSS's current lack of group name support when using the Postgres backend, but I'm a bit worried that by adding BSS functionality to cloud-init, we are stepping towards distributed monoliths instead of microservices, which I believe is the same reason we removed discovery from SMD. Is there an important use case for having cloud-init also provide boot parameter management and script generation? Just trying to understand why we're duplicating BSS functionality here versus refactoring BSS. |
|
I agree with @synackd here. I believed we had this discussion before and IIRC I think before we were certainly against duplicating the same functionality in multiple microservices. Is the intent here to eventually phase out BSS entirely? Are we not packing too many features in cloud-init alone? I'm wondering if adding this will move cloud-init in the same direction as SMD. |
|
This would replace BSS which has proven particularly hostile to refactoring and unit testing. |
|
I have confined the BSS portions to a |
|
qq, are the functions of BSS that OpenCHAMI was leveraging confined to just bootparameters and bootscripts? |
|
There are four endpoints described in the swaggerfile. At this point, OpenCHAMI is only leveraging the two concerned with bootparameters and bootscripts. It didn't look to us like there was added value in the dumpstate or hosts endpoints.
I'm not closed to the idea that we might leverage more of the API at some point. We simply haven't needed them. Are we missing something? |
|
I think reimplementing an API-compliant version of BSS that allows the functionality that we want to implement is probably the way to go, but I'm wondering if cloud-init is the right place for it. Are we sure we want to combine boot parameter handling and post-boot configuration into a single scope? |
Makes sense. FWIW I'm looking for any gaps, anything that BSS does today that isn't covered here and whether that matters. That really just leaves the pre-signed URLs (for private S3 buckets) and spire-tokens, which I don't see mentioned in the PR. I assume those aren't important to OpenCHAMI either. |
|
We're not using pre-signed urls or spire tokens at this point in OpenCHAMI. I'm not convinced that either add security given the unencrypted/unauthenticated nature of iPXE. However, if we choose to go down this path, we may need to open both assertions up for discussion. Pre-signed urls aren't necessary for LANL at this time because none of our system images, kernels, or initrd contain any secrets and are on segregated networks. If OpenCHAMI needs to support authenticated access to S3 resources, we'll need to update this project. I'll defer to @yogi4 who has worked a lot more with spire, but I think we'd prefer not to embed the spire join token in the kernel commandline and instead leverage the tpm attestation process to initialize spire. |
|
Looks like there is a boot/v1/endpoint-history that will provide information about the last time each node retrieved their bootscript. That sounds pretty useful |
|
Just sharing my personal opinion about cloud-init and BSS: These services serve different purposes and operate at different stages. BSS handles boot-time configuration (setting kernel parameters or delivering initrd, etc) essentially shaping how the system starts. cloud-init, by contrast, is about runtime configuration once the OS is up (e.g., user setup, networking, mount filesystems, package installs). I agree with @synackd’s point, but I’d also emphasize that cloud-init isn't a given (not every OCHAMI user will use it). Runtime config is complex and often handled differently across environments (cloud-init, Ansible, custom scripts, overlayfs, etc.). |
|
I agree that we don't want to force people into using cloud-init if they don't want to. I'm updating this branch/PR to isolate the BSS API in a separate executable with a separate build. We can review it once that's functional |
…rameters - Introduced a new netboot server in cmd/netboot-server/main.go. - Updated .goreleaser.yaml to include netboot-server build configuration. - Added API endpoints for creating, retrieving, and updating boot parameters in docs/swagger.yaml and docs/docs.go. - Defined BootParams schema in swagger documentation and Go code.
rainest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could basically just stuff BSS under a router subsection, correct? As-is it has multiple BSS endpoints under the root, but I think if we group those and adhere to code organization constraints, keeping these in a single repo/binary is fine.
Most everything this adds resides under pkg/bss. The only parts that need to interact with both are the main/cmd and the router, and we can design those such that they keep the components separate, and that we could, if desired, split them into actual separate repos later.
We'd need to be careful about how we code shared components, since it wouldn't be enforced by split repos, but it's doable. There are some bits in the PR as written that don't keep that separation clean (see the in-context comments).
I think you're also blocked from just ignoring the cloud-init bit--it looks like the cluster defaults initialization creates de-facto required configuration specific to cloud-init, and ideally we wouldn't have that.
Arguments for are:
- These are similar enough in use and needs: they're both basically infrequent human write, frequent machine read config stores that communicate with SMD a bit, and they both need to be accessible to machines before those machines are fully configured.
- Hey, it's an excuse to get away from the BSS codebase! If a simpler version of that is sufficient, may as well.
If we do want to separate it, we should have a separate repo rather than a separate binary: pinning them to the same module versions and needing multiple Containerfiles is annoying. SMDClient is probably a reasonable module-only repo, and not re-implementing that across services makes sense anyway.
cmd/cloud-init-server/main.go
Outdated
| if bssEnabled { | ||
| bssStorage := bss.NewMemoryStore() | ||
| initBssRouter(router, ciHandler, bssStorage) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use the same pattern as the above routers, where it mounts a subrouter to some path within main(). The subrouter definition can still live in a function, but the prefix it lives under should be under the control of main().
cmd/cloud-init-server/main.go
Outdated
| }) | ||
| } | ||
|
|
||
| func initBssRouter(router chi.Router, handler *CiHandler, bssStorage bss.Store) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepting a CiHandler here does bind BSS logic to CI-specific types, we should avoid that.
AFAICT the struct only serves to aggregate several common constructor arguments into one, and BSS only uses the SMDClient field. We should just the SMDClient alone; BSS will never have a need for the CI cluster name or CI store.
We may want to refactor that type into WithSMDClient(), WithCIStore(), etc. functions on the classes that use them, but that's probably better handled elsewhere.
That struct is kinda silly already: it doesn't serve a purpose other than just aggregating several shared constructor inputs for other types, and its consumers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree.
Based on other feedback, I'm working on a separate binary to serve the BSS portions. That implementation could be moved to a separate repo in the future as you suggest, but we'd need to do some careful planning to prevent a tightly coupled distributed monolith from reappearing.
The new signature is at:
cloud-init/cmd/netboot-server/main.go
Line 129 in 3dbbefb
| func initBssRouter(router chi.Router, sm smdclient.SMDClientInterface, bssStorage bss.Store) { |
Cleaning up that struct is on my mental todo list for cloud-init as well.
- Added QuackStore for persistent storage of boot parameters, supporting versioning and group templates. - Implemented API endpoints for managing boot parameters, including creation, retrieval, updating, and versioning. - Introduced a version command to display build information and server version. - Updated .goreleaser.yaml to support new archive formats (zip and tar.gz). - Enhanced documentation with new fields and API specifications in swagger and Go code. - Added tests for QuackStore functionality to ensure data integrity and operations.
…apshot build command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on 8c4c2b8 mostly re some organization changes to better allow alternate data stores and avoiding bash for integration tests.
| insecure bool | ||
| fakeSMDEnabled bool | ||
| storageBackend string | ||
| dbPath string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should look at turning the server into a struct that main() instantiates from the args rather than using package vars.
For dbPath in particular, that struct should have a quackConfig field, whose value is a struct with Quack-specific parameters.
| serveCmd.Flags().BoolVar(&insecure, "insecure", os.Getenv("INSECURE") == "true", "Disable TLS verification (default from INSECURE env)") | ||
| serveCmd.Flags().BoolVar(&fakeSMDEnabled, "fake-smd", os.Getenv("CLOUD_INIT_SMD_SIMULATOR") == "true", "Enable FakeSMDClient simulation (default from CLOUD_INIT_SMD_SIMULATOR env)") | ||
| serveCmd.Flags().StringVar(&storageBackend, "storage-backend", "mem", "Storage backend to use (mem or quack)") | ||
| serveCmd.Flags().StringVar(&dbPath, "db-path", "netboot.db", "Path to the database file for quackstore backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, storage backend-specific flags should use some common prefix to scope them, so quack-db-path probably.
| Str("certPath", certPath). | ||
| Bool("insecure", insecure). | ||
| Str("storageBackend", storageBackend). | ||
| Str("dbPath", dbPath). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to leave room for other stores, we could either:
- Add a
func (s *Store) Describe() string"identify yourself" method tobss.Store, and then haveStr("storage", bssStorage.Describe()) - Add a log argument to
NewQuackStore()and have it log whatever setup info it chooses.
|
|
||
| info := VersionInfo() | ||
| uptime := time.Since(startTime).String() | ||
| memstats := &runtime.MemStats{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to include info other than the version, we may want to consider an entire diagnostic child router, with separate /info/version and /info/memstats endpoints. IDK how many, if any additional diagnostic endpoints we'd add, but if it's more than two they probably shouldn't live directly at the root.
| @@ -0,0 +1,97 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this functionally provides an integration test? It's not invoked by anything, but does perform tests.
I'd strongly recommend using Go to handle integration tests as well. If the server's changed into a struct, a TestMain() can instantiate it and start serving. That does skip testing the actual CLI, but that's usually simple enough to not warrant much concern.
Aside avoiding the joys of lengthy bash scripts, that avoids hidden test dependencies (this introduces one for HTTPie), lets us use a debugger for both the tests and tested instance, and allows collection of coverage and pass/fail data in the same format as other tests.
|
waiting to rebase on top of the quackstore PR (#68) once that merges. |
This pull request establishes the handlers and storage functions to allow cloud-init to function as a bss server in simpole cases. Below are the most important changes grouped by theme.
Storage Backend Enhancements:
quackstorebackend for persistent storage, configurable via the--storage-backendflag. Default remainsmemstore. Added support for specifying a database file path forquackstorevia the--db-pathflag. (cmd/cloud-init-server/main.go, [1] [2] [3] [4]pkg/smdclientinstead ofinternal/smdclientfor better modularity. (cmd/cloud-init-server/handlers.go, [1];cmd/cloud-init-server/main.go, [2]; other handler files)Boot Parameter Management:
pkg/bsspackage to handle boot parameters (BootParamsandBootParamsV1structs) with JSON parsing, validation, and merging capabilities. (pkg/bss/bootparams.go, pkg/bss/bootparams.goR1-R112)pkg/bss/bootscript.go, pkg/bss/bootscript.goR1-R88)Testing Improvements:
memstorebackend to validate its functionality. (internal/memstore/ciMemStore_test.go, internal/memstore/ciMemStore_test.goR1-R20)pkg/bss/bootparams_test.go, pkg/bss/bootparams_test.goR1-R285)Code Quality Enhancements:
generateInstanceIDfunction inquackstoreto generate unique instance IDs using random hexadecimal values. (internal/quackstore/quackstore.go, internal/quackstore/quackstore.goL351-R357)go.modfor UUID generation and YAML parsing. (go.mod, [1] [2]These changes enhance the flexibility, reliability, and test coverage of the
cloud-init-serverproject, paving the way for improved storage options and boot parameter handling.