[FEATURE] Keep track of hostnames within SMD #43

synackd · 2024-11-22T23:23:29Z

Is your feature request related to a problem? Please describe.
Currently, only Node IDs and Xnames are used to identify nodes in SMD. OpenCHAMI's DHCP solution coresmd currently sets node hostnames via Node IDs, using the hard-coded nid as a prefix (e.g. nid001). However, some sites use named clusters and it would be useful to be able to customize the hostname to match the cluster name. For example, given a cluster named foobar, the first compute node could be named fb01.

Customizable hostnames would also be quite useful in general.

Describe the solution you'd like
Add one or more fields to the Component structure in SMD relating to the hostname. How this is implemented is worth discussion.

One option could be a ClusterPrefix field containing text that is prefixed to the NID. For example, with ClusterPrefix=fb for the foobar cluster above, node ID 1 could have the hostname fb001.

Another option could be specifying a Hostname field that supports a format string. For example, Hostname="fb%03n" would make node 1 have the hostname fb001 (%03n could mean the node ID (%n) zero-padded to three digits).
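To make the format-string idea concrete, here is a minimal sketch of how such a string could be expanded. The `%n` verb, the `ExpandHostname` function, and the regular expression are all hypothetical illustrations, not anything that exists in SMD today.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// nidVerb matches the hypothetical node-ID verb with an optional
// zero-pad width, e.g. "%n" or "%03n".
var nidVerb = regexp.MustCompile(`%(?:0(\d+))?n`)

// ExpandHostname replaces every %n / %0Wn in format with the node ID,
// zero-padded to W digits when a width is given.
func ExpandHostname(format string, nid int) string {
	return nidVerb.ReplaceAllStringFunc(format, func(m string) string {
		width := 0
		if sub := nidVerb.FindStringSubmatch(m); sub[1] != "" {
			width, _ = strconv.Atoi(sub[1]) // sub[1] is all digits
		}
		return fmt.Sprintf("%0*d", width, nid)
	})
}

func main() {
	fmt.Println(ExpandHostname("fb%03n", 1))
}
```

A custom verb like this keeps the format compact, at the cost of SMD (or its clients) having to maintain a small parser.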

Describe alternatives you've considered
All of the ideas I had are above, but other better ones are welcome.

Additional context
Are there any other variables that should be considered for the hostname?

synackd · 2024-11-26T22:45:56Z

Another idea is to use Go templates that are executed on the Component data structure. For example, the string fb{{printf "%03d" .NID}} could be sent as the value in a hypothetical HostName field in a POST/PUT request for a Component. SMD could then execute that template over the Component struct it creates and set the result as the final HostName value. Thus, if the NID was 5, the template would evaluate to fb005.
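As a rough sketch of the template approach, the snippet below executes such a template with Go's standard `text/template` package. The `Component` type here is a trimmed, hypothetical stand-in, and it assumes `NID` is a plain `int`; if the real field has a different type, the `printf` verb in the template would need to match it. `RenderHostName` is likewise an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Component is a hypothetical, trimmed stand-in for SMD's Component
// struct. HostName is the proposed field; NID is assumed to be an int.
type Component struct {
	ID       string
	NID      int
	HostName string
}

// RenderHostName parses tmplText as a Go template and executes it
// over c, returning the rendered hostname.
func RenderHostName(tmplText string, c Component) (string, error) {
	tmpl, err := template.New("hostname").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parse hostname template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, c); err != nil {
		return "", fmt.Errorf("execute hostname template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	c := Component{ID: "x1000c0s0b0n0", NID: 5} // example values
	name, err := RenderHostName(`fb{{printf "%03d" .NID}}`, c)
	if err != nil {
		panic(err)
	}
	// Store the rendered result, not the template, so readers of the
	// field never have to execute anything.
	c.HostName = name
	fmt.Println(c.HostName)
}
```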

If we left the template execution up to SMD, the same template could be set for many Components and the hostname would become static, since the result of the execution would be stored as the hostname. If we instead stored the template itself in the field, anything fetching the hostname would have to execute the template, which may not be desired. Thus, it may be preferable to have SMD execute the template when the Component is created within it.

CoreDHCP, within the coresmd plugin, since it fetches the Components from SMD when checking if a node exists, could trivially read this HostName field and, if blank, by default use nid<NID> where <NID> is the value of the Component's NID field.
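The fallback described here could look roughly like the following. This is only a sketch: `Component` is a hypothetical, trimmed type, `HostnameFor` is an illustrative name, and the prefix and pad width are passed in as parameters rather than taken from any real coresmd configuration.

```go
package main

import "fmt"

// Component is a hypothetical, trimmed stand-in for what coresmd
// would fetch from SMD. HostName is the proposed field.
type Component struct {
	NID      int
	HostName string
}

// HostnameFor returns the Component's HostName when it is set, and
// otherwise falls back to prefix followed by the zero-padded NID.
func HostnameFor(c Component, prefix string, width int) string {
	if c.HostName != "" {
		return c.HostName
	}
	return fmt.Sprintf("%s%0*d", prefix, width, c.NID)
}

func main() {
	fmt.Println(HostnameFor(Component{NID: 1}, "nid", 3))
	fmt.Println(HostnameFor(Component{NID: 5, HostName: "fb005"}, "nid", 3))
}
```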

The RedfishEndpoint structure has Name, Hostname, Domain, and FQDN fields already, but, since it is possible for multiple nodes (Components) to be attached to one BMC (RedfishEndpoint), perhaps it would make sense to have these fields for Components as well.

The difficulty I see with this modification is that SMD still imports hms-base and so we would have to either fork that repo and edit the imports or move the functionality from that module elsewhere inside the OpenCHAMI organization.

alexlovelltroy · 2024-11-27T16:21:09Z

I'm not in favor of making substantive changes to SMD and its database unless we're working closely with the team from HPE. We should consider non-SMD places to store and manage naming information.

synackd · 2024-12-02T15:58:23Z

If we don't store hostname information in SMD, the next logical place to store it would be within CoreDHCP; however, I'm apprehensive about introducing persistent state to CoreDHCP since the coresmd plugin was meant to use SMD as its source of truth.

A use case that LANL has is being able to distinguish special types of nodes within a cluster in addition to the regular compute nodes, e.g. login and I/O nodes, and have each node's hostname reflect its type. coresmd v0.0.3-rc1 adds the ability to specify a cluster prefix (nid by default) that is prepended to the NID, zero-padded to four places. This is fine for clusters that have homogeneous naming, but heterogeneous naming as described above will require additional storage/logic.

We are already storing persistent state in SMD and BSS, and it has been expressed that SMD is supposed to be the main source of truth. Plus, the RedfishEndpoint structure in SMD is already storing hostname data (which would be used if there wasn't a possibility of having more than one node component per BMC), so it doesn't seem like a far stretch to have this for Components as well. From an administrator perspective, it seems worth it long-term to have this functionality even if it requires working with HPE. If we are going to have to store hostname state somewhere, it seems more straightforward (and in alignment with SMD being the source of truth) to add it to an existing DB (SMD) that already has hostname fields for other structures than to try to add it elsewhere.

alexlovelltroy · 2024-12-02T16:40:30Z

Agreed that another place to store state is not ideal. I've raised the issue on Slack and asked for HPE reps to join GitHub so we can have discussions here.

I think our options are:

1. Extend SMD with another endpoint that can manage/store hostnames and aliases for nodes
2. Extend SMD to include hostnames and aliases in the Component struct delivered by the components/<xname> endpoints
3. Introduce another microservice that stores/manages node/instance information including hostnames and aliases.
   - Use SMD as an optional backend for the new microservice
4. Other?

ajgarside · 2024-12-12T01:45:18Z

Perhaps there should be agreement on the OpenCHAMI naming requirements ahead of finalizing where names/aliases should be stored as the requirements may impact choices.

I’ve seen customers who prefer and request any one of the following:

- Xnames
- NID names: nid001
- Sequential names: n[1-1000]
- Hardware specific names: dl380g11, bardpeak001, …
- Multiple names: I’ve seen training clusters set up with a set of sequential names like n[1-999] and a secondary set of names like bob[1-10] and amy[1-10] where the bob and amy nodes are aliases for 10 node subsets of the n[1-999] nodes. As I recall, this was done so that users could be allocated a set of nodes that were easily identifiable by them and those nodes could be changed out from under them without any hardcoded names in their scripts needing to be changed.

Is it enough to allow a single customizable hostname in addition to Xnames and Node IDs, or should an indeterminate number of names/aliases be allowed? My personal take is that there should be an unlimited number of aliases.

If there are several aliases for a primary name, how does a GET call know which hostname to return in the result? Always returning primary names when the caller is using an alias seems questionable. Perhaps each alias should be associated with a named alias group so callers can request that results be expressed with a particular class of name via a query parameter?
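One way to picture the alias-group idea is the sketch below, where each node carries a map from group name to alias and the caller picks a group through a query parameter. Everything here is hypothetical: the `Node` type, the group names, and the `name_group` parameter are placeholders for illustration, not an existing SMD API.

```go
package main

import (
	"fmt"
	"net/url"
)

// Node is a hypothetical record holding a primary name (the xname)
// plus any number of aliases, keyed by alias group.
type Node struct {
	Xname   string
	Aliases map[string]string // alias group -> name within that group
}

// NameIn returns the node's name in the requested alias group, or
// the xname when the group is empty or the node has no alias in it.
func (n Node) NameIn(group string) string {
	if name, ok := n.Aliases[group]; ok {
		return name
	}
	return n.Xname
}

func main() {
	node := Node{
		Xname:   "x1000c0s0b0n0", // example value
		Aliases: map[string]string{"sequential": "n1", "training": "bob1"},
	}
	// "name_group" stands in for whatever query parameter is chosen.
	q, err := url.ParseQuery("name_group=training")
	if err != nil {
		panic(err)
	}
	fmt.Println(node.NameIn(q.Get("name_group")))
	fmt.Println(node.NameIn(""))
}
```

Falling back to the xname when no group is requested keeps existing callers unaffected, which may or may not be the desired default.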

> Another option could be specifying Hostname field that could support a format string. For example, Hostname="fb%03n" would make node 1 have the hostname fb001 (%03n could mean node ID (%n) with 3 leading zeros)

I’m a fan of this. I’ve both used and seen very similar methods used to generate hostnames from auto-discovered nodes.

synackd added the enhancement label Nov 22, 2024

synackd changed the title ~~[FEATURE] Keep rack of hostnames within SMD~~ [FEATURE] Keep track of hostnames within SMD Nov 22, 2024

synackd added a commit to OpenCHAMI/coresmd that referenced this issue Nov 26, 2024

fix: increase 0-pad width from 3 to 4

da0d910

This is meant as a temporary solution until hostnames are figured out in SMD. See: OpenCHAMI/smd#43

synackd mentioned this issue Nov 26, 2024

fix: increase 0-pad width from 3 to 4 OpenCHAMI/coresmd#10

Merged

synackd added a commit to OpenCHAMI/coresmd that referenced this issue Nov 26, 2024

fix: increase 0-pad width from 3 to 4

1f70d65

This is meant as a temporary solution until hostnames are figured out in SMD. See: OpenCHAMI/smd#43

alexlovelltroy mentioned this issue Nov 29, 2024

[BUG] BSS POST returns 201 but data missing in database if hosts are not valid xnames OpenCHAMI/bss#51

Closed
