NUT late shutdown and Immutable Image OSes #2836

Open
jimklimov opened this issue Mar 5, 2025 · 0 comments
Labels
documentation feature packaging permissions portability We want NUT to build and run everywhere possible

A chat with @poettering at FOSDEM 2025 (feel free to correct me if I misremembered something) brought up an interesting consideration. He argues in favor of immutable image-based operating environments, whose life cycle is effectively managed by another immutable OS image living as the initrd (and probably systemd at the heart of it all). The NUT shutdown integration, as it is typically done today, would not be feasible or welcome there...

Roughly speaking, what we do currently on numerous platforms is:

  • A NUT server (with a physical connection to the UPS) normally runs the drivers and the data server, and probably also an upsmon instance in the primary role. Every system actually fed by that UPS should run its own upsmon; most of the time the system managing a UPS is also fed by it.
  • If a power outage occurs, this server raises FSD (Forced Shut Down) state to tell everyone else to shut down, and after a while, a locally running upsmon shuts down its own operating system too.
  • As part of such FSD handling, upsmon creates a file in the location specified by the POWERDOWNFLAG setting in its upsmon.conf.
  • Running daemons, including the NUT drivers for the UPS and the data server, are stopped, as are any other services.
  • (Some systems eventually kill all userland processes and remount read-only)
  • On systemd-driven systems, the late shutdown hook script /usr/lib/systemd/system-shutdown/nutshutdown kicks in, finds that POWERDOWNFLAG file, and runs the NUT driver program again to tell the UPS to power off or power-cycle. This happens at or after the moment when we know that the power loss would not corrupt any data. As a result, the UPS usually turns on automatically when wall power returns; if wall power has already returned, the power cycle guarantees that all fed systems fully restart.
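
The decision made by that late shutdown hook can be sketched roughly as follows. This is a minimal model in Python, not the actual nutshutdown script: the function name, the injectable `run` callback, and the `/etc/killpower` default are assumptions for illustration (the real flag location comes from POWERDOWNFLAG in upsmon.conf). It relies only on `upsmon -K` (mentioned below) and `upsdrvctl shutdown`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Assumed default for illustration; the real path is the POWERDOWNFLAG
# value configured in upsmon.conf.
POWERDOWNFLAG = Path("/etc/killpower")


def late_shutdown_hook(
    flag: Path = POWERDOWNFLAG,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.call(list(cmd)),
) -> bool:
    """Return True if a UPS power-off was requested and the command succeeded."""
    if not flag.exists():
        # Ordinary shutdown or reboot: leave the UPS alone.
        return False
    # Let upsmon itself confirm that a forced shutdown is in progress.
    if run(["upsmon", "-K"]) != 0:
        return False
    # Ask the configured driver(s) to power the UPS off or power-cycle it.
    return run(["upsdrvctl", "shutdown"]) == 0
```

Passing a fake `run` callback makes it possible to exercise the logic without a UPS attached.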

This last bit actually places a number of constraints on the environment:

  • The POWERDOWNFLAG file location should still be mounted and at least readable;
  • The NUT configuration files (e.g. /etc/nut or /etc/ups) should be on filesystems still mounted and at least readable, so that the correct driver is chosen and connects to the expected device(s);
  • The NUT driver programs and any libraries they might dynamically link to (and possibly their resource files - maybe SNMP MIBs, etc.) should be on filesystems still mounted and at least readable (programs also executable).
  • NOTE: Recent NUT releases can tell the already running driver program to turn the UPS off, instead of re-initializing the connection (which can take long, and is especially painful with SNMP walks), but to my knowledge there is no practical use for that. When asked to kill power, drivername -k automatically tries to talk to an existing daemonized copy first (if found) before taking the matter into its own hands. However, systems and frameworks that indiscriminately kill off userland processes are unlikely to benefit from this anyway, unless they support some way to exempt certain programs from the killing spree.
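
As an illustration of these constraints, a pre-flight check run in the late shutdown environment might look like the sketch below. The function name and the idea of checking `ups.conf` inside the configuration directory are assumptions for this example; it also does not attempt to verify dynamically linked libraries or resource files, which would need a separate check.

```python
import os
from pathlib import Path
from typing import List


def check_late_shutdown_paths(flag: Path, confdir: Path, driver: Path) -> List[str]:
    """Return a list of problems that would prevent a late UPS power-off."""
    problems = []
    # The directory holding the POWERDOWNFLAG file must still be mounted
    # and searchable, otherwise the flag cannot be found at all.
    if not os.access(flag.parent, os.R_OK | os.X_OK):
        problems.append(f"flag directory not readable: {flag.parent}")
    # The driver configuration must be readable to pick the right driver
    # and device.
    ups_conf = confdir / "ups.conf"
    if not (ups_conf.is_file() and os.access(ups_conf, os.R_OK)):
        problems.append(f"configuration not readable: {ups_conf}")
    # The driver program must be present and executable.
    if not (driver.is_file() and os.access(driver, os.X_OK)):
        problems.append(f"driver not executable: {driver}")
    return problems
```

An empty result only means the obvious prerequisites are in place; it does not prove that the driver can actually reach the device.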

These constraints somewhat contradict the goal of image-based OSes to have the operating environment fully unmounted, not even leaving read-only tentacles in place.

One feasible idea from the chat was to have the NUT driver package(s) installed (also) into the initrd image, automatically pulling in whatever dependencies are needed for the libraries they use. This could be a user-curated selection of drivers, or a vendor/corporation-dictated "everything" (so that signed images are ubiquitously useful). A few tools like upsmon (to check with upsmon -K that FSD is in progress) and upsdrvctl would be needed as well.

And it would be that initrd image's shutdown hooks that tell the UPS to go off, after the production environment is safely unmounted and flushed.

  • A location like /run (maybe exactly that?) might be used to convey not only the existence and magic content of the POWERDOWNFLAG file, so that the FSD handling kicks in, but also a copy of the latest known NUT configuration files.
  • In the nutshutdown script, NUT_CONFPATH could point to that copy; the NUT_STATEPATH and NUT_ALTPIDPATH (any others?) envvars could point to respective locations usable in the initrd environment (e.g. /dev/shm for R/W locations, if they are used at all in the shutdown routine - I think it would be a bug if PID files or socket files were created at that point; the existence of the locations may be checked though, not sure).
  • The driver programs called from nutshutdown could be told to just run as root and not drop privileges, as other accounts (and udev rules in case of USB/Serial links) would likely not be configured at that point. Or maybe they would be there, if packaging did work all the way in initrd too.
  • Not sure about access to networked power devices (SNMP, NetXML, remote IPMI...) - whether the system would still have an IP address at that point, or whether it goes away when network{,ing}.service gets stopped. On legacy systems, from the era when the late shutdown approach originated, an address stayed assigned until the OS power-cycled itself, so an snmp-ups could be commanded to power-cycle just as well...
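
To make the /run idea above a bit more concrete, here is a rough sketch of the two halves: staging a configuration copy while the production OS is still up, and building the environment for the driver call from the initrd. The helper names, the `nut-conf` subdirectory, and the use of `upsdrvctl -u root` to avoid dropping privileges are assumptions for illustration, not an existing NUT mechanism.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Assumed command line: run the drivers as root, since other accounts
# may not exist in the initrd environment.
SHUTDOWN_COMMAND: List[str] = ["upsdrvctl", "-u", "root", "shutdown"]


def stage_nut_config(confdir: Path, staging: Path) -> Path:
    """Copy the current NUT configuration under `staging` (e.g. /run)."""
    target = staging / "nut-conf"  # hypothetical directory name
    if target.exists():
        # Replace any stale copy from an earlier run.
        shutil.rmtree(target)
    shutil.copytree(confdir, target)
    return target


def initrd_shutdown_env(staged_conf: Path, scratch: Path) -> Dict[str, str]:
    """Environment for the driver call made from the initrd shutdown hook."""
    return {
        "NUT_CONFPATH": str(staged_conf),
        # R/W scratch location (e.g. /dev/shm), if it is needed at all.
        "NUT_STATEPATH": str(scratch),
        "NUT_ALTPIDPATH": str(scratch),
    }
```

The initrd hook could then pass the returned mapping as the environment when executing `SHUTDOWN_COMMAND`.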