@NikitaCOEUR

Add a new prevent_uncontrolled_reboots boolean attribute to the talos_machine_configuration_apply resource to prevent uncontrolled reboots during configuration updates.

Motivation

When applying Talos machine configurations with apply_mode=auto, Terraform may automatically reboot nodes if the configuration changes require it. This can be problematic in production environments where reboots need to be carefully controlled and scheduled.

Changes

Feature

  • New attribute: prevent_uncontrolled_reboots (boolean, default: false)
  • Behavior: When enabled with apply_mode=auto, performs a dry-run check during terraform plan
    • If reboot is required → automatically switches to staged mode
    • If no reboot needed → proceeds with auto mode normally
  • User feedback: Clear warning message with manual reboot command when mode is switched
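The mode selection described above can be sketched as a small pure function. This is only an illustration of the decision rule, not the code in this PR: `resolveApplyMode` and its parameters are hypothetical names, and the `rebootRequired` flag stands in for the result of the dry-run check.

```go
package main

import "fmt"

// resolveApplyMode is a hypothetical helper (not the provider's actual code)
// that mirrors the rule in the Feature list: only when the guard is enabled,
// the requested mode is "auto", and the dry run reports that a reboot is
// required does the effective mode become "staged".
func resolveApplyMode(applyMode string, preventUncontrolledReboots, rebootRequired bool) string {
	if preventUncontrolledReboots && applyMode == "auto" && rebootRequired {
		return "staged"
	}
	// In every other case the requested mode is left untouched.
	return applyMode
}

func main() {
	fmt.Println(resolveApplyMode("auto", true, true))   // staged
	fmt.Println(resolveApplyMode("auto", true, false))  // auto
	fmt.Println(resolveApplyMode("auto", false, true))  // auto
	fmt.Println(resolveApplyMode("reboot", true, true)) // reboot
}
```

Keeping the rule free of side effects makes it easy to cover with a table-driven unit test, independently of a running cluster.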

Implementation

  • Added schema attribute with proper defaults and description
  • Created handleRebootPrevention() helper function to encapsulate the logic
  • Integrated dry-run API call to Talos to detect reboot requirements

Testing

  • Added acceptance test TestAccTalosMachineConfigurationApplyResourcePreventReboots
  • A full integration test is hard to perform because the feature only affects resource updates: it requires setting up a cluster, making a modification that requires a reboot, and then applying the updated resource. In addition, my development environment cannot run the Terraform acceptance tests (I can't use libvirt).

Result

When a change is made that requires a reboot, the following happens:

Terraform used the selected providers to generate the following execution plan. Resource actions are
indicated with the following symbols:
 ~ update in-place

Terraform will perform the following actions:

 # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"] will be updated in-place
 ~ resource "talos_machine_configuration_apply" "controlplane" {
     ~ apply_mode                   = "auto" -> "staged"
       id                           = "machine_configuration_apply"
     ~ machine_configuration        = (sensitive value)
     ~ machine_configuration_input  = (sensitive value)
       # (4 unchanged attributes hidden)
   }

 # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"] will be updated in-place
 ~ resource "talos_machine_configuration_apply" "controlplane" {
     ~ apply_mode                   = "auto" -> "staged"
       id                           = "machine_configuration_apply"
     ~ machine_configuration        = (sensitive value)
     ~ machine_configuration_input  = (sensitive value)
       # (4 unchanged attributes hidden)
   }

 # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"] will be updated in-place
 ~ resource "talos_machine_configuration_apply" "controlplane" {
     ~ apply_mode                   = "auto" -> "staged"
       id                           = "machine_configuration_apply"
     ~ machine_configuration        = (sensitive value)
     ~ machine_configuration_input  = (sensitive value)
       # (4 unchanged attributes hidden)
   }

Plan: 0 to add, 3 to change, 0 to destroy.
╷
│ Warning: Reboot prevented - switched to staged mode
│ 
│   with module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"],
│   on ../../../modules/k8s/talos/cluster/07-apply_configuration.tf line 1, in resource "talos_machine_configuration_apply" "controlplane":
│    1: resource "talos_machine_configuration_apply" "controlplane" {
│ 
│ Node 172.20.68.4: Configuration requires reboot. Mode automatically changed to 'staged'. Manually reboot
│ with: talosctl reboot --nodes 172.20.68.4
│ 
│ (and 2 more similar warnings elsewhere)
╵

Do you want to perform these actions?
 Terraform will perform the actions described above.
 Only 'yes' will be accepted to approve.

 Enter a value: yes

module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"]: Modifications complete after 0s [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"]: Modifications complete after 0s [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"]: Modifications complete after 0s [id=machine_configuration_apply]
╷
│ Warning: Reboot prevented - switched to staged mode
│ 
│   with module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"],
│   on ../../../modules/k8s/talos/cluster/07-apply_configuration.tf line 1, in resource "talos_machine_configuration_apply" "controlplane":
│    1: resource "talos_machine_configuration_apply" "controlplane" {
│ 
│ Node 172.20.68.2: Configuration requires reboot. Mode automatically changed to 'staged'. Manually reboot
│ with: talosctl reboot --nodes 172.20.68.2
│ 
│ (and 2 more similar warnings elsewhere)
╵

If another Terraform apply is run before rebooting the nodes:

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no
changes are needed.
╷
│ Warning: Reboot prevented - switched to staged mode
│ 
│   with module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"],
│   on ../../../modules/k8s/talos/cluster/07-apply_configuration.tf line 1, in resource "talos_machine_configuration_apply" "controlplane":
│    1: resource "talos_machine_configuration_apply" "controlplane" {
│ 
│ Node 172.20.68.3: Configuration requires reboot. Mode automatically changed to 'staged'. Manually reboot
│ with: talosctl reboot --nodes 172.20.68.3
│ 
│ (and 2 more similar warnings elsewhere)
╵

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

After rebooting the nodes, one more apply is needed to reconcile the state, which switches apply_mode back from staged to auto:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"] will be updated in-place
  ~ resource "talos_machine_configuration_apply" "controlplane" {
      ~ apply_mode                   = "staged" -> "auto"
        id                           = "machine_configuration_apply"
        # (6 unchanged attributes hidden)
    }

  # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"] will be updated in-place
  ~ resource "talos_machine_configuration_apply" "controlplane" {
      ~ apply_mode                   = "staged" -> "auto"
        id                           = "machine_configuration_apply"
        # (6 unchanged attributes hidden)
    }

  # module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"] will be updated in-place
  ~ resource "talos_machine_configuration_apply" "controlplane" {
      ~ apply_mode                   = "staged" -> "auto"
        id                           = "machine_configuration_apply"
        # (6 unchanged attributes hidden)
    }

Plan: 0 to add, 3 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"]: Modifying... [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw3"]: Modifications complete after 0s [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw1"]: Modifications complete after 0s [id=machine_configuration_apply]
module.talos_platform.module.cluster.talos_machine_configuration_apply.controlplane["cpw2"]: Modifications complete after 0s [id=machine_configuration_apply]

Apply complete! Resources: 0 added, 3 changed, 0 destroyed.

…n_apply

Add a new `prevent_uncontrolled_reboots` boolean attribute to the
`talos_machine_configuration_apply` resource. When enabled with
`apply_mode=auto` (default), this option prevents uncontrolled reboots
by automatically switching to staged mode when a reboot is required.

Key features:
- Performs dry-run check during terraform plan
- Automatically switches from auto to staged mode if reboot detected
- Displays clear warning with manual reboot command
- Only applicable when apply_mode is auto
@github-project-automation github-project-automation bot moved this to To Do in Planning Nov 17, 2025
@talos-bot talos-bot moved this from To Do to In Review in Planning Nov 17, 2025