Script checks failing due to inconsistent checkID generation #26952

@ygersie

Nomad version

Nomad v1.8.17+ent

Issue

Group-level service definitions that define a script check with interpolated values in args produce inconsistent Consul check IDs between check registration and check TTL updates. This results in errors such as:

2025-10-15T15:05:24.859Z [WARN]  client.alloc_runner.task_runner.task_hook.script_checks: updating check failed: alloc_id=0fae3bc4-8d26-3b79-b16f-7e3ccbefbdf8 task=service error="Unexpected response code: 404 (Unknown check ID \"default/default/_nomad-check-2ceccfc3d4fe3e998810c92cb66a5e3f6be84f05\". Ensure that the check ID is passed, not the check name.)"

From what I can tell, the reason is that the struct used as input differs between the two times the checkID is generated (hashed). On registration, as mentioned in the code, group-level services cannot interpolate any task-level variables, yet the args are still part of the checkID generation path. When Nomad executes the script and starts updating the TTL of the registered Consul check, it uses a different checkID, one computed from the interpolated values.
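To illustrate the suspected mismatch, here is a minimal sketch. It is not Nomad's actual code: the `scriptCheck` struct, the `checkID` and `interpolate` helpers, the hashed fields, and the choice of SHA-1 are all assumptions made for illustration. It only shows that hashing the same check before and after interpolating `args` yields two different IDs.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// scriptCheck is a hypothetical, reduced stand-in for a check definition.
type scriptCheck struct {
	Command string
	Args    []string
}

// checkID hashes the check fields into an ID (assumed scheme, for illustration only).
func checkID(c scriptCheck) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s\x00%s", c.Command, strings.Join(c.Args, "\x00"))
	return fmt.Sprintf("_nomad-check-%x", h.Sum(nil))
}

// interpolate is a toy replacement for task-level variable interpolation.
// It returns a copy of the check with ${KEY} references in Args expanded.
func interpolate(c scriptCheck, env map[string]string) scriptCheck {
	out := scriptCheck{Command: c.Command, Args: make([]string, len(c.Args))}
	for i, a := range c.Args {
		for k, v := range env {
			a = strings.ReplaceAll(a, "${"+k+"}", v)
		}
		out.Args[i] = a
	}
	return out
}

func main() {
	// What the registration path sees: the variable is not yet resolvable.
	raw := scriptCheck{
		Command: "/bin/sh",
		Args:    []string{"-c", "/local/healthcheck.sh ${NOMAD_ALLOC_IP_dummy}"},
	}
	// What the TTL update path sees: the variable has been interpolated.
	// The IP value is made up.
	resolved := interpolate(raw, map[string]string{"NOMAD_ALLOC_IP_dummy": "10.0.0.5"})

	fmt.Println("registered:", checkID(raw))
	fmt.Println("updated:   ", checkID(resolved))
	fmt.Println("match:     ", checkID(raw) == checkID(resolved))
}
```

Under these assumptions the TTL update targets an ID that was never registered, which would be consistent with the 404 "Unknown check ID" response above.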

My suggestion, if possible, is to never interpolate certain fields, such as the script args, when generating the checkID, so that the ID stays consistent. Any value passed in args that is available during registration, such as ${NOMAD_JOB_NAME}, works properly.
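One way to picture this suggestion is sketched below. It is an assumption-laden illustration, not a patch against Nomad: the `check` struct, `stableID`, and `execArgs` are hypothetical names, and SHA-1 is assumed only for the example. The ID is derived from the raw, uninterpolated args, while interpolation happens separately and only for the arguments handed to the script.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"os"
	"strings"
)

// check is a hypothetical script check; RawArgs holds the args exactly as
// written in the jobspec, before any interpolation.
type check struct {
	Command string
	RawArgs []string
}

// stableID hashes only uninterpolated fields, so the registration path and the
// TTL update path compute the same ID regardless of which variables resolve.
func stableID(c check) string {
	sum := sha1.Sum([]byte(c.Command + "\x00" + strings.Join(c.RawArgs, "\x00")))
	return fmt.Sprintf("_nomad-check-%x", sum)
}

// execArgs expands ${VAR} references only for the command that is executed.
// It never feeds back into the ID.
func execArgs(c check, env map[string]string) []string {
	out := make([]string, len(c.RawArgs))
	for i, a := range c.RawArgs {
		out[i] = os.Expand(a, func(key string) string { return env[key] })
	}
	return out
}

func main() {
	c := check{
		Command: "/bin/sh",
		RawArgs: []string{"-c", "/local/healthcheck.sh ${NOMAD_ALLOC_IP_dummy}"},
	}
	taskEnv := map[string]string{"NOMAD_ALLOC_IP_dummy": "10.0.0.5"} // made-up value

	fmt.Println("check ID: ", stableID(c))
	fmt.Println("exec args:", execArgs(c, taskEnv))
}
```

The trade-off in this sketch is that two checks differing only in their resolved values would share an ID; whether that matters depends on what else goes into the hash.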

Reproduction steps

job "..." {
  group "..." {
    service {
      address_mode = "alloc"
      name         = "${NOMAD_JOB_NAME}-${NOMAD_NAMESPACE}"
      port         = 1337
      tags         = ["test"]

      check {
        address_mode = "alloc"
        task         = "ygersie-test-sidecar"
        type         = "script"
        interval     = "10s"
        timeout      = "5s"
        command      = "/bin/sh"
        args = [
          "-c",
          "/local/healthcheck.sh ${NOMAD_ALLOC_IP_dummy}"
        ]
      }
    }
    task "..." {
      .........
      template {
        destination = "local/healthcheck.sh"
        perms       = "755"
        change_mode = "noop"
        data        = <<-EOF
          echo {{ env "NOMAD_ALLOC_IP_dummy" }} >> /tmp/script_check.output
          echo "parameter is: $1" >> /tmp/script_check.output
          EOF
      }
    }
  }
}

If you don't have NOMAD_ALLOC_IP_dummy readily available, you can use NOMAD_TASK_NAME or any other value that can't be interpolated at registration time.
