"eval delivery limit reached" and "failed to ACK policy evaluation" warnings #1099

@sigmaris

We've deployed the Nomad Autoscaler as a Nomad job (running docker.io/hashicorp/nomad-autoscaler:0.4.6 in a Podman container) for horizontal cluster autoscaling. We have a couple of AWS Auto Scaling groups for Nomad client node pools, and we're using the Nomad APM to get metrics about allocated CPU and memory on the nodes. Our agent config looks like this:

# Not using the HA mode at the moment
high_availability {
  enabled        = false
  lock_namespace = "default"
  lock_path      = "nomad-autoscaler/lock_for_europe-1"
  lock_ttl       = "30s"
  lock_delay     = "15s"
}

http {
  bind_address = "0.0.0.0"
  bind_port    = 28865
}

policy {
  dir = "/etc/autoscaler/policies"
}

nomad {
  address = "unix:///etc/autoscaler-secrets/api.sock"
  region  = "europe-1"
}

apm "nomad-apm" {
  driver = "nomad-apm"
}

target "aws-asg-in-eu-west-1" {
  driver = "aws-asg"
  config = {
    aws_region            = "eu-west-1"
    aws_access_key_id     = "redacted"
    aws_secret_access_key = "redacted"
    aws_session_token     = "redacted"
  }
}

strategy "target-value" {
  driver = "target-value"
}

and the policy in /etc/autoscaler/policies/aws-eu-west-1-default-asg.hcl for scaling the AWS group:

scaling "autoscaling-aws-eu-west-1-default-asg" {
  enabled = true
  min     = 3
  max     = 9

  policy {

    check "nomad_allocated_cpu" {
      source = "nomad-apm"
      query  = "percentage-allocated_cpu"
      strategy "target-value" {
        target         = 70
        max_scale_up   = 2
        max_scale_down = 1
      }
    }

    check "nomad_allocated_memory" {
      source = "nomad-apm"
      query  = "percentage-allocated_memory"
      strategy "target-value" {
        target         = 70
        max_scale_up   = 2
        max_scale_down = 1
      }
    }

    target "aws-asg-in-eu-west-1" {
      aws_asg_name           = "cluster-europe-1-pool-default"
      node_class             = "aws-eu-west-1-default-asg"
      node_purge             = true
      node_selector_strategy = "empty_ignore_system"
    }
  }
}
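
For context on how this policy maps to the scaling decisions in the log excerpt further down, here is a minimal sketch of the arithmetic as I understand it. It assumes the target-value strategy multiplies the current count by the reported factor and rounds up, that `max_scale_up` / `max_scale_down` limit the step per evaluation, and that the result is clamped to `min` / `max`. The helper `next_count` is hypothetical and is not the autoscaler's actual code.

```python
import math


def next_count(current, factor, max_scale_up, max_scale_down, lo, hi):
    """Hypothetical helper approximating the count a target-value check requests."""
    # Assumed: desired count is the current count scaled by the factor, rounded up.
    desired = math.ceil(current * factor)
    # Assumed: max_scale_up / max_scale_down limit the change per evaluation.
    desired = min(desired, current + max_scale_up)
    desired = max(desired, current - max_scale_down)
    # Clamp to the policy's min / max.
    return max(lo, min(hi, desired))


# Factors and counts taken from the "scaling target" log lines in this issue.
first = next_count(5, 1.650362, max_scale_up=2, max_scale_down=1, lo=3, hi=9)
second = next_count(7, 1.530879, max_scale_up=2, max_scale_down=1, lo=3, hi=9)
print(first, second)
```

Under these assumptions the results match the `from=5 to=7` and `from=7 to=9` transitions in the log, so the scaling decisions themselves look consistent with the policy.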

The Nomad ACL policy used with the autoscaler is:

namespace "*" {
  policy       = "scale"
  capabilities = ["read-job"]
}

namespace "default" {
  policy       = "scale"
  capabilities = ["read-job"]

  variables {
    path "nomad-autoscaler/lock_for_europe-1" {
      capabilities = ["write"]
    }
  }
}

# Node write access is needed for the autoscaler to be able to drain and purge nodes.
node {
  policy = "write"
}

# If running Nomad Autoscaler Enterprise, the following ACL policy addition is needed to ensure it can read the Nomad Enterprise license:
operator {
  policy = "read"
}

and the AWS permissions for the role we give to the autoscaler are:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Action": [
				"autoscaling:DescribeScalingActivities",
				"autoscaling:DescribeInstanceRefreshes",
				"autoscaling:DescribeAutoScalingGroups"
			],
			"Effect": "Allow",
			"Resource": "*"
		},
		{
			"Action": [
				"autoscaling:UpdateAutoScalingGroup",
				"autoscaling:TerminateInstanceInAutoScalingGroup",
				"autoscaling:CreateOrUpdateTags"
			],
			"Effect": "Allow",
			"Resource": [
				"...ARNs of the ASGs here..."
			]
		}
	]
}

The autoscaler seems to work, but after it performs a scale-out we get these warning messages in the logs:

2025-06-04T03:29:49.852Z [INFO]  policy_eval.worker: scaling target: id=5be0c2ab-3407-77e0-e4ce-d123b5dba165 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster target=aws-asg-in-eu-west-1 from=5 to=7 reason="scaling up because factor is 1.650362" meta=map[nomad_policy_id:4ea06909-cf09-0230-b6f6-25d8847396c5]
2025-06-04T03:30:10.451Z [INFO]  internal_plugin.aws-asg-in-eu-west-1: successfully performed and verified scaling out: action=scale_out asg_name=cluster-europe-1-pool-default desired_count=7
2025-06-04T03:34:48.950Z [WARN]  policy_eval.broker: eval delivery limit reached: eval_id=85b235ed-0763-45c0-7168-499ae224acb7 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 token=0dbdf3b8-3879-03d9-aba9-5b55d71db140 count=1 limit=1
2025-06-04T03:35:06.504Z [WARN]  policy_eval.worker: failed to ACK policy evaluation: eval_id=85b235ed-0763-45c0-7168-499ae224acb7 eval_token=0dbdf3b8-3879-03d9-aba9-5b55d71db140 id=5be0c2ab-3407-77e0-e4ce-d123b5dba165 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster error="evaluation ID not found"
2025-06-04T03:35:07.750Z [INFO]  policy_eval.worker: scaling target: id=aa3c1652-f3d5-ad68-e1da-5168b7c5fe62 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster target=aws-asg-in-eu-west-1 from=7 to=9 reason="scaling up because factor is 1.530879" meta=map[nomad_policy_id:4ea06909-cf09-0230-b6f6-25d8847396c5]
2025-06-04T03:35:28.502Z [INFO]  internal_plugin.aws-asg-in-eu-west-1: successfully performed and verified scaling out: action=scale_out asg_name=cluster-europe-1-pool-default desired_count=9
2025-06-04T03:40:06.505Z [WARN]  policy_eval.broker: eval delivery limit reached: eval_id=7b6f16ec-938e-9b7b-2fbf-4df78ee71cb9 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 token=b0dff010-a68a-0020-d189-9163fd0aa627 count=1 limit=1
2025-06-04T03:40:36.441Z [WARN]  policy_eval.worker: failed to ACK policy evaluation: eval_id=7b6f16ec-938e-9b7b-2fbf-4df78ee71cb9 eval_token=b0dff010-a68a-0020-d189-9163fd0aa627 id=aa3c1652-f3d5-ad68-e1da-5168b7c5fe62 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster error="evaluation ID not found"
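
To make the timing easier to see, here is a small sketch that computes how long after each "scaling target" line the two warnings appear. The timestamps are copied from the log excerpt above; the helper names `parse` and `gaps_since_scale` are made up for illustration.

```python
from datetime import datetime

# (timestamp, event) pairs copied from the log excerpt in this issue.
EVENTS = [
    ("2025-06-04T03:29:49.852Z", "scaling target"),
    ("2025-06-04T03:34:48.950Z", "eval delivery limit reached"),
    ("2025-06-04T03:35:06.504Z", "failed to ACK policy evaluation"),
    ("2025-06-04T03:35:07.750Z", "scaling target"),
    ("2025-06-04T03:40:06.505Z", "eval delivery limit reached"),
    ("2025-06-04T03:40:36.441Z", "failed to ACK policy evaluation"),
]


def parse(ts):
    """Parse the log's UTC timestamp format (millisecond precision)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_since_scale(events):
    """Return (event, seconds since the most recent 'scaling target' line)."""
    out = []
    start = None
    for ts, event in events:
        t = parse(ts)
        if event == "scaling target":
            start = t
        elif start is not None:
            out.append((event, round((t - start).total_seconds(), 3)))
    return out


for event, seconds in gaps_since_scale(EVENTS):
    print(f"{seconds:8.3f}s  {event}")
```

In both cases the "eval delivery limit reached" warning shows up just under five minutes (about 299 seconds) after the worker logged the scaling action, and the "failed to ACK" warning follows 18 to 30 seconds later. That regularity might point to some timeout expiring while the evaluation is still in progress, but that is only a guess on my part.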

I found issue #343 about these messages, but it doesn't seem to be the same problem: in that issue the autoscaler had stopped working, whereas ours otherwise works fine (it successfully scaled out twice in the log snippet above).

What do these warning messages indicate, are they a problem, and if so, how can I fix them?
