We've deployed the Nomad autoscaler as a Nomad job (running docker.io/hashicorp/nomad-autoscaler:0.4.6 in a Podman container) to do horizontal cluster autoscaling. We have a couple of AWS Auto Scaling groups for Nomad client node pools, and we're using the Nomad APM plugin to get metrics about allocated CPU and memory on the nodes. Our agent config looks like this (a rough sketch of the job that runs the agent follows after it):
# Not using the HA mode at the moment
high_availability {
  enabled        = false
  lock_namespace = "default"
  lock_path      = "nomad-autoscaler/lock_for_europe-1"
  lock_ttl       = "30s"
  lock_delay     = "15s"
}

http {
  bind_address = "0.0.0.0"
  bind_port    = 28865
}

policy {
  dir = "/etc/autoscaler/policies"
}

nomad {
  address = "unix:///etc/autoscaler-secrets/api.sock"
  region  = "europe-1"
}

apm "nomad-apm" {
  driver = "nomad-apm"
}

target "aws-asg-in-eu-west-1" {
  driver = "aws-asg"
  config = {
    aws_region            = "eu-west-1"
    aws_access_key_id     = "redacted"
    aws_secret_access_key = "redacted"
    aws_session_token     = "redacted"
  }
}

strategy "target-value" {
  driver = "target-value"
}
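For context, the job that runs the agent looks roughly like the sketch below. This is a simplified illustration rather than our exact spec: the datacenter name, task name, host paths, and config file name are placeholders, and it assumes the podman task driver with the image's default entrypoint being the nomad-autoscaler binary.

job "nomad-autoscaler" {
  region      = "europe-1"
  datacenters = ["dc1"] # placeholder datacenter
  type        = "service"

  group "autoscaler" {
    count = 1

    network {
      port "http" {
        static = 28865
      }
    }

    task "agent" {
      driver = "podman"

      config {
        image = "docker.io/hashicorp/nomad-autoscaler:0.4.6"
        # Assumes the image entrypoint is the nomad-autoscaler binary.
        args  = ["agent", "-config", "/etc/autoscaler/config.hcl"]
        ports = ["http"]

        # Host paths are placeholders; the agent config, the policy directory,
        # and the Nomad API socket are mounted into the container.
        volumes = [
          "/opt/autoscaler/config:/etc/autoscaler",
          "/opt/autoscaler/secrets:/etc/autoscaler-secrets",
        ]
      }
    }
  }
}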
The policy in /etc/autoscaler/policies/aws-eu-west-1-default-asg.hcl for scaling the AWS group is:
scaling "autoscaling-aws-eu-west-1-default-asg" {
enabled = true
min = 3
max = 9
policy {
check "nomad_allocated_cpu" {
source = "nomad-apm"
query = "percentage-allocated_cpu"
strategy "target-value" {
target = 70
max_scale_up = 2
max_scale_down = 1
}
}
check "nomad_allocated_memory" {
source = "nomad-apm"
query = "percentage-allocated_memory"
strategy "target-value" {
target = 70
max_scale_up = 2
max_scale_down = 1
}
}
target "aws-asg-in-eu-west-1" {
aws_asg_name = "cluster-europe-1-pool-default"
node_class = "aws-eu-west-1-default-asg"
node_purge = true
node_selector_strategy = "empty_ignore_system"
}
}
}
The Nomad ACL policy used with the autoscaler is:
namespace "*" {
policy = "scale"
capabilities = ["read-job"]
}
namespace "default" {
policy = "scale"
capabilities = ["read-job"]
variables {
path "nomad-autoscaler/lock_for_europe-1" {
capabilities = ["write"]
}
}
}
# Node write access is needed for the autoscaler to be able to drain and purge nodes.
node {
policy = "write"
}
# If running Nomad Autoscaler Enterprise, the following ACL policy addition is needed to ensure it can read the Nomad Enterprise license:
operator {
policy = "read"
}
The AWS permissions for the role we give to the autoscaler are:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeInstanceRefreshes",
        "autoscaling:DescribeAutoScalingGroups"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": [
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags"
      ],
      "Effect": "Allow",
      "Resource": [
        "...ARNs of the ASGs here..."
      ]
    }
  ]
}
The autoscaler seems to work, but we get these warning messages in the logs after it performs a scale-out:
2025-06-04T03:29:49.852Z [INFO] policy_eval.worker: scaling target: id=5be0c2ab-3407-77e0-e4ce-d123b5dba165 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster target=aws-asg-in-eu-west-1 from=5 to=7 reason="scaling up because factor is 1.650362" meta=map[nomad_policy_id:4ea06909-cf09-0230-b6f6-25d8847396c5]
2025-06-04T03:30:10.451Z [INFO] internal_plugin.aws-asg-in-eu-west-1: successfully performed and verified scaling out: action=scale_out asg_name=cluster-europe-1-pool-default desired_count=7
2025-06-04T03:34:48.950Z [WARN] policy_eval.broker: eval delivery limit reached: eval_id=85b235ed-0763-45c0-7168-499ae224acb7 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 token=0dbdf3b8-3879-03d9-aba9-5b55d71db140 count=1 limit=1
2025-06-04T03:35:06.504Z [WARN] policy_eval.worker: failed to ACK policy evaluation: eval_id=85b235ed-0763-45c0-7168-499ae224acb7 eval_token=0dbdf3b8-3879-03d9-aba9-5b55d71db140 id=5be0c2ab-3407-77e0-e4ce-d123b5dba165 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster error="evaluation ID not found"
2025-06-04T03:35:07.750Z [INFO] policy_eval.worker: scaling target: id=aa3c1652-f3d5-ad68-e1da-5168b7c5fe62 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster target=aws-asg-in-eu-west-1 from=7 to=9 reason="scaling up because factor is 1.530879" meta=map[nomad_policy_id:4ea06909-cf09-0230-b6f6-25d8847396c5]
2025-06-04T03:35:28.502Z [INFO] internal_plugin.aws-asg-in-eu-west-1: successfully performed and verified scaling out: action=scale_out asg_name=cluster-europe-1-pool-default desired_count=9
2025-06-04T03:40:06.505Z [WARN] policy_eval.broker: eval delivery limit reached: eval_id=7b6f16ec-938e-9b7b-2fbf-4df78ee71cb9 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 token=b0dff010-a68a-0020-d189-9163fd0aa627 count=1 limit=1
2025-06-04T03:40:36.441Z [WARN] policy_eval.worker: failed to ACK policy evaluation: eval_id=7b6f16ec-938e-9b7b-2fbf-4df78ee71cb9 eval_token=b0dff010-a68a-0020-d189-9163fd0aa627 id=aa3c1652-f3d5-ad68-e1da-5168b7c5fe62 policy_id=4ea06909-cf09-0230-b6f6-25d8847396c5 queue=cluster error="evaluation ID not found"
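If we understand the target-value strategy correctly, the scaling arithmetic itself looks right: with 5 nodes and a factor of 1.650362 the strategy would want roughly 5 × 1.65 ≈ 8 nodes, which max_scale_up = 2 caps at 7 (matching the from=5 to=7 line), and the next evaluation with factor 1.530879 is similarly capped at the policy max of 9. So the scale-outs themselves look healthy; it's only the broker warnings afterwards that we're unsure about.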
I found issue #343 about similar messages, but it doesn't seem to be the same problem: there the autoscaler had stopped working, whereas ours seems to be working fine otherwise (it successfully scaled out twice in the log snippet above).
What do these warning messages indicate? Are they a problem, and if so, how can I fix them?
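In case it's relevant: if we're reading the agent docs right, evaluation delivery is governed by the agent's policy_eval block, and the limit=1 in the warnings looks like the default delivery_limit, while the roughly five minutes between the scale action and the warning looks like the default ack_timeout. One thing we considered trying, sketched below with illustrative values, is raising ack_timeout so the worker has longer to ACK after a slow scale-out, but we'd like to understand whether the warnings are actually harmful before tuning anything.

# Illustrative only; we have not tried this yet.
policy_eval {
  ack_timeout    = "10m"
  delivery_limit = 1
}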