Support Rotating ACL Tokens #53

@lornasong

When a consul-esm instance's token is revoked, for example while rotating ACL tokens, consul-esm behaves in several unexpected ways:

  • The instance's status remains passing/healthy and is never marked critical. This can be seen at /v1/health/node/:node.
  • The instance's assigned external health checks are no longer executed successfully. Because the instance stays "passing"/"healthy", those checks are not reassigned to other, actually healthy instances that hold valid tokens.
  • The instance is not able to deregister successfully.
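The first symptom can be checked by reading the node's checks from /v1/health/node/:node. The sketch below is a minimal illustration, not part of consul-esm: the function names and the base URL argument are hypothetical, and it relies only on the `CheckID` and `Status` fields of the response and on the `X-Consul-Token` request header.

```python
import json
import urllib.request


def fetch_node_checks(base_url, node, token):
    """Return the parsed JSON list of checks from /v1/health/node/:node.

    base_url (e.g. "http://127.0.0.1:8500"), node, and token are supplied
    by the caller; none of these values come from the issue itself.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/health/node/{node}",
        headers={"X-Consul-Token": token},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def node_check_statuses(checks):
    """Map each CheckID to its Status for one node's list of checks."""
    return {check["CheckID"]: check["Status"] for check in checks}


def non_passing(checks):
    """Return the CheckIDs whose Status is anything other than 'passing'."""
    return [c["CheckID"] for c in checks if c["Status"] != "passing"]
```

For a consul-esm instance whose token has been revoked, `non_passing(fetch_node_checks(...))` would be expected to list its TTL check; the behavior reported here is that the list stays empty.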

Both updating the health check and deregistering require the token that was revoked, so both operations fail. This is expected as a result of anti-entropy.

The larger issue of supporting rotating ACL tokens is already captured in hashicorp/consul#4372. The recommendation there is to reregister the application (consul-esm in this case) with the new token.

Currently, consul-esm has no way to reregister itself. When consul-esm is stopped and restarted, the stopped instance fails to deregister, while the newly started instance obtains a new ID. This leaves 'dead', floating consul-esm instances in the cluster. A serious consequence is that these dead consul-esm instances retain responsibility for their external health checks, since they remain marked as healthy/passing in the catalog.

This issue arises from comment: #39 (comment)

Steps to reproduce

  1. Start Consul (I used v1.6.2) with ACLs enabled
  2. Register two external health checks
  3. Start consul-esm (I used v0.3.3) with the token it needs to operate and log_level=DEBUG
  4. Start another consul-esm with a different token that it needs to operate and log_level=DEBUG
  5. Observe that each consul-esm is executing one of the external health checks
  6. Delete the token for one of the consul-esm instances
  7. Observe in the Consul logs that the revoked-token consul-esm has failed its TTL check
  8. Query /v1/health/node/<revoked-token-consul-esm-id> and see that the status is still passing
  9. Stop the revoked-token consul-esm instance (Control+C)
  10. Observe in the Consul logs that consul-esm was not able to deregister successfully
  11. Observe that the remaining healthy consul-esm instance is executing only one external health check, the one it was originally assigned, and did not inherit the other external health check
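For step 2, an external health check can be registered through Consul's catalog endpoint (PUT /v1/catalog/register). The sketch below only builds the request body. It is an illustration under assumptions rather than the exact payload used for this report: the node name, address, check ID, and TCP target are placeholders, and the `external-node` node metadata key is, as far as I know, what consul-esm uses to pick up external nodes.

```python
import json


def build_external_registration(node, address, check_id, tcp_target):
    """Build a /v1/catalog/register body for one external node and check.

    All argument values are placeholders chosen by the caller.
    """
    return {
        "Node": node,
        "Address": address,
        # Assumption: consul-esm watches nodes carrying this metadata key.
        "NodeMeta": {"external-node": "true"},
        "Checks": [
            {
                "Node": node,
                "CheckID": check_id,
                "Name": f"external check {check_id}",
                # Starting state, chosen arbitrarily for this sketch.
                "Status": "critical",
                "Definition": {
                    "TCP": tcp_target,
                    "Interval": "10s",
                    "Timeout": "5s",
                },
            }
        ],
    }


if __name__ == "__main__":
    # Print two bodies, one per external health check in the reproduction.
    for i in (1, 2):
        body = build_external_registration(
            f"ext-node-{i}", f"10.0.0.{i}", f"ext-check-{i}", f"10.0.0.{i}:80"
        )
        print(json.dumps(body, indent=2))
```

Each printed body would then be sent to /v1/catalog/register with a token that has the required write permissions.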
