
[BUG] Conflict updating a host's attributes while moving it to another folder #743


Open
eliasp opened this issue Mar 11, 2025 · 9 comments
Labels: bug (Something isn't working), role:agent (This affects the agent role)

Comments

@eliasp

eliasp commented Mar 11, 2025

Describe the bug
Updating attributes of a host and updating its folder at the same time fails.

Component Name
agent

Ansible Version

$ ansible --version
ansible [core 2.15.13]
  config file = None
  configured module search path = ['/tmp/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /tmp/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.21 (main, Dec  5 2024, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] (/usr/bin/python3)
  jinja version = 3.1.5
  libyaml = True

Checkmk Version and Edition

2.2.0p26 (CRE)

Collection Version

$ ansible-galaxy collection list
# /usr/share/ansible/collections/ansible_collections
Collection                Version
------------------------- -------
ansible.posix             1.5.4
ansible.utils             5.0.0
checkmk.general           5.6.0
community.general         9.2.0
community.vmware          4.5.0
containers.podman         1.16.3
fedora.linux_system_roles 1.84.1
freeipa.ansible_freeipa   1.13.2
onepassword.connect       2.3.0

To Reproduce
Steps to reproduce the behavior:

  1. Write a playbook which uses ansible.builtin.include_role to execute the checkmk.general.agent role
  2. Set at least the variables checkmk_agent_folder and checkmk_agent_host_ip
  3. Move the targeted host to a folder other than the one defined in checkmk_agent_folder
  4. Execute the playbook
  5. The execution aborts at the task "Create host on server." with the following error:
{
  "changed": false,
  "failed_when_result": true,
  "msg": "ERROR: The folder parameter is different from the folder in which the host is located, while other parameters are also specified!\n                     If you want to move the host to a specific folder, please omit the other parameters:                     'attributes', 'update_attributes' and 'remove_attributes'."
}

Expected behavior
I'd expect checkmk.general.agent to handle this conflict transparently:

  • move the host to the target folder
  • then apply the conflicting changes to attributes
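Roughly like this, expressed directly with the checkmk.general.host module (an illustrative sketch, not the role's actual code; checkmk_agent_host_attributes is the attribute dict the role builds internally, see below):

- name: Move the host to the target folder (no attribute parameters)
  checkmk.general.host:
    name: "{{ checkmk_agent_host_name }}"
    folder: "{{ checkmk_agent_folder }}"
    state: present
    server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
    site: "{{ checkmk_agent_site }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ checkmk_agent_secret }}"

- name: Apply the host attributes in a second call
  checkmk.general.host:
    name: "{{ checkmk_agent_host_name }}"
    folder: "{{ checkmk_agent_folder }}"
    attributes: "{{ checkmk_agent_host_attributes }}"
    state: present
    server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
    site: "{{ checkmk_agent_site }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ checkmk_agent_secret }}"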

Actual behavior
Due to a bug in Checkmk, where the deletion of a host doesn't invalidate its DNS cache, I have to set the host's IP explicitly via checkmk_agent_host_ip.
This adds the IP to checkmk_agent_host_attributes, which in turn is passed as attributes to the module checkmk.general.host.

The Checkmk REST API apparently doesn't allow modifying a host's attributes in the same request that changes its folder, and returns the corresponding error.

Minimum reproduction example
playbook_checkmk.yml:

---
- name: Debug the Checkmk role
  hosts: [REDACTED]
  tasks:
    - name: Configure firewall to allow inbound connections to Checkmk agent
      ansible.posix.firewalld:
        immediate: true
        permanent: true
        state: enabled
        service: checkmk-agent
    - name: Set vars for `checkmk.general.agent`
      ansible.builtin.set_fact:
        checkmk_agent_add_host: true
        checkmk_agent_auto_activate: true
        checkmk_agent_discover: true
        checkmk_agent_edition: cre
        checkmk_agent_folder: /pipeline/systems
        checkmk_agent_force_foreign_changes: true
        checkmk_agent_host_ip: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
        checkmk_agent_host_name: "{{ inventory_hostname_short }}"
        checkmk_agent_mode: pull
        checkmk_agent_protocol: https
        # checkmk_agent_secret - provided via CLI
        checkmk_agent_server: [REDACTED]
        checkmk_agent_server_ip: [REDACTED]
        checkmk_agent_server_protocol: https
        checkmk_agent_site: [REDACTED]
        checkmk_agent_tls: true
        checkmk_agent_user: automation
        checkmk_agent_version: "2.2.0p26"
    - name: Install, configure and register Checkmk agent
      ansible.builtin.include_role:
        name: checkmk.general.agent

  • Execute the playbook: ansible-playbook playbook_checkmk.yml -i [REDACTED], --extra-vars checkmk_agent_secret=[REDACTED]

Additional context
For now, I work around this problem with just this task before including the role:

- name: Move Checkmk host to the correct folder
  checkmk.general.host:
    name: "{{ checkmk_agent_host_name }}"
    folder: "{{ checkmk_agent_folder }}"
    state: present
    server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
    site: "{{ checkmk_agent_site }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ checkmk_agent_secret }}"
@eliasp eliasp added the bug Something isn't working label Mar 11, 2025
@github-actions github-actions bot added the role:agent This affects the agent role label Mar 11, 2025
@msekania
Contributor

@eliasp

It is not a bug! It is explicitly designed that way (see also the error message).

The problem is as follows:
You can set parameters both at folder level and at host level. The host then receives the so-called effective parameters.
What happens if I set parameters and at the same time move the host to a new folder whose parameters differ from the current folder's?
In this case, it can easily lead to a situation where you expect a certain behavior but get something else.
To avoid this, the logic was intentionally split into two workflows, either set parameters or move the host, as the error message also states.

@eliasp
Author

eliasp commented Mar 12, 2025

I'm aware that, on the module level, that's the expected behavior.
But what's not expected is seeing this behavior at the role level.

Including the checkmk.general.agent role and supplying the desired variables should never lead to this error message; the separate handling of those two steps should be dealt with role-internally, using an approach similar to my workaround.

@robin-checkmk
Member

I think @eliasp has a point. I saw this issue a while back, but IIRC I had an error in my configuration.
It might take some time, but I will take a second look at this and try to understand whether we can improve things.
Thanks for the extensive and well-written issue!

@msekania
Contributor

msekania commented Mar 13, 2025

@robin-checkmk,

https://github.com/Checkmk/ansible-collection-checkmk.general/blob/2188a6dc031a021ba9148e77877b002dcb82b41d/roles/agent/tasks/Linux.yml#L54C1-L72C29

The following should do the job: just split the referenced code lines into two tasks (the first without the attributes field).

- name: "{{ ansible_system }}: Create (or move existing) host on server."
  checkmk.general.host:
    server_url: "{{ checkmk_agent_server_protocol }}://{{ checkmk_agent_server }}:{{ checkmk_agent_server_port }}/"
    site: "{{ checkmk_agent_site }}"
    validate_certs: "{{ checkmk_agent_server_validate_certs | bool }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ __checkmk_agent_auth }}"
    folder: "{{ checkmk_agent_folder | default(omit) }}"
    name: "{{ checkmk_agent_host_name }}"
    state: "present"
  become: false
  register: __checkmk_agent_create_result
  failed_when: |
    (__checkmk_agent_create_result.failed == true) and
    ("The host is already part of the specified target folder" not in __checkmk_agent_create_result.msg)
  delegate_to: "{{ checkmk_agent_delegate_api_calls }}"
  when: checkmk_agent_add_host | bool
  notify: "Activate changes"

- name: "{{ ansible_system }}: Update host attributes on server."
  checkmk.general.host:
    server_url: "{{ checkmk_agent_server_protocol }}://{{ checkmk_agent_server }}:{{ checkmk_agent_server_port }}/"
    site: "{{ checkmk_agent_site }}"
    validate_certs: "{{ checkmk_agent_server_validate_certs | bool }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ __checkmk_agent_auth }}"
    folder: "{{ checkmk_agent_folder | default(omit) }}"
    name: "{{ checkmk_agent_host_name }}"
    attributes: "{{ checkmk_agent_host_attributes }}"
    state: "present"
  become: false
  register: __checkmk_agent_create_result
  failed_when: |
    (__checkmk_agent_create_result.failed == true) and
    ("The host is already part of the specified target folder" not in __checkmk_agent_create_result.msg)
  delegate_to: "{{ checkmk_agent_delegate_api_calls }}"
  when: checkmk_agent_add_host | bool
  notify: "Activate changes"

@robin-checkmk
Member

I'm aware that, on the module level, that's the expected behavior. But what's not expected is seeing this behavior at the role level.

Including the checkmk.general.agent role and supplying the desired variables should never lead to this error message; the separate handling of those two steps should be dealt with role-internally, using an approach similar to my workaround.

So I took a look at this and I am not sure if I agree. While a "fix" as proposed by @msekania (thanks, buddy!) would solve your issue, I think there is more to it. The role's functionality to add a host targets exactly that: adding a host. If the host you are adding is already part of your monitoring, you should set checkmk_agent_add_host: false.

Having the role move an existing host sounds way more surprising to me than failing like in the initial description.

What do you gentlemen think?

@eliasp
Author

eliasp commented Mar 21, 2025

I'm looking at this from a declarative/continuous desired-state configuration perspective, not from an imperative "execute a one-shot task" point of view.

An object's state might change (as in our case due to unrelated external factors) and it should be possible to reconcile it at any given time, by applying the declarative definition to the control mechanism and having it figure out the necessary actions to close this delta.

Compare this with using Ansible to maintain a certain set of packages that should be installed at any given time:

  • you declare the desired state (packages A, B and C should be installed)
  • Ansible installs those
  • a user comes along and for whatever reason removes package B
  • Ansible reinstalls package B on the next run

In this case, I wouldn't expect the author of the Ansible task to have to implement a special case for the situation of package B being absent.
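As an illustration, here's what that package analogy looks like as a task (package names are placeholders):

- name: Ensure the declared packages are installed
  ansible.builtin.package:
    name:
      - package-a  # placeholder for package A
      - package-b  # placeholder for package B
      - package-c  # placeholder for package C
    state: present

No matter which of the packages someone removes in the meantime, rerunning this task reconciles the system back to the declared state without any special-case handling.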

Now in this specific case, I don't want to build additional complexity around using this role, e.g. executing it with checkmk_agent_add_host: true when the host isn't present yet and with checkmk_agent_add_host: false once it is.

Having the host present in a given folder is just the declarative input I provide to the role, which in turn should figure out the required steps to reconcile the delta between "declaration" and "current state".

@robin-checkmk
Member

robin-checkmk commented Mar 24, 2025

You got me there mister!
From a pure Ansible zen point of view, you are entirely right and there is no argument to be made.

There is a loophole in the Ansible zen though:

  1. Ansible users are (most likely) not programmers.
  2. Declarative is better than imperative -- most of the time.

The reason I built the role the way it is now is due to these two points: users of this collection are likely not Ansible gurus. And while your approach would make it properly idempotent, I am worried that a lot of users would be surprised by existing hosts being moved. I think we even had an issue about that back then. So eventually I think this comes down to a compromise between usability and dogma.

I am not saying my assumption is correct. Nor am I saying yours is wrong. I am genuinely concerned about the best way forward.

I have some thoughts; maybe y'all are up for discussing them to find out which provides the most merit:

  1. Add a new variable (checkmk_agent_move_existing_hosts, sketched below) which is disabled by default.
  2. Move hosts without question to enable proper idempotency.
  3. Improve documentation (and maybe logging) of the current behavior.
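A rough sketch of what option 1 could look like inside the role, assuming a hypothetical checkmk_agent_move_existing_hosts variable defaulting to false (variable name and task layout are illustrative only, reusing the variables from the snippet above):

- name: "{{ ansible_system }}: Move existing host to the configured folder."
  checkmk.general.host:
    server_url: "{{ checkmk_agent_server_protocol }}://{{ checkmk_agent_server }}:{{ checkmk_agent_server_port }}/"
    site: "{{ checkmk_agent_site }}"
    validate_certs: "{{ checkmk_agent_server_validate_certs | bool }}"
    automation_user: "{{ checkmk_agent_user }}"
    automation_secret: "{{ __checkmk_agent_auth }}"
    folder: "{{ checkmk_agent_folder | default(omit) }}"
    name: "{{ checkmk_agent_host_name }}"
    state: "present"
  become: false
  delegate_to: "{{ checkmk_agent_delegate_api_calls }}"
  # checkmk_agent_move_existing_hosts is the hypothetical opt-in flag from option 1
  when: checkmk_agent_add_host | bool and checkmk_agent_move_existing_hosts | bool
  notify: "Activate changes"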

@msekania
Contributor

Hi @robin-checkmk,

I see the points and would rather opt for 1 and 3 (3 would help anyway) to avoid further surprises.

@meni2029
Contributor

meni2029 commented May 2, 2025

Hi, I faced the same issue with the module checkmk.general.host after upgrading the collection from an old version. I implemented something similar to @eliasp's workaround, using Ansible's block/rescue feature. I don't mind having to set a specific variable to change the behaviour, as long as it is available for the module checkmk.general.host.

Context: we are generating the YAML manifest with the hosts to be imported/updated in Checkmk using another tool. In this tool there's a validation mechanism for the changes, which will highlight any change of folder or attributes.
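Roughly, such a block/rescue workaround looks like this (an illustrative sketch reusing the variables from @eliasp's workaround above, not my exact implementation):

- name: Create or update host, falling back to a move-then-update sequence
  block:
    - name: Create/update the host including its attributes
      checkmk.general.host:
        name: "{{ checkmk_agent_host_name }}"
        folder: "{{ checkmk_agent_folder }}"
        attributes: "{{ checkmk_agent_host_attributes }}"
        state: present
        server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
        site: "{{ checkmk_agent_site }}"
        automation_user: "{{ checkmk_agent_user }}"
        automation_secret: "{{ checkmk_agent_secret }}"
  rescue:
    - name: Move the host to the target folder without touching attributes
      checkmk.general.host:
        name: "{{ checkmk_agent_host_name }}"
        folder: "{{ checkmk_agent_folder }}"
        state: present
        server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
        site: "{{ checkmk_agent_site }}"
        automation_user: "{{ checkmk_agent_user }}"
        automation_secret: "{{ checkmk_agent_secret }}"
    - name: Retry the attribute update after the move
      checkmk.general.host:
        name: "{{ checkmk_agent_host_name }}"
        folder: "{{ checkmk_agent_folder }}"
        attributes: "{{ checkmk_agent_host_attributes }}"
        state: present
        server_url: "{{ checkmk_agent_protocol }}://{{ checkmk_agent_server }}/"
        site: "{{ checkmk_agent_site }}"
        automation_user: "{{ checkmk_agent_user }}"
        automation_secret: "{{ checkmk_agent_secret }}"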
