gluster_volume task fails when removing brick from disconnected peer #32

Open
@mtruneck

Description

The scenario:

  • gluster_volume is correctly configured across an ansible group
  • one of the group's peers disconnects permanently (the cluster was scaled down in our case)
  • the gluster_volume task is run again without the disconnected peer in the group

Expected result

  • gluster_volume task automatically removes the bricks from the disconnected peer

Real behaviour

  • it fails with ValueError: invalid literal for int() with base 10.

Reason

The problem is on line 430 of gluster_volume.py:

def reduce_config(name, removed_bricks, replicas, force):
    out = run_gluster(['volume', 'heal', name, 'info'])
    summary = out.split("\n")
    for line in summary:
        if 'Number' in line and int(line.split(":")[1].strip()) != 0:
            module.fail_json(msg="Operation aborted, self-heal in progress.")

The code expects the output of gluster volume heal [name] info to look like this:

Brick 10.10.1.102:/opt/volume
Status: Connected
Number of entries: 0

But when the peer is disconnected, the output is:

Brick 10.10.1.102:/opt/volume
Status: Transport endpoint is not connected
Number of entries: -

So the condition on line 430 raises the ValueError, because the stripped value '-' cannot be converted to an int.
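
One possible way to make the check tolerant of disconnected bricks is sketched below. This is not the module's actual code: the helper name pending_heal_entries is hypothetical, and treating a brick that reports "-" as having no countable entries is an assumption that may or may not be the right policy for the module.

```python
def pending_heal_entries(heal_info_output):
    """Sum the 'Number of entries' counts from `gluster volume heal <name> info`.

    Hypothetical helper: bricks that report a non-numeric count (for example
    '-' when the peer is disconnected) are skipped instead of raising ValueError.
    """
    total = 0
    for line in heal_info_output.split("\n"):
        if "Number of entries" not in line:
            continue
        # Split only on the first colon and strip surrounding whitespace.
        value = line.split(":", 1)[1].strip()
        if value.isdigit():
            total += int(value)
    return total


# Sample outputs taken from the two cases described above.
connected = (
    "Brick 10.10.1.102:/opt/volume\n"
    "Status: Connected\n"
    "Number of entries: 0\n"
)
disconnected = (
    "Brick 10.10.1.102:/opt/volume\n"
    "Status: Transport endpoint is not connected\n"
    "Number of entries: -\n"
)
```

With a helper like this, reduce_config could abort only when pending_heal_entries(out) is non-zero, so a disconnected peer would no longer crash the task.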

Task used:

- name: Configure Gluster volume.
  gluster_volume:
    state: present
    name: "{{ brick_name }}"
    brick: "{{ brick_dir }}"
    replicas: "{{ groups[gluster_group] | length }}"
    cluster: "{{ groups[gluster_group] | map('extract', hostvars, 'ansible_all_ipv4_addresses') | map('select', 'search', '^10\\.') | map('first') | list }}"
    host: "{{ ansible_all_ipv4_addresses | select('search', '^10\\.') | first }}"
    force: yes
  run_once: true
