[BUG] rabbitmq 4.2.1 start error: kubelet PreStopHook failed #9954

@JashBook

Describe the bug
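After kbcli cluster stop followed by kbcli cluster start, the rabbitmq 4.2.1 cluster stays in Updating. The Start ops stays at 0/3, pod rabbitmq-snkjbr-rabbitmq-0 stays at 1/2 Ready while its rabbitmq container keeps restarting, and the kubelet repeatedly reports FailedPreStopHook (PreStopHook failed).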

kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.6-beta.8
kbcli: 0.9.6-beta.0

helm get notes -n kb-system kb-addon-rabbitmq 
NOTES:
Release Information:
  Commit ID: "96e859b21a040bfcc0f6305b2c7c241202586523"
  Commit Time: "2025-12-11 09:57:29 +0800"
  Release Branch: "release-0.9"
  Release Time:  "2025-12-11 09:59:07 +0800"
  Enterprise: "false"

To Reproduce
Steps to reproduce the behavior:

  1. Create the cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: rabbitmq-snkjbr
  namespace: default
spec:
  terminationPolicy: DoNotTerminate
  componentSpecs:
    - name: rabbitmq
      componentDef: rabbitmq
      serviceVersion: 4.2.1
      replicas: 3
      resources:
        requests:
          cpu: 500m
          memory: 0.5Gi
        limits:
          cpu: 500m
          memory: 0.5Gi
      serviceAccountName: kb-rabbitmq-snkjbr
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. Stop the cluster, then start it
kbcli cluster stop rabbitmq-snkjbr --auto-approve --force=true

kbcli cluster start rabbitmq-snkjbr --force=true
  3. See error
kubectl get cluster rabbitmq-snkjbr 
NAME              CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
rabbitmq-snkjbr                                  DoNotTerminate       Updating   55m
kubectl get cmp rabbitmq-snkjbr-rabbitmq
NAME                       DEFINITION   SERVICE-VERSION   STATUS     AGE
rabbitmq-snkjbr-rabbitmq   rabbitmq     4.2.1             Updating   55m
kubectl get ops
NAME                          TYPE    CLUSTER           STATUS    PROGRESS   AGE
rabbitmq-snkjbr-start-fbv86   Start   rabbitmq-snkjbr   Running   0/3        45m
kubectl get pod
NAME                         READY   STATUS    RESTARTS     AGE
rabbitmq-snkjbr-rabbitmq-0   1/2     Running   9 (4s ago)   45m

describe pod

kubectl describe pod rabbitmq-snkjbr-rabbitmq-0
Name:             rabbitmq-snkjbr-rabbitmq-0
Namespace:        default
Priority:         0
Service Account:  kb-rabbitmq-snkjbr
Node:             192.168.0.124/192.168.0.124
Start Time:       Wed, 24 Dec 2025 12:15:06 +0800
Labels:           app.kubernetes.io/component=rabbitmq
                  app.kubernetes.io/instance=rabbitmq-snkjbr
                  app.kubernetes.io/managed-by=kubeblocks
                  app.kubernetes.io/name=rabbitmq
                  app.kubernetes.io/version=rabbitmq
                  apps.kubeblocks.io/cluster-uid=ff5368d7-cde3-4adf-bef4-12a006cc89c3
                  apps.kubeblocks.io/component-name=rabbitmq
                  apps.kubeblocks.io/pod-name=rabbitmq-snkjbr-rabbitmq-0
                  componentdefinition.kubeblocks.io/name=rabbitmq
                  controller-revision-hash=74cf7cb7fb
                  workloads.kubeblocks.io/instance=rabbitmq-snkjbr-rabbitmq
                  workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:      apps.kubeblocks.io/component-replicas: 3
                  vke.volcengine.com/cello-pod-evict-policy: allow
Status:           Running
IP:               192.168.0.134
IPs:
  IP:           192.168.0.134
Controlled By:  InstanceSet/rabbitmq-snkjbr-rabbitmq
Init Containers:
  init-lorry:
    Container ID:  containerd://c139007f16fddf9b287946316cd70038bec4669cddf8234ab3b5ee61dcdc1e22
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.6-beta.8
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:8441217e75c043d8def3f1519ff3374cea39f91e488b09d08074c1e0ac90415f
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /bin/lorry
      /config
      /bin/curl
      /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 24 Dec 2025 12:15:10 +0800
      Finished:     Wed, 24 Dec 2025 12:15:10 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:  <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:  <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:            rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:              (v1:metadata.uid)
      KB_NAMESPACE:           default (v1:metadata.namespace)
      KB_SA_NAME:              (v1:spec.serviceAccountName)
      KB_NODENAME:             (v1:spec.nodeName)
      KB_HOST_IP:              (v1:status.hostIP)
      KB_POD_IP:               (v1:status.podIP)
      KB_POD_IPS:              (v1:status.podIPs)
      KB_HOSTIP:               (v1:status.hostIP)
      KB_PODIP:                (v1:status.podIP)
      KB_PODIPS:               (v1:status.podIPs)
      KB_POD_FQDN:            $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
Containers:
  rabbitmq:
    Container ID:  containerd://b4abfeae42e5101d7993ea7dc0e7b21824acf5373f5d6c776f7365cb973d1838
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq:4.2.1-management
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq@sha256:86fa2b761fc3a71a2b73090d7e45ad820f611fc829c1cb8cf087e09258fb65c1
    Ports:         4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP, 15692/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/sh
      -c
      if [ ! -f /var/lib/rabbitmq/enabled_plugins ]; then
        cp /etc/rabbitmq/enabled_plugins /var/lib/rabbitmq/enabled_plugins
      fi
      cp /root/erlang.cookie /var/lib/rabbitmq/.erlang.cookie
      chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
      chmod 400 /var/lib/rabbitmq/.erlang.cookie
      exec /opt/rabbitmq/sbin/rabbitmq-server
      
    State:          Running
      Started:      Wed, 24 Dec 2025 13:30:12 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 24 Dec 2025 13:25:12 +0800
      Finished:     Wed, 24 Dec 2025 13:30:12 +0800
    Ready:          False
    Restart Count:  15
    Limits:
      cpu:                        500m
      memory:                     512Mi
      vke.volcengine.com/eni-ip:  1
    Requests:
      cpu:                        500m
      memory:                     512Mi
      vke.volcengine.com/eni-ip:  1
    Startup:                      tcp-socket :5672 delay=0s timeout=1s period=10s #success=1 #failure=30
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env      ConfigMap  Optional: false
      rabbitmq-snkjbr-rabbitmq-rsm-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:          <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:          <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:                      (v1:metadata.uid)
      KB_NAMESPACE:                   default (v1:metadata.namespace)
      KB_SA_NAME:                      (v1:spec.serviceAccountName)
      KB_NODENAME:                     (v1:spec.nodeName)
      KB_HOST_IP:                      (v1:status.hostIP)
      KB_POD_IP:                       (v1:status.podIP)
      KB_POD_IPS:                      (v1:status.podIPs)
      KB_HOSTIP:                       (v1:status.hostIP)
      KB_PODIP:                        (v1:status.podIP)
      KB_PODIPS:                       (v1:status.podIPs)
      KB_POD_FQDN:                    $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
      MY_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      MY_POD_NAMESPACE:               default (v1:metadata.namespace)
      SERVICE_PORT:                   15692
      RABBITMQ_MNESIA_BASE:           /var/lib/rabbitmq/mnesia
      RABBITMQ_LOG_BASE:              /var/lib/rabbitmq/logs
      K8S_SERVICE_NAME:               $(KB_CLUSTER_COMP_NAME)-headless
      RABBITMQ_ENABLED_PLUGINS_FILE:  /var/lib/rabbitmq/enabled_plugins
      RABBITMQ_USE_LONGNAME:          true
      RABBITMQ_NODENAME:              rabbit@$(KB_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
      K8S_HOSTNAME_SUFFIX:            .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
    Mounts:
      /etc/localtime from timezone (ro)
      /etc/rabbitmq/conf.d/12-kubeblocks.conf from rabbitmq-config (rw,path="rabbitmq.conf")
      /etc/rabbitmq/enabled_plugins from rabbitmq-config (rw,path="enabled_plugins")
      /root/erlang.cookie from rabbitmq-config (rw,path=".erlang.cookie")
      /usr/share/zoneinfo from zoneinfo (ro)
      /var/lib/rabbitmq from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
  lorry:
    Container ID:  containerd://91faa72d261554fd5b209a8cde5e36acbeeb7727d982245ed59b7d48ce452210
    Image:         apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq:4.2.1-management
    Image ID:      apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/rabbitmq@sha256:86fa2b761fc3a71a2b73090d7e45ad820f611fc829c1cb8cf087e09258fb65c1
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry
      --port
      3501
      --grpcport
      50001
      --config-path
      /kubeblocks/config/lorry/components/
    State:          Running
      Started:      Wed, 24 Dec 2025 12:15:11 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Startup:   tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      rabbitmq-snkjbr-rabbitmq-env      ConfigMap  Optional: false
      rabbitmq-snkjbr-rabbitmq-rsm-env  ConfigMap  Optional: false
    Environment:
      RABBITMQ_DEFAULT_USER:          <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      RABBITMQ_DEFAULT_PASS:          <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      KB_POD_UID:                      (v1:metadata.uid)
      KB_NAMESPACE:                   default (v1:metadata.namespace)
      KB_SA_NAME:                      (v1:spec.serviceAccountName)
      KB_NODENAME:                     (v1:spec.nodeName)
      KB_HOST_IP:                      (v1:status.hostIP)
      KB_POD_IP:                       (v1:status.podIP)
      KB_POD_IPS:                      (v1:status.podIPs)
      KB_HOSTIP:                       (v1:status.hostIP)
      KB_PODIP:                        (v1:status.podIP)
      KB_PODIPS:                       (v1:status.podIPs)
      KB_POD_FQDN:                    $(KB_POD_NAME).rabbitmq-snkjbr-rabbitmq-headless.$(KB_NAMESPACE).svc
      KB_BUILTIN_HANDLER:             custom
      KB_SERVICE_USER:                <set to the key 'username' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_SERVICE_PASSWORD:            <set to the key 'password' in secret 'rabbitmq-snkjbr-rabbitmq-account-root'>  Optional: false
      KB_SERVICE_PORT:                4369
      KB_DATA_PATH:                   /var/lib/rabbitmq
      KB_ACTION_COMMANDS:             {"memberLeave":["/bin/bash","-c","#!/bin/bash\n\n\nis_node_deleted() {\n    local disk_nodes_str=$(echo \"$1\" | awk '/Disk Nodes/{flag=1;next} /^$/{flag++} {if(NF\u003e0 \u0026\u0026 flag==2){print}}')\n    while read -r line; do\n        if $(echo \"$line\" | grep -q \"$KB_LEAVE_MEMBER_POD_NAME\"); then\n            return 1\n        fi\n    done \u003c\u003c\u003c \"$disk_nodes_str\"\n    return 0\n}\n\ncleanup() {\n    echo \"Cleaning up...\"\n    rm -f /tmp/member_leave.lock\n}\n\nget_target_node() {\n    # get the list of running nodes\n    RUNNING_NODES=$(echo \"$1\" | grep -A 3 \"Running Nodes\" | tail -n +3 | grep 'rabbit@')\n\n    while read -r line; do\n        if [ ! -z \"$line\" ]; then\n            NODES+=(\"$line\")\n        fi\n    done \u003c\u003c\u003c \"$RUNNING_NODES\"\n\n    # found the target node to execute forget_cluster_node\n    TARGET_NODE=\"\"\n    for NODE in \"${NODES[@]}\"; do\n        if [[ \"$NODE\" != \"$LEAVE_NODE\" ]]; then\n            TARGET_NODE=$NODE\n            break\n        fi\n    done\n\n    if [[ -z \"$TARGET_NODE\" ]]; then\n        echo \"no target node found to execute forget_cluster_node.\"\n        return 1\n    fi\n    echo \"$TARGET_NODE\"\n}\n\n# if test by shellspec include, just return 0\nif [ \"${__SOURCED__:+x}\" ]; then\n  return 0\nfi\n\nset -ex\nif [[ -z \"$KB_LEAVE_MEMBER_POD_NAME\" ]]; then\n    echo \"no leave member name provided\"\n    exit 1\nfi\n\nif [[ -f /tmp/member_leave.lock ]]; then\n    echo \"member_leave.sh is already running\"\n    exit 1\nfi\n\nCURRENT_POD_NAME=$(echo \"${RABBITMQ_NODENAME}\"|grep -oP '(?\u003c=rabbit@).*?(?=\\.)')\nif [[ -f /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success ]]; then\n    echo \"member_leave.sh is already leave success\"\n    # if the current pod is the leave member pod, exit directly without delete the success file, because the leave member can't execute cluster_status anymore after leave the cluster.\n    if [[ \"$CURRENT_POD_NAME\" == \"$KB_LEAVE_MEMBER_POD_NAME\" ]]; then\n        exit 0\n    fi\n    rm -f /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n    exit 0\nfi\n\n\ntouch /tmp/member_leave.lock\n# Define the cleanup function\n\n# Set the trap to call the cleanup function on script exit\ntrap cleanup EXIT\n\n# the node to leave the cluster\nLEAVE_NODE=\"${RABBITMQ_NODENAME/$CURRENT_POD_NAME/$KB_LEAVE_MEMBER_POD_NAME}\"\n\n# the output of rabbitmqctl cluster_status\nCLUSTER_STATUS=$(rabbitmqctl cluster_status --formatter table)\n\nif is_node_deleted \"$CLUSTER_STATUS\"; then\n    echo \"Node $KB_LEAVE_MEMBER_POD_NAME has been deleted.\"\n    touch /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n    exit 0\nfi\n\n\nTARGET_NODE=$(get_target_node \"$CLUSTER_STATUS\")\nif [[ $? -ne 0 ]]; then\n    echo \"no target node found to execute forget_cluster_node.\"\n    exit 1\nfi\n\n# execute forget_cluster_node on the target node\nrabbitmqctl -n $LEAVE_NODE stop_app\nrabbitmqctl -n $TARGET_NODE forget_cluster_node $LEAVE_NODE\n\ntouch /tmp/${KB_LEAVE_MEMBER_POD_NAME}_leave.success\n\nif [[ $? -eq 0 ]]; then\n    echo \"Leave member success: $LEAVE_NODE.\"\nelse\n    echo \"leave member failed: $LEAVE_NODE.\"\n    exit 1\nfi\n"]}
      MY_POD_NAME:                    rabbitmq-snkjbr-rabbitmq-0 (v1:metadata.name)
      MY_POD_NAMESPACE:               default (v1:metadata.namespace)
      SERVICE_PORT:                   15692
      RABBITMQ_MNESIA_BASE:           /var/lib/rabbitmq/mnesia
      RABBITMQ_LOG_BASE:              /var/lib/rabbitmq/logs
      K8S_SERVICE_NAME:               $(KB_CLUSTER_COMP_NAME)-headless
      RABBITMQ_ENABLED_PLUGINS_FILE:  /var/lib/rabbitmq/enabled_plugins
      RABBITMQ_USE_LONGNAME:          true
      RABBITMQ_NODENAME:              rabbit@$(KB_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
      K8S_HOSTNAME_SUFFIX:            .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
    Mounts:
      /etc/localtime from timezone (ro)
      /etc/rabbitmq/conf.d/12-kubeblocks.conf from rabbitmq-config (rw,path="rabbitmq.conf")
      /etc/rabbitmq/enabled_plugins from rabbitmq-config (rw,path="enabled_plugins")
      /kubeblocks from kubeblocks (rw)
      /root/erlang.cookie from rabbitmq-config (rw,path=".erlang.cookie")
      /usr/share/zoneinfo from zoneinfo (ro)
      /var/lib/rabbitmq from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zwrw8 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  File
  zoneinfo:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/share/zoneinfo
    HostPathType:  Directory
  rabbitmq-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-snkjbr-rabbitmq-config
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-snkjbr-rabbitmq-0
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-zwrw8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                  From     Message
  ----     ------             ----                 ----     -------
  Warning  FailedPreStopHook  100s (x15 over 71m)  kubelet  PreStopHook failed
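
One possible reading of the describe output, offered as an assumption rather than a confirmed diagnosis: the startup probe on port 5672 allows roughly period x failure threshold = 10s x 30 = 300s, and the last terminated rabbitmq container ran from 13:25:12 to 13:30:12, which is also 300s. That would be consistent with the kubelet killing the container each time the startup probe budget runs out, and reporting PreStopHook failed on each kill. The sketch below only redoes that arithmetic; startup_budget and to_seconds are names made up for illustration.

```shell
#!/bin/sh
# Approximate time the kubelet waits before killing a container whose
# startup probe never succeeds: periodSeconds * failureThreshold
# (the initial delay is 0s in this pod).
startup_budget() {
  echo $(( $1 * $2 ))
}

# Convert HH:MM:SS to seconds since midnight (same-day timestamps only).
to_seconds() {
  t=$1
  h=${t%%:*}; rest=${t#*:}
  m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values like "08" are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( h * 3600 + m * 60 + s ))
}

# Values copied from the describe output: period=10s, #failure=30,
# Last State started 13:25:12 and finished 13:30:12.
budget=$(startup_budget 10 30)
lifetime=$(( $(to_seconds 13:30:12) - $(to_seconds 13:25:12) ))
echo "startup probe budget: ${budget}s"
echo "last container lifetime: ${lifetime}s"
```

Fifteen restarts at about five minutes each also lines up with the roughly 75 minutes between the pod start (12:15:06) and the latest container start (13:30:12).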

Logs of the failing pod (previous container instance)

kubectl logs rabbitmq-snkjbr-rabbitmq-0 --previous 
Defaulted container "rabbitmq" out of: rabbitmq, lorry, init-lorry (init)
2025-12-24 04:50:15.261207+00:00 [notice] <0.45.0> Application syslog exited with reason: stopped
2025-12-24 04:50:15.263349+00:00 [notice] <0.209.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
{exit,{shutdown,{gen_server,call,[application_controller,{start_application,rabbit,transient},infinity]}},[{gen_server,call,3,[{file,"gen_server.erl"},{line,1222}]},{application_controller,call,2,[{file,"application_controller.erl"},{line,509}]},{application,enqueue_or_start_app,6,[{file,"application.erl"},{line,419}]},{application,enqueue_or_start,6,[{file,"application.erl"},{line,384}]},{application,ensure_all_started,3,[{file,"application.erl"},{line,359}]},{rabbit,'-start_it/1-fun-0-',1,[{file,"rabbit.erl"},{line,440}]},{timer,tc,2,[{file,"timer.erl"},{line,595}]},{rabbit,start_it,1,[{file,"rabbit.erl"},{line,436}]}]}
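
The describe output above does not show what the failing preStop hook actually runs, and the Events table only says PreStopHook failed. A minimal sketch for collecting that extra detail, assuming kubectl access to the namespace of the pod; run, show_prestop, and show_events are hypothetical helper names, and DRY_RUN is an illustrative switch that prints the command instead of running it.

```shell
#!/bin/sh
# Print a command instead of running it when DRY_RUN=1
# (useful for checking the command line without a cluster).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the preStop hook definition of one container in a pod.
show_prestop() {
  pod=$1; container=$2
  run kubectl get pod "$pod" -o \
    "jsonpath={.spec.containers[?(@.name==\"$container\")].lifecycle.preStop}"
}

# List events for the pod, oldest first, to see the full hook failure messages.
show_events() {
  run kubectl get events --field-selector "involvedObject.name=$1" --sort-by=.lastTimestamp
}

# Usage against the pod from this report:
#   show_prestop rabbitmq-snkjbr-rabbitmq-0 rabbitmq
#   show_events rabbitmq-snkjbr-rabbitmq-0
```

Attaching the hook definition and the full event messages would help tell whether the hook command itself fails or whether it fails only because rabbitmq never finished starting.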

Expected behavior
After kbcli cluster start, the cluster should return to Running with all 3 rabbitmq replicas ready.
