[BUG] hadoop datanode scale-in error datanode is not decommissioned #9912

@JashBook

Describe the bug
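Scaling in the datanode component of a Hadoop cluster does not complete. The memberLeave action keeps failing with "is not decommissioned. Current status: Decommission, retry again", and the HorizontalScaling ops stays in Running at 0/1.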

kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 1.0.2-beta.18
kbcli: 1.0.2-beta.0

➜  ~ helm get notes -n kb-system kb-addon-hadoop
NOTES:
Release Information:
  Commit ID: "0c2666899d8b93a34b826d224fb26b1d3d9a6e96"
  Commit Time: "2025-11-07 16:50:56 +0800"
  Release Branch: "v1.0.2-beta.18"
  Release Time:  "2025-12-05 12:18:15 +0800"
  Enterprise: "false"

ERROR leave member at scale-in error {"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "841c0137-492f-491b-a8cc-a0d7a32eb769", "component": {"name":"hadoop-sstpe-datanode","namespace":"default"}, "error": "requeue after: 1s as: [action: memberLeave, error: exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed: action failed]"}

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
  2. scale-out datanode
kbcli cluster scale-out hadoop-sstpez --auto-approve --force=true --components datanode --replicas 1  --namespace default
  3. scale-in datanode
kbcli cluster scale-in hadoop-sstpez --auto-approve --force=true --components datanode --replicas 1  --namespace default
  4. See error
 kubectl get cluster hadoop-sstpez 
NAME            CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS    AGE
hadoop-sstpez   hadoop               Delete               Running   19m
➜  ~ 
➜  ~ kubectl get ops
NAME                                    TYPE                CLUSTER         STATUS    PROGRESS   AGE
hadoop-sstpez-horizontalscaling-t8vb9   HorizontalScaling   hadoop-sstpez   Running   0/1        5m14s
➜  ~ 
➜  ~ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
hadoop-sstpez-datanode-0          2/2     Running   0          9m39s
hadoop-sstpez-datanode-1          2/2     Running   0          9m39s
hadoop-sstpez-datanode-2          2/2     Running   0          9m39s
hadoop-sstpez-datanode-3          2/2     Running   0          6m9s
hadoop-sstpez-journalnode-0       1/1     Running   0          9m39s
hadoop-sstpez-namenode-0          2/2     Running   0          9m39s
hadoop-sstpez-namenode-1          2/2     Running   0          9m39s
hadoop-sstpez-nodemanager-0       1/1     Running   0          9m39s
hadoop-sstpez-resourcemanager-0   2/2     Running   0          9m39s
hadoop-sstpez-resourcemanager-1   2/2     Running   0          9m39s
hadoop-sstpez-zookeeper-0         2/2     Running   0          9m39s

kbagent logs

kubectl logs hadoop-sstpez-datanode-3 kbagent 
2025-12-05T18:20:35+08:00	INFO	create service Action	{"actions": "memberJoin,memberLeave"}
2025-12-05T18:20:35+08:00	INFO	create service Probe	{"probes": ""}
2025-12-05T18:20:35+08:00	INFO	create service Streaming	{"actions": ""}
2025-12-05T18:20:35+08:00	INFO	service Action started...
2025-12-05T18:20:35+08:00	INFO	service Probe started...
2025-12-05T18:20:35+08:00	INFO	service Streaming started...
2025-12-05T18:20:35+08:00	INFO	starting the HTTP server
2025-12-05T18:20:35+08:00	INFO	register service to server	{"service": "Action", "method": "POST", "uri": "/v1.0/action"}
2025-12-05T18:20:35+08:00	INFO	register service to server	{"service": "Probe", "method": "POST", "uri": "/v1.0/probe"}
2025-12-05T18:20:35+08:00	INFO	register service to server	{"service": "Streaming", "method": "POST", "uri": "/v1.0/streaming"}
2025-12-05T18:20:35+08:00	INFO	starting the streaming server
2025-12-05T18:21:19+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Normal, retry again\n: failed"}
2025-12-05T18:21:19+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 17276}
2025-12-05T18:21:26+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Normal, retry again\n: failed"}
2025-12-05T18:21:26+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7266}
2025-12-05T18:21:33+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Normal, retry again\n: failed"}
2025-12-05T18:21:33+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7212}
2025-12-05T18:21:40+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Normal, retry again\n: failed"}
2025-12-05T18:21:40+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7242}
2025-12-05T18:22:46+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:22:46+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7200}
2025-12-05T18:24:05+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:24:05+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7226}
2025-12-05T18:24:27+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:24:27+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7277}
2025-12-05T18:26:30+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:26:30+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7217}
2025-12-05T18:26:51+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:26:51+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7222}
2025-12-05T18:26:59+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:26:59+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7151}
2025-12-05T18:27:20+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:27:20+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7193}
2025-12-05T18:28:32+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:28:32+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7205}
2025-12-05T18:28:47+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:28:47+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7191}
2025-12-05T18:28:54+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:28:54+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7220}
2025-12-05T18:29:52+08:00	INFO	Action Executed	{"action": "memberLeave", "result": "exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed"}
2025-12-05T18:29:52+08:00	INFO	HTTP API Called	{"user-agent": "Go-http-client/1.1", "method": "POST", "path": "/v1.0/action", "status code": 200, "cost": 7195}
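
The kbagent log above is long and repetitive, so a small filter can make the status transition easier to see. The following is a minimal sketch, not part of KubeBlocks: `summarize_member_leave` is a hypothetical helper name, and it assumes the log line layout shown above (timestamp first, then a `Current status: <word>,` fragment in the memberLeave result).

```shell
# Hypothetical helper: summarize the datanode status reported by memberLeave.
# Reads kbagent log lines on stdin and prints the timestamp at which each
# new "Current status" value first appears.
summarize_member_leave() {
  # Keep only memberLeave results, reduce each line to "<timestamp> <status>",
  # then print a line only when the status differs from the previous one.
  grep '"action": "memberLeave"' \
    | sed -n 's/^\([^[:space:]]*\)[[:space:]].*Current status: \([A-Za-z_]*\),.*/\1 \2/p' \
    | awk '$2 != prev { print $1, $2; prev = $2 }'
}
```

It could be used as `kubectl logs hadoop-sstpez-datanode-3 kbagent | summarize_member_leave`. For the log above, that reduces the output to the first `Normal` entry (18:21:19) and the first `Decommission` entry (18:22:46), after which the reported status no longer changes.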

KubeBlocks pod logs

kubectl logs -n kb-system kubeblocks-55cbf757bd-dmmsd --tail 50|grep hadoop-sstpez
Defaulted container "manager" out of: manager, tools (init)
2025-12-05T10:27:56.778Z	INFO	build error: requeue after: 1s as: [action: memberLeave, error: exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again
: failed: action failed]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "ce906449-c8f7-449a-b670-34aaf9bdcd0c", "component": {"name":"hadoop-sstpez-datanode","namespace":"default"}}
2025-12-05T10:27:56.784Z	INFO	reconcile object *v1.Component with action STATUS OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "ce906449-c8f7-449a-b670-34aaf9bdcd0c", "component": {"name":"hadoop-sstpez-datanode","namespace":"default"}}
2025-12-05T10:27:56.823Z	INFO	leave member at scaling-in	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "841c0137-492f-491b-a8cc-a0d7a32eb769", "component": {"name":"hadoop-sstpe-datanode","namespace":"default"}, "delete replicas": ["hadoop-sstpez-datanode-3"], "joined replicas": ["hadoop-sstpez-datanode-3"], "has member-leave action defined": true}
2025-12-05T10:28:04.109Z	ERROR	leave member at scale-in error	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "841c0137-492f-491b-a8cc-a0d7a32eb769", "component": {"name":"hadoop-sstpe-datanode","namespace":"default"}, "error": "requeue after: 1s as: [action: memberLeave, error: exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again\n: failed: action failed]"}
2025-12-05T10:28:04.109Z	INFO	build error: requeue after: 1s as: [action: memberLeave, error: exit code: 1, stderr: Datanode hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local is not decommissioned. Current status: Decommission, retry again
: failed: action failed]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "841c0137-492f-491b-a8cc-a0d7a32eb769", "component": {"name":"hadoop-sstpez-datanode","namespace":"default"}}
2025-12-05T10:28:04.118Z	INFO	reconcile object *v1.Component with action STATUS OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "841c0137-492f-491b-a8cc-a0d7a32eb769", "component": {"name":"hadoop-sstpez-datanode","namespace":"default"}}
2025-12-05T10:28:04.158Z	INFO	leave member at scaling-in	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"hadoop-sstpez-datanode","namespace":"default"}, "namespace": "default", "name": "hadoop-sstpez-datanode", "reconcileID": "65a87c40-5bed-441e-b1bf-b7a1ae26fd8a", "component": {"name":"hadoop-sstpe-datanode","namespace":"default"}, "delete replicas": ["hadoop-sstpez-datanode-3"], "joined replicas": ["hadoop-sstpez-datanode-3"], "has member-leave action defined": true}
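
To cross-check what the memberLeave action reports, it may help to look at the status the namenode itself holds for the datanode. The sketch below is an assumption-laden diagnostic, not something taken from the addon: `decommission_status` is a hypothetical helper, and it assumes the usual `hdfs dfsadmin -report` layout in which each datanode section has a `Hostname:` line followed by a `Decommission Status :` line.

```shell
# Hypothetical helper: print the "Decommission Status" that an
# `hdfs dfsadmin -report` dump shows for one host.
# Usage: decommission_status <hostname> < report.txt
decommission_status() {
  awk -v host="$1" '
    # Remember which datanode section we are in.
    /^Hostname:/ { current = $2 }
    # Print the full status text for the requested host, then stop.
    /^Decommission Status/ && current == host {
      sub(/^Decommission Status[ \t]*:[ \t]*/, "")
      print
      exit
    }
  '
}
```

Assuming `hdfs` is on the PATH of the namenode pod's default container (not confirmed here), a possible invocation is `kubectl exec hadoop-sstpez-namenode-0 -- hdfs dfsadmin -report | decommission_status hadoop-sstpez-datanode-3.hadoop-sstpez-datanode-headless.default.svc.cluster.local`. The hostname that the report lists may differ from the headless-service name, in which case the argument has to be adjusted.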
