HDDS-12127. RM should not expire pending deletes, but retry until the delete is confirmed or node is dead #7746

sodonnel · 2025-01-24T12:29:27Z

What changes were proposed in this pull request?

When RM schedules a delete of a container on a datanode, it should keep track of that delete until either:

An ICR / FCR (incremental / full container report) is received which confirms the container is removed.
The datanode goes dead.
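
The two completion conditions above can be sketched as follows. This is only an illustration: `PendingDeleteTracker` and all of its method names are hypothetical and are not the classes touched by this patch. The point is that a pending delete is cleared by a container report or by the node going dead, and never by the command deadline passing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual Ozone classes or method names.
final class PendingDeleteTracker {
  // Datanode id -> ids of containers with an unconfirmed delete on that node.
  private final Map<String, Set<Long>> pendingByNode = new HashMap<>();

  void scheduleDelete(long containerId, String datanodeId) {
    pendingByNode.computeIfAbsent(datanodeId, k -> new HashSet<>()).add(containerId);
  }

  // An ICR / FCR shows the replica is gone from this datanode.
  void onReplicaRemovedReport(long containerId, String datanodeId) {
    Set<Long> ids = pendingByNode.get(datanodeId);
    if (ids != null) {
      ids.remove(containerId);
    }
  }

  // A dead node drops out of consideration, so stop tracking its deletes.
  void onNodeDead(String datanodeId) {
    pendingByNode.remove(datanodeId);
  }

  // The command deadline passing deliberately leaves the entry in place.
  void onDeadlineExpired(long containerId, String datanodeId) {
    // Intentionally empty: expiry alone is not a confirmation.
  }

  boolean isPending(long containerId, String datanodeId) {
    return pendingByNode.getOrDefault(datanodeId, Collections.emptySet())
        .contains(containerId);
  }
}
```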

Currently, RM expires the delete attempt after 10 minutes. On retry it should resend the command to the same datanode, but it may not, either because of HDDS-12115 or because of other scenarios that cause the datanode ordering to change.

With this change, the expiry still occurs and the command can still get dropped on the datanode, but the ContainerReplicaPendingOps expiry thread no longer removes the pending delete from the pending list. Instead, it triggers a notification to RM, which then resends the same command with a new deadline until the delete has been confirmed as successful. RM subscribes to the notifications from ContainerReplicaPendingOps and re-runs any expired delete commands.
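
A minimal sketch of the expiry flow described above, under stated assumptions: `PendingOpsSketch`, `ExpirySubscriber`, `scrubExpired` and `resendDeletes` are hypothetical names, and the real ContainerReplicaPendingOps / ReplicationManager code carries the actual SCMCommand rather than a counter. It also assumes subscribers are notified for every expired op, while only non-delete ops are dropped from the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not the real Ozone classes or signatures.
final class PendingOpsSketch {
  enum OpType { ADD, DELETE }

  static final class PendingOp {
    final OpType type;
    final long containerId;
    final String datanodeId;
    long deadlineMillis;
    int timesSent = 1;

    PendingOp(OpType type, long containerId, String datanodeId, long deadlineMillis) {
      this.type = type;
      this.containerId = containerId;
      this.datanodeId = datanodeId;
      this.deadlineMillis = deadlineMillis;
    }
  }

  interface ExpirySubscriber {
    void opExpired(PendingOp op, long nowMillis);
  }

  private final List<PendingOp> pending = new ArrayList<>();
  private final List<ExpirySubscriber> subscribers = new ArrayList<>();

  void subscribe(ExpirySubscriber s) { subscribers.add(s); }
  void schedule(PendingOp op) { pending.add(op); }
  void complete(PendingOp op) { pending.remove(op); }
  int pendingCount() { return pending.size(); }

  // One pass of the expiry thread: expired deletes stay in the list,
  // other expired ops are dropped, and subscribers hear about both.
  void scrubExpired(long nowMillis) {
    for (PendingOp op : new ArrayList<>(pending)) {
      if (op.deadlineMillis > nowMillis) {
        continue;
      }
      if (op.type != OpType.DELETE) {
        pending.remove(op);
      }
      for (ExpirySubscriber s : subscribers) {
        s.opExpired(op, nowMillis);
      }
    }
  }

  // Stand-in for RM: resend an expired delete to the same datanode
  // by giving the same op a fresh deadline.
  static ExpirySubscriber resendDeletes(long timeoutMillis) {
    return (op, now) -> {
      if (op.type == OpType.DELETE) {
        op.deadlineMillis = now + timeoutMillis;
        op.timesSent++;
      }
    };
  }
}
```

Because the retry reuses the tracked op, the target datanode cannot change between attempts, which is the property the old expire-and-reselect behaviour did not guarantee.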

This is to combat a recent problem we experienced where a delete command hung for a very long time and RM issued new deletes to other DNs, resulting in all replicas of a container getting removed unexpectedly.
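
To make the failure mode concrete, here is a toy model (hypothetical, not RM code) that counts how many delete commands get issued for one over-replicated container while every command is hung on its datanode. It assumes each RM check happens after the previous command's deadline has passed, and that a newly issued delete goes to a different datanode than the earlier ones.

```java
// Toy model of the over-delete problem; all names are hypothetical.
final class OverDeleteSketch {
  // Returns how many distinct delete commands are issued over `rounds` checks
  // while none of the commands completes.
  static int deletesIssued(int replicas, int required, int rounds,
      boolean keepPendingOnExpiry) {
    int pendingDeletes = 0;
    int issued = 0;
    for (int round = 0; round < rounds; round++) {
      // Old behaviour: an expired delete is forgotten before the next check.
      if (!keepPendingOnExpiry) {
        pendingDeletes = 0;
      }
      // Replicas with a pending delete are treated as already gone.
      int effectiveReplicas = replicas - pendingDeletes;
      if (effectiveReplicas > required) {
        pendingDeletes++;
        issued++;
      }
    }
    return issued;
  }
}
```

With 4 replicas, 3 required and 4 checks, forgetting expired deletes issues 4 commands, enough to remove every replica once the hung commands finally run. Keeping the delete pending issues only 1, and the retries go back to the same datanode.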

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12127

How was this patch tested?

Various unit tests were modified and added. I manually tested that the deletes are resent in docker by modifying the DN code to drop all delete container commands. Logs from docker compose:

scm-1       | 2025-01-24 12:40:16,891 [OverReplicatedProcessor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 0, force: true] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2025-01-24T12:37:21.566460928Z, pipelineID=PipelineID=5c31c1f3-21c7-4595-b851-282eef3fa642, owner=omServiceIdDefault} to fdcb66e0-8400-4c7e-b110-6aca4f8ab610(ozone-datanode-2.ozone_default/172.20.0.7) with datanode deadline 1737722986891 and scm deadline 1737723016891

scm-1       | 2025-01-24 12:50:46,790 [ExpiredContainerReplicaOpScrubber] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 0, force: true] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2025-01-24T12:37:21.566460928Z, pipelineID=PipelineID=5c31c1f3-21c7-4595-b851-282eef3fa642, owner=omServiceIdDefault} to fdcb66e0-8400-4c7e-b110-6aca4f8ab610(ozone-datanode-2.ozone_default/172.20.0.7) with datanode deadline 1737723616789 and scm deadline 1737723646789

Notice that the first delete command was sent by the OverReplicatedProcessor thread. The follow-up was sent by ExpiredContainerReplicaOpScrubber after the timeout expired, proving the expiry notifications are working.

Original deadline 1737722986891 = Friday, 24 January 2025 12:49:46
New deadline 1737723616789 = Friday, 24 January 2025 13:00:16

The deadline was advanced as expected.
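
The conversion above can be reproduced with a few lines of `java.time`. This is just a checking aid using the deadline values copied from the two log lines; the class name `DeadlineCheck` is made up for the example.

```java
import java.time.Duration;
import java.time.Instant;

// Small checking aid; the class name is hypothetical.
final class DeadlineCheck {
  // Epoch milliseconds from the log line, rendered as a UTC instant.
  static Instant toUtc(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis);
  }

  // How far the later deadline is past the earlier one.
  static Duration between(long earlierMillis, long laterMillis) {
    return Duration.ofMillis(laterMillis - earlierMillis);
  }

  public static void main(String[] args) {
    long originalDatanodeDeadline = 1737722986891L;
    long originalScmDeadline = 1737723016891L;
    long newDatanodeDeadline = 1737723616789L;

    System.out.println("original: " + toUtc(originalDatanodeDeadline));
    System.out.println("new:      " + toUtc(newDatanodeDeadline));
    System.out.println("advanced: " + between(originalDatanodeDeadline, newDatanodeDeadline));
    System.out.println("scm gap:  " + between(originalDatanodeDeadline, originalScmDeadline));
  }
}
```

The datanode deadline moves forward by about ten and a half minutes, and in both log lines the SCM deadline sits 30 seconds after the datanode deadline.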

adoroszlai

Thanks @sodonnel for the patch, LGTM.

siddhantsangwan

LGTM, thanks for the patch.

adoroszlai · 2025-01-28T10:21:49Z

Thanks @sodonnel for the patch, @siddhantsangwan for the review.

…ete is confirmed or node is dead (apache#7746)

(cherry picked from commit 04f6255)

Conflicts:
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java
  hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestContainerReplicaPendingOps.java
  hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestECContainerReplicaCount.java
  hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java
  hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManagerScenarios.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java

…ete is confirmed or node is dead (apache#7746)

* CDPD-78092. HDDS-12114. Prevent delete commands running after a long lock wait and send ICR earlier (apache#7726)
  (cherry picked from commit b6cc4af)
  Conflicts:
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
  Change-Id: I62ffb7203f2af5be2901ef923f333de53bbc3656

* CDPD-78149. HDDS-12115. RM selects replicas to delete non-deterministically if nodes are overloaded (apache#7728)
  (cherry picked from commit efd8adc)
  Conflicts:
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestRatisOverReplicationHandler.java
  Change-Id: Ia3d54917c7c488a9b706f6ce941e7f466746d3bd

* CDPD-78286. HDDS-12135. Set RM default deadline to 12 minutes and datanode offset to 6 minutes (apache#7747)
  (cherry picked from commit d7616ec)
  Change-Id: I36f237705f5a94d453bcec72c32056c2be8f38ba

* CDPD-78213. HDDS-12127. RM should not expire pending deletes, but retry until delete is confirmed or node is dead (apache#7746)
  (cherry picked from commit 04f6255)
  Conflicts:
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestMoveManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestECContainerReplicaCount.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManagerScenarios.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java
  Change-Id: Ic01591f72706f2473c63dd2e44c3f2a94fb70d43

---------

Co-authored-by: Stephen O'Donnell <[email protected]>

S O'Donnell added 5 commits January 24, 2025 12:00

Overrwrite duplicate ops if another scheduled for same opType, target and index

786daef

Add SCMCommand to ContainerReplicaOp and ContainerReplicaPendingOps

6893400

Do not remove deletes from pending ops on expiry, but continue to notify subscribers

11bbf82

Add completeOp method / interface to RM

9c1e0c0

Subscribe RM to notifications from ContainerReplicaPendingOps

c270e83

sodonnel requested a review from siddhantsangwan January 24, 2025 12:29

Fix failing test by capturing logs earlier

49e3f7a

adoroszlai approved these changes Jan 27, 2025


siddhantsangwan approved these changes Jan 28, 2025


adoroszlai merged commit 04f6255 into apache:master Jan 28, 2025
42 checks passed

nandakumar131 pushed a commit to nandakumar131/ozone that referenced this pull request Feb 10, 2025

HDDS-12127. RM should not expire pending deletes, but retry until delete is confirmed or node is dead (apache#7746)

57e3759
