Possible premature deletion in History Scavenger #9021

@phuongdnguyen

Description

Expected Behavior

The scavenger should not garbage-collect the history of a workflow whose mutable state still exists in the database.

Actual Behavior

I have a case where a workflow got stuck. Retrieving the workflow through the Temporal Web UI shows "workflow execution history not found".

I then looked at the logs and found a DataLoss error (non-contiguous event branch) logged by the Frontend server.

Next, I found a tombstone in the DB for the record with the workflow's runID as tree_id, indicating that Temporal executed a deletion against this workflow's history.

The Temporal log also shows a "deleting history garbage" entry whose timestamp matches that of the tombstone.

At this point, it is clear that Temporal executed a history cleanup against a running workflow. Please note that we could still get the mutable state from the DB at this time (2 records: executions and current executions); only the history was lost.

I reviewed the component that performed the deletion, the history scavenger, and found that there are 2 safeguards that must both pass before a deletion is executed:

  • the history branch was forked at least historyScannerMinDataAge ago
  • describing the mutable state returns either serviceerror.NotFound or serviceerror.NamespaceNotFound.

Safeguard #1 is satisfied, given that our workflow is quite old.
I think safeguard #2 is the cause. Another key point is that we also have the execution data cleaner enabled. Since we can still get the mutable state, I suspect the problem lies somewhere in the namespace registry. I reviewed the namespace registry and found that this line can be problematic: it swallows any error returned from persistence and converts it to a NamespaceNotFound error, which causes safeguard #2 to be satisfied as well.

Steps to Reproduce the Problem

Specifications

  • Version: 1.22.0, but I think the latest code still has this issue.
  • Platform:
