Description
I've already gone through the troubleshooting guide, but it doesn't provide a solution for our specific case.
Our workflow is designed to scan the backlog table and initiate sub-orchestrator workflows to process backlog items. To manage iterations efficiently, we use the eternal orchestration pattern with ContinueAsNew, preventing performance issues associated with infinite loops. Additionally, we introduce delays between iterations using the IDurableOrchestrationContext.CreateTimer method.
using var pullingJobCts = new CancellationTokenSource();
await context.CreateTimer(context.CurrentUtcDateTime.Add(TimeSpan.FromMinutes(1)), pullingJobCts.Token);
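A minimal sketch of the pattern described above, for context. It is illustrative only and not the actual workflow code: the names BacklogPoller, ScanBacklog, and ProcessBacklogItem are hypothetical placeholders, and it assumes the in-process Durable Functions 2.x extension (the Microsoft.Azure.WebJobs.Extensions.DurableTask NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class BacklogPoller
{
    // Hypothetical eternal orchestrator: one scan per iteration, then restart.
    [FunctionName("BacklogPoller")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // "ScanBacklog" is an assumed activity that returns pending item IDs.
        string[] itemIds = await context.CallActivityAsync<string[]>("ScanBacklog", null);

        // Fan out one sub-orchestration per backlog item
        // ("ProcessBacklogItem" is an assumed sub-orchestrator name).
        var subOrchestrations = new List<Task>();
        foreach (string itemId in itemIds)
        {
            subOrchestrations.Add(
                context.CallSubOrchestratorAsync("ProcessBacklogItem", itemId));
        }
        await Task.WhenAll(subOrchestrations);

        // Durable delay between iterations (same call as in the snippet above).
        using var pullingJobCts = new CancellationTokenSource();
        await context.CreateTimer(
            context.CurrentUtcDateTime.Add(TimeSpan.FromMinutes(1)),
            pullingJobCts.Token);

        // Restart with a fresh history instead of looping forever.
        context.ContinueAsNew(null);
    }
}
```

In this shape, each iteration ends with a TimerCreated event followed by ContinueAsNew, which matches the point at which the stuck instances stop making progress.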
We've been experiencing this issue for quite some time. Over the past two weeks, out of thousands of executions, approximately a dozen workflow instances have become stuck. Using the VS Code extension, I can see that the last recorded operation for these stuck workflows is TimerCreated.
I was finally able to capture the messages for these instances in the control queue:
| Instance ID | Message ID | Dequeue count |
| --- | --- | --- |
| 38bf0b8588db420e8c5b992d47a8e735:1705 | b7bde336-9125-4859-8dd7-d259dd0d4204 | 4549 |
| 58dbd6970a4e4d8895a03af0a8fa1fad:998 | bba78a46-7222-483a-87e1-74f76d0e17df | 6944 |
Their dequeue counts suggest that, after the delay, the message expected to resume the orchestration was never successfully processed; these instances have been stuck since 17 and 22 January. I've attached the message bodies. In both, the event type is TimerFiredEvent.
b7bde336-9125-4859-8dd7-d259dd0d4204.json
bba78a46-7222-483a-87e1-74f76d0e17df.json
In the logs I found the following errors:
2025-01-31T21:45:32Z [Error] An unexpected failure occurred while processing instance '58dbd6970a4e4d8895a03af0a8fa1fad:998': DurableTask.AzureStorage.Storage.DurableTaskStorageException: An error occurred while communicating with Azure Storage
---> Azure.RequestFailedException: The specified blob does not exist.
error-38bf0b8588db420e8c5b992d47a8e735-1705.txt
error-58dbd6970a4e4d8895a03af0a8fa1fad-998.txt
Thanks,
Andrei