Skip to content

The timers generate internal errors in Dapr 1.15.3 #1522

Open
@vlardn

Description

@vlardn

Expected Behavior

Actor timers should not generate any internal errors in the app or the daprd sidecar.

Actual Behavior

Our .NET Core 9 app has a few actor timers and running in Azure Kubernetes Service with 2 replicas.

After upgrading to Dapr from 1.14.4 to 1.15.3 and after some time passed the app (.NET Core 9) started to log errors (from overridden Actor.OnActorMethodFailedAsync method) with the the same frequency as configured timers but despite these errors, the timer methods in the app still get called as expected:

2025-04-24 13:40:04.115 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync finished
2025-04-24 13:40:04.304 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync starting
2025-04-24 13:40:04.306 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync failed: Object reference not set to an instance of an object.; System.NullReferenceException: Object reference not set to an instance of an object.
   at Dapr.Actors.Runtime.ActorManager.<>c__DisplayClass25_0.<<FireTimerAsync>g__RequestFunc|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Dapr.Actors.Runtime.ActorManager.DispatchInternalAsync[T](ActorId actorId, ActorMethodContext actorMethodContext, Func`3 actorFunc, CancellationToken cancellationToken)
2025-04-24 13:40:05.001 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync starting
2025-04-24 13:40:05.067 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync finished
2025-04-24 13:40:05.309 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync starting
2025-04-24 13:40:05.311 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync failed: Object reference not set to an instance of an object.; System.NullReferenceException: Object reference not set to an instance of an object.
   at Dapr.Actors.Runtime.ActorManager.<>c__DisplayClass25_0.<<FireTimerAsync>g__RequestFunc|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Dapr.Actors.Runtime.ActorManager.DispatchInternalAsync[T](ActorId actorId, ActorMethodContext actorMethodContext, Func`3 actorFunc, CancellationToken cancellationToken)
2025-04-24 13:40:06.001 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync starting
2025-04-24 13:40:06.027 | backend-55497cb86d-mgnkl | Actor TimerMethod FireTimerAsync finished

The daprd sidecar logs on another pod backend-55497cb86d-c6hfk:

time="2025-04-24T10:44:00.029051165Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:00.038464821Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:03.028819788Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:03.029307406Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:04.170616081Z" level=error msg="Error executing timer: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime.actors.timers.inmemory type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:04.18271453Z" level=warning msg="Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:04.182880036Z" level=debug msg="Error for operation DaprBuiltInActorNotFoundRetries was: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:06.028734569Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:06.029201586Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:09.028859858Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:09.029491281Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:09.245511002Z" level=error msg="Error executing timer: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime.actors.timers.inmemory type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:09.254489835Z" level=warning msg="Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:09.254837948Z" level=debug msg="Error for operation DaprBuiltInActorNotFoundRetries was: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:12.028803257Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:12.02966379Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:14.418807174Z" level=error msg="Error executing timer: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime.actors.timers.inmemory type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:14.426276255Z" level=warning msg="Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:14.426464462Z" level=debug msg="Error for operation DaprBuiltInActorNotFoundRetries was: rpc error: code = Unknown desc = error from actor service: (500) " app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.runtime type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:15.028793672Z" level=debug msg="Probing app health" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2
time="2025-04-24T10:44:15.03086115Z" level=debug msg="App health probe successful: true" app_id=myapp instance=backend-55497cb86d-c6hfk scope=dapr.apphealth type=log ver=1.15.3-msft.2

As seen in the app logs, the actors with timers are located on the backend-55497cb86d-mgnkl pod but daprd timer rpc errors occurs on another pod - backend-55497cb86d-c6hfk.

I think the problem occurs because when Dapr moves an actor to another pod, it still keeps the timer on the old pod (in this case, backend-55497cb86d-c6hfk), which continues attempting to invoke the actor timer method on the new pod.

Deleting pod backend-55497cb86d-c6hfk (and auto-creating new one by k8s) helps to stop all these errors.

Steps to Reproduce the Problem

Release Note

RELEASE NOTE: FIX Bug in runtime.

Metadata

Metadata

Assignees

Labels

P0kind/bugSomething isn't working

Type

Projects

Status

Backlog

Relationships

None yet

Development

No branches or pull requests

Issue actions