
Sometimes the api-pod does not respond any more #1790

Open
FamilieRo opened this issue Dec 19, 2024 · 10 comments
@FamilieRo

FamilieRo commented Dec 19, 2024

From time to time the api pod stops responding, so everything that depends on it (attaching volumes, the CLI, etc.) times out.
e.g.
kubectl mayastor get volumes -n openebs
Failed to list volumes. Error error in request: request timed out

In the logs of the api pod you will find something like:

2024-12-19T13:08:39.383948Z ERROR grpc::operations::volume::client: error: Unavailable: status: Unavailable, message: "error trying to connect: deadline has elapsed", details: [], metadata: MetadataMap { headers: {} }

I have attached a full log file for you to analyse.

Deleting the api pod brings up a new one, and everything works again immediately.

The error occurs regardless of whether workload happens on the cluster or not.

openebs-api-rest-676998984d-svxhr-api-rest.log
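The delete-and-recreate workaround described above can be done with kubectl; the pod name here is taken from the attached log filename, and the Deployment will spawn a replacement on its own:

```shell
# Delete the stuck api-rest pod; its Deployment recreates it immediately
kubectl -n openebs delete pod openebs-api-rest-676998984d-svxhr
```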

@tiagolobocastro
Contributor

Could you please try to increase the CPU limit for the rest pod, just in case?
Example:
mayastor.apis.rest.resources.limits.cpu=500m

And could you also share logs from agents-core pod, agent-core container?
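The suggested CPU limit can be applied as a helm values override; a minimal sketch, assuming the OpenEBS umbrella chart installed as release `openebs` in the `openebs` namespace (adjust release and chart names to your installation):

```shell
# Bump only the rest pod's CPU limit, keeping all other values unchanged
helm upgrade openebs openebs/openebs -n openebs \
  --reuse-values \
  --set mayastor.apis.rest.resources.limits.cpu=500m
```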

@FamilieRo
Author

I have increased the CPU limit. No more timeouts at the moment; I will keep observing the behaviour.

However, I'm on holiday for a few days and will upload more logs if necessary.

@tiagolobocastro
Contributor

Would you be able to try another workaround?
Leaving the resource limits as is, but modifying the rest deployment args to include --max-workers=2?
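One way to add that flag, sketched with kubectl (the deployment name is inferred from the attached log filename; verify it against your cluster):

```shell
# Open the rest deployment for editing and append the flag to the container args
kubectl -n openebs edit deployment openebs-api-rest
# In the editor, under spec.template.spec.containers[].args, add:
#   - "--max-workers=2"
```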

@FamilieRo
Author

Hi!
We had no more problems with the API so far.
Nevertheless, I left the CPU limit as is and included max-workers in the args. The pod is running fine; I will keep an eye on it.
Thanks!

@tiagolobocastro
Contributor

Sorry, I probably worded my previous message incorrectly. By "as is" I meant the default of 100m, but with max-workers set to 2.
Thank you!

@FamilieRo
Author

OK, sorry!
Changed the limit back to 100m.

@Abhinandan-Purkait
Member

@tiagolobocastro I believe the fix for this is increasing the resources and modifying the workers. We have also added a liveness probe.

@Abhinandan-Purkait added this to the Mayastor v2.7 milestone Jan 19, 2025
@FamilieRo
Author

@tiagolobocastro
I did an upgrade to version 2.7.2 last week. So the args and the limits are set to the defaults, and everything seems to run smoothly. No problems at all.

@tiagolobocastro
Contributor

That may be because a liveness probe has been added to the api-rest, so it gets restarted if it becomes unresponsive. You might see this in the pod's restart count.

Did you have any luck with max-workers as 2?
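For reference, a liveness probe of the kind described would look roughly like this in the pod spec; the probe type, port, and timings below are illustrative assumptions, not the actual upstream values:

```yaml
# Hypothetical sketch of a liveness probe on the api-rest container
livenessProbe:
  tcpSocket:
    port: 8081          # assumed REST port; check your deployment
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3   # pod is restarted after ~30s of failed checks
```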

@FamilieRo
Author

Hi,

no restarts since 01/16:

[Image: screenshot of the pod status showing zero restarts]

With version 2.7.1 the arg for max-workers=2 worked well. But as far as I can see the problem is gone.


3 participants