
Sometimes the api-pod does not respond any more #1790

Open
FamilieRo opened this issue Dec 19, 2024 · 10 comments
@FamilieRo

FamilieRo commented Dec 19, 2024

From time to time the api pod stops responding, so everything that depends on it (attaching volumes, the CLI, etc.) times out.
e.g.
kubectl mayastor get volumes -n openebs
Failed to list volumes. Error error in request: request timed out

In the logs of the api pod you will find something like:

2024-12-19T13:08:39.383948Z ERROR grpc::operations::volume::client: error: Unavailable: status: Unavailable, message: "error trying to connect: deadline has elapsed", details: [], metadata: MetadataMap { headers: {} }

I have attached a full log file for you to analyse.

Deleting the api pod brings up a new one, and everything works again immediately.

The error occurs regardless of whether workload happens on the cluster or not.

openebs-api-rest-676998984d-svxhr-api-rest.log
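The delete-and-recreate workaround described above can be done with kubectl; the pod name here is taken from the attached log filename, and the Deployment will spawn a replacement on its own:

```shell
# Delete the stuck api-rest pod; its Deployment recreates it immediately
kubectl -n openebs delete pod openebs-api-rest-676998984d-svxhr
```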

@tiagolobocastro
Contributor

Could you please try to increase the CPU limit for the rest pod, just in case?
Example:
mayastor.apis.rest.resources.limits.cpu=500m

And could you also share logs from agents-core pod, agent-core container?
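The suggested CPU limit can be applied as a helm values override; a minimal sketch, assuming the OpenEBS umbrella chart installed as release `openebs` in the `openebs` namespace (adjust release and chart names to your installation):

```shell
# Bump only the rest pod's CPU limit, keeping all other values unchanged
helm upgrade openebs openebs/openebs -n openebs \
  --reuse-values \
  --set mayastor.apis.rest.resources.limits.cpu=500m
```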

@FamilieRo
Author

I have increased the CPU limit. No more timeouts at the moment; I will keep observing the behaviour.

However, I'm on holiday for a few days and will upload more logs if necessary.

@tiagolobocastro
Contributor

Would you be able to try another workaround?
Leaving the resource limits as is, but modifying the rest deployment args to include --max-workers=2?
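One way to add that flag, sketched with kubectl (the deployment name is inferred from the attached log filename; verify it against your cluster):

```shell
# Open the rest deployment for editing and append the flag to the container args
kubectl -n openebs edit deployment openebs-api-rest
# In the editor, under spec.template.spec.containers[].args, add:
#   - "--max-workers=2"
```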

@FamilieRo
Author

Hi!
We had no more problems with the API so far.
Nevertheless, I left the CPU limit as is and included max-workers in the args. The pod is running fine; I will keep an eye on it.
Thanks!

@tiagolobocastro
Contributor

Sorry, I probably worded my previous message incorrectly. By "as is" I meant the default of 100m, but with max-workers set to 2.
Thank you!

@FamilieRo
Author

OK, sorry!
Changed the limit back to 100m.

@Abhinandan-Purkait
Member

@tiagolobocastro I believe the fix for this is increasing the resources and modifying the workers. We have also added a liveness probe.

@Abhinandan-Purkait added this to the Mayastor v2.7 milestone Jan 19, 2025
@FamilieRo
Author

@tiagolobocastro
I did an upgrade to version 2.7.2 last week. So the args and the limits are set to the defaults, and everything seems to run smoothly. No problems at all.

@tiagolobocastro
Contributor

That may be because a liveness probe has been added to the api-rest, so it gets restarted if it becomes unresponsive. You might see this in the pod's restart count.

Did you have any luck with max-workers as 2?
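For reference, a liveness probe of the kind described would look roughly like this in the pod spec; the probe type, port, and timings below are illustrative assumptions, not the actual upstream values:

```yaml
# Hypothetical sketch of a liveness probe on the api-rest container
livenessProbe:
  tcpSocket:
    port: 8081          # assumed REST port; check your deployment
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3   # pod is restarted after ~30s of failed checks
```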

@FamilieRo
Author

Hi,

no restarts since 01/16:

[Image: screenshot of the pod status showing zero restarts]

With version 2.7.1 the arg for max-workers=2 worked well. But as far as I can see the problem is gone.


3 participants