
Document the ulimit changes needed for running fleet-server in larger environments! #1568

Open
jdixon-86 opened this issue Jun 17, 2022 · 4 comments
Labels: Team:Docs (Observability docs team), Team:Elastic-Agent-Control-Plane (Agent Control Plane team)


@jdixon-86

Describe the enhancement:
Please document the ulimit changes required to run Fleet Server in larger environments, especially on Ubuntu. Without these changes, Fleet Servers can stop responding because they have too many open files.

Describe a specific use case for the enhancement or feature:
With nearly 4,000 agents and two Fleet Servers behind a load balancer, I ran into the open file limit on Ubuntu with Fleet Server. Fleet Server currently has no documented ulimit changes. Elasticsearch does, but those changes do not apply here because Elasticsearch runs as a different user (named elasticsearch), while fleet-server runs as root.

To see the current limit, find the PID of the fleet-server process:
pidof fleet-server

Then run the following to see the soft limit:
prlimit -n --pid=[pid]

You will notice that the soft limit remains at 1024 even if you modify the /etc/systemd/system.conf file.

RESOURCE DESCRIPTION              SOFT   HARD UNITS
NOFILE   max number of open files 1024 524288 files

To fix this, edit /etc/systemd/system.conf and raise the DefaultLimitNOFILE line, in particular its soft value (the number before the colon). For example:
DefaultLimitNOFILE=262144:524288

Editing /etc/pam.d/login to include pam_limits.so does not fix the soft limit problem, because PAM limits apply to login sessions rather than to services started by systemd. This should be properly documented so that time is not wasted. Ticket 00975556

@ph added the Team:Elastic-Agent label on Jul 25, 2022
@AndrewMcQuerry

👍 Thanks @jdixon-86. This information just saved our bacon in our "large" environment.

I would love it if the documentation could be updated to account for this (or, at least, if someone from Elastic could reply with an alternative workaround that does not require editing /etc/systemd/system.conf).

@cmacknz added the Team:Docs label on Nov 7, 2023
@jlind23 (Contributor) commented Sep 26, 2024

@kilfoyle, isn't this something we should transfer to the ingest-docs repository instead?
cc @nimarezainia

@kilfoyle

@jlind23 Certainly, this can be transferred to the ingest-docs repo.

One request: Can we please have a developer assigned to it along with me? I'd really need more detail on exactly what to change in the docs. I don't want to accidentally give incorrect advice, and it seems like something we may need to test before documenting.

@cmacknz (Member) commented Sep 26, 2024

This might be something we can add to Fleet Server so that it raises the limit automatically using https://pkg.go.dev/syscall#Setrlimit, which should work as long as Fleet Server is privileged/root or has CAP_SYS_RESOURCE.

@cmacknz added the Team:Elastic-Agent-Control-Plane label and removed the Team:Elastic-Agent label on Sep 26, 2024