Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large number of file descriptors used in RHEL 9 #198

Open
razorbladex401 opened this issue Feb 5, 2024 · 2 comments
Open

Large number of file descriptors used in RHEL 9 #198

razorbladex401 opened this issue Feb 5, 2024 · 2 comments

Comments

@razorbladex401
Copy link

razorbladex401 commented Feb 5, 2024

Currently experiencing an interesting issue where opendkim is exhausting all its file descriptors leading to postfix milter reject errors. This recently started happening after having the systems running for several months in RHEL 9. From what I've been able to tell there seems to be some issue with opendkim when on a system that's using sssd 2.9.1-4.el9_3.1 or greater. I have already opened a case with Red Hat but since opendkim is not part of their repo's they can't say if it's a bug with sssd or opendkim.

For some background:
We're using several domains to send from with around 50+ keys. Using, ss -anp | grep 'opendkim' | sed 's/ *$//', we can see streams grow after each email that get's signed. Below is a small example. Generally, we only really see maybe 11 streams at a time and they don't stay open long.

`
u_str ESTAB 0 0 * 53817391 * 0 users:(("opendkim",pid=1856103,fd=77))
u_str ESTAB 0 0 * 53832275 * 0 users:(("opendkim",pid=1856103,fd=218))
u_str ESTAB 0 0 * 53824876 * 0 users:(("opendkim",pid=1856103,fd=124))
u_str ESTAB 0 0 * 53824756 * 0 users:(("opendkim",pid=1856103,fd=133))
u_str ESTAB 0 0 * 53836020 * 0 users:(("opendkim",pid=1856103,fd=255))
u_str ESTAB 0 0 * 53838661 * 0 users:(("opendkim",pid=1856103,fd=285))
u_str ESTAB 0 0 * 53808986 * 0 users:(("opendkim",pid=1856103,fd=30))
u_str ESTAB 0 0 * 53835631 * 0 users:(("opendkim",pid=1856103,fd=270))
u_str ESTAB 0 0 * 53832887 * 0 users:(("opendkim",pid=1856103,fd=245))
u_str ESTAB 0 0 * 53838157 * 0 users:(("opendkim",pid=1856103,fd=283))
u_str ESTAB 0 0 * 53840434 * 0 users:(("opendkim",pid=1856103,fd=305))
u_str ESTAB 0 0 * 53830151 * 0 users:(("opendkim",pid=1856103,fd=182))
u_str ESTAB 0 0 * 53830225 * 0 users:(("opendkim",pid=1856103,fd=187))
u_str ESTAB 0 0 * 53830723 * 0 users:(("opendkim",pid=1856103,fd=191))
u_str ESTAB 0 0 * 53828693 * 0 users:(("opendkim",pid=1856103,fd=176))
u_str ESTAB 0 0 * 53827800 * 0 users:(("opendkim",pid=1856103,fd=171))
u_str ESTAB 0 0 * 53825776 * 0 users:(("opendkim",pid=1856103,fd=146))
u_str ESTAB 0 0 * 53834975 * 0 users:(("opendkim",pid=1856103,fd=259))
u_str ESTAB 0 0 * 53808916 * 0 users:(("opendkim",pid=1856103,fd=23))
u_str ESTAB 0 0 * 53810481 * 0 users:(("opendkim",pid=1856103,fd=33))
u_str ESTAB 0 0 * 53832456 * 0 users:(("opendkim",pid=1856103,fd=234))
u_str ESTAB 0 0 * 53810634 * 0 users:(("opendkim",pid=1856103,fd=47))
u_str ESTAB 0 0 * 53835245 * 0 users:(("opendkim",pid=1856103,fd=260))
u_str ESTAB 0 0 * 53836391 * 0 users:(("opendkim",pid=1856103,fd=265))
u_str ESTAB 0 0 * 53827932 * 0 users:(("opendkim",pid=1856103,fd=175))
u_str ESTAB 0 0 * 53826487 * 0 users:(("opendkim",pid=1856103,fd=163))
tcp LISTEN 0 4096 127.0.0.1:8891 0.0.0.0:* users:(("opendkim",pid=1856103,fd=3))

`

After some time there will be several hundred (if not more) open streams. Eventually it'll grow and we'll start seeing milter-rejects in postfix logs. We can get things working again by restarting opendkim.

warning: milter inet:127.0.0.1:8891: can't read SMFIC_OPTNEG reply packet header: Connection reset by peer warning: milter inet:127.0.0.1:8891: read error in initial handshake NOQUEUE: milter-reject: CONNECT from server [IP]: 451 4.7.1 Service unavailable - try again later; proto=SMTP

This only started after we updated sssd to version 2.9.1-4.el9_3.1. When we downgrade to 2.9.1-4.el9_3 it doesn't happen. Like stated before Redhat is not able to confirm if this is a bug with sssd or opendkim since opendkim is not from their repos. We use the version from EPEL.

There is a similar post in the Rocky Linux forums with some additional information that seems to match the same issue we are seeing, https://forums.rockylinux.org/t/rocky-9-3-opendkim-latest-sssd-packages/12327.

I believe this is tied to nss. When enabling debug logging with sssd I can see that opendkim shows up as a client.

(2024-01-31 10:16:01): [nss] [accept_fd_handler] (0x0400): [CID#3] Client [cmd /usr/sbin/opendkim][uid 34188][0x562038d3c500][23] connected! (2024-01-31 10:16:01): [nss] [sss_cmd_get_version] (0x0200): [CID#3] Received client version [1]. (2024-01-31 10:16:01): [nss] [sss_cmd_get_version] (0x0200): [CID#3] Offered version [1]. (2024-01-31 10:16:01): [nss] [cache_req_set_plugin] (0x2000): [CID#3] CR #2: Setting "Enumerate users" plugin (2024-01-31 10:16:01): [nss] [cache_req_send] (0x0400): [CID#3] CR #2: REQ_TRACE: New request [CID #3] 'Enumerate users' (2024-01-31 10:16:01): [nss] [cache_req_select_domains] (0x0400): [CID#3] CR #2: Performing a multi-domain search (2024-01-31 10:16:01): [nss] [cache_req_search_domains] (0x0400): [CID#3] CR #2: Search will bypass the cache and check the data provider (2024-01-31 10:16:01): [nss] [cache_req_validate_domain_enumeration] (0x0400): [CID#3] CR #2: Domain mydomain.com does not support enumeration, skipping... (2024-01-31 10:16:01): [nss] [cache_req_validate_domain_enumeration] (0x0400): [CID#3] CR #2: Enumeration requested but not enabled (2024-01-31 10:16:01): [nss] [cache_req_global_ncache_add] (0x2000): [CID#3] CR #2: This request type does not support global negative cache (2024-01-31 10:16:01): [nss] [cache_req_process_result] (0x0400): [CID#3] CR #2: Finished: Not found (2024-01-31 10:16:01): [nss] [sss_nss_protocol_done] (0x4000): [CID#3] Sending reply: success (2024-01-31 10:16:01): [nss] [cache_req_set_plugin] (0x2000): [CID#3] CR #3: Setting "Enumerate users" plugin (2024-01-31 10:16:01): [nss] [cache_req_send] (0x0400): [CID#3] CR #3: REQ_TRACE: New request [CID #3] 'Enumerate users' (2024-01-31 10:16:01): [nss] [cache_req_select_domains] (0x0400): [CID#3] CR #3: Performing a multi-domain search (2024-01-31 10:16:01): [nss] [cache_req_search_domains] (0x0400): [CID#3] CR #3: Search will bypass the cache and check the data provider (2024-01-31 10:16:01): [nss] [cache_req_validate_domain_enumeration] (0x0400): [CID#3] CR #3: Domain mydomain.com does not support enumeration, skipping... (2024-01-31 10:16:01): [nss] [cache_req_global_ncache_add] (0x2000): [CID#3] CR #3: This request type does not support global negative cache (2024-01-31 10:16:01): [nss] [cache_req_process_result] (0x0400): [CID#3] CR #3: Finished: Not found (2024-01-31 10:16:01): [nss] [sss_nss_protocol_done] (0x4000): [CID#3] Sending reply: not found (2024-01-31 10:16:01): [nss] [sss_nss_endent] (0x0100): [CID#3] Resetting enumeration state (2024-01-31 10:16:01): [nss] [sss_nss_protocol_done] (0x4000): [CID#3] Sending reply: success (2024-01-31 10:16:02): [nss] [get_client_cred] (0x4000): Client [0x562038cec2c0][24] creds: euid[34188] egid[34188] pid[1884642] cmd_line['/usr/sbin/opendkim']. (2024-01-31 10:16:02): [nss] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x562038cec2c0][24]

@kivengh
Copy link

kivengh commented Aug 28, 2024

Hi,
I have similar issue on Amazon Linux 2023 and SSSD 2.9.1 abd 2.9.4. I've got to downgrade to SSSD 2.5.0 to get it works fine.

@tonyswu
Copy link

tonyswu commented Nov 15, 2024

We are considering an OS upgrade to Rocky9, wondering if this is still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants