Exporter unavailable during socket timeouts #17

Open
frenkye opened this issue Oct 20, 2020 · 5 comments

Comments

@frenkye

frenkye commented Oct 20, 2020

Hi,

We have been using this exporter for a while, and I have noticed a pattern: the exporter starts logging errors (see the log lines near the end), and then the exporter itself hangs. Is there a debug option I could use to track this down further?

I would expect the exporter to stay responsive at all times and report php_fpm_up{socket_path="..."} 0 for the affected socket. As it is, we lose all data when a single socket is down.

Using exporter:
phpfpm_exporter, version 0.5.0 (branch: HEAD, revision: 9cb855b)

Started under supervisor:

command=/opt/prometheus/bin/php-fpm_exporter --web.listen-address=":XXXX"
        --phpfpm.status-path=/fpm-status
        --phpfpm.socket-directories=/var/lib/php/7.X/fpm/
# we use directories because on some machines we have 200+ sockets for different web sites

Exporter error log (Prague time, UTC+2):

2020/10/20 10:14:24 Failed to scrape socket: dial unix /var/lib/php/7.2/fpm/sock.sock: connect: resource temporarily unavailable
...
2020/10/20 10:16:05 Failed to scrape socket: dial unix /var/lib/php/7.2/fpm/sock.sock: connect: resource temporarily unavailable

Screenshots from the Grafana dashboard (Prague time, UTC+2):
Screenshot 2020-10-20 at 12 50 22
Screenshot 2020-10-20 at 12 52 18

Prometheus graph of the up metric for the job (UTC):
Screenshot 2020-10-20 at 13 03 54

@Lusitaniae
Owner

Thanks for submitting an issue, @frenkye.

I'm not aware of any flag you could use to debug this further, though I'm not actively developing Go or Prometheus exporters.

Since you mentioned you're monitoring a large number of hosts, perhaps you'd need tighter timeouts when opening connections?

https://github.com/tomasen/fcgi_client

func DialTimeout(network, address string, timeout time.Duration) (fcgi *FCGIClient, err error)

If you replace the Dial invocation with DialTimeout and specify a shorter timeout (say 2 or 5 seconds), would that help?

@frenkye
Author

frenkye commented Oct 20, 2020

@Lusitaniae Thank you for the tip. I'll give it a look and try this change. 👍

@frenkye
Author

frenkye commented Oct 21, 2020

@Lusitaniae I have set up a test environment with a dummy page that calls PHP sleep(10), served via php-fpm with max_children = 1, so that I can trace requests.

I replaced both Dial calls with DialTimeout using a timeout of 2*time.Second, but it had no effect on the exporter's behavior. When I access the page with the sleep, the exporter waits ~10s for my web request to finish before its own request is served.

That seemed weird.

I did some checking with strace:

REQUEST_METHOD="GET" SCRIPT_NAME="/fpm-status" SCRIPT_FILENAME="/fpm-status" QUERY_STRING="full" strace cgi-fcgi -bind -connect /path/to/sock

In my test, during the sleep request:

....
socket(AF_UNIX, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/lib/php/7.2/fpm/sock.sock"}, 35) = 0
write(3, "\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0", 16) = 16
write(3, "\1\4\0\1\t\\\4\0\f\4QUERY_STRINGfull\v\vSCRI"..., 2416) = 2416
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
select(4, [3], [3], NULL, NULL)         = 1 (out [3])
write(3, "\1\5\0\1\0\0\0\0", 8)         = 8
select(4, [3], [], NULL, NULL^Cstrace: Process 8120 detached
 <detached ...>

On a server where the target was down and fpm was flooded with requests, with a full backlog:

...
socket(AF_UNIX, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/lib/php/7.2/fpm/sock.sock"}, 47^Cstrace: Process 844 detached
 <detached ...>

It seems the problem is not in connecting to the socket but in waiting for the query on the socket to finish: the backlog accepts the connection until it overflows, and only then does it start rejecting connections.

Any idea how to limit the query on the socket, rather than the connection, to something like 5s?

@Lusitaniae
Owner

Good progress so far.

I had another look at the fcgi client and the net interface: https://golang.org/pkg/net/#UnixConn.SetDeadline

It offers SetDeadline methods for TCP, UDP, and unix socket connections, which bound reads and writes on an already established connection.

Perhaps fcgi_client needs to use those, which it doesn't at the moment.

@frenkye
Author

frenkye commented Oct 26, 2020

I talked with our dev team, and they will have a look at this in a few days. If we find a solution, they will open a PR.

kanocz added a commit to kanocz/phpfpm_exporter that referenced this issue Jan 6, 2021