Check_MK: localhost (lae) #91

Open
kayiwa opened this issue Sep 5, 2024 · 3 comments

@kayiwa
Member

kayiwa commented Sep 5, 2024

Service PROBLEM notification
Host: localhost (IP: 127.0.0.1)
Service: HTTPS lae
State: CRITICAL
Additional Info
CRITICAL - Socket timeout after 10 seconds

We have an alert that is almost certainly misconfigured. It is set up to check for content on an endpoint.

kayiwa self-assigned this Sep 5, 2024

@acozine
Contributor

acozine commented Sep 5, 2024

The check is currently green, although it regularly goes Critical/red overnight. It checks lae.princeton.edu for the text "Digital Archive of Latin America and Caribbean Ephemera", which is definitely present on the site home page. Maybe the machine reboots? Or the site is unresponsive for some reason?
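
To reproduce the check outside the monitoring system, something like the following rough sketch might help. It assumes the check is a plain HTTPS GET of the home page with a 10-second timeout followed by a substring match; the real Check_MK configuration is not shown in this thread, so the URL scheme and path, the `check` and `fetch` names, and the status messages are all illustrative.

```python
import socket
import urllib.error
import urllib.request

# Assumptions: the exact URL and timeout used by the real check are not shown here.
URL = "https://lae.princeton.edu/"
EXPECTED = "Digital Archive of Latin America and Caribbean Ephemera"
TIMEOUT_SECONDS = 10


def fetch(url, timeout):
    """Return the response body as text. The timeout applies to each socket
    operation, not to the request as a whole."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check(url=URL, expected=EXPECTED, timeout=TIMEOUT_SECONDS, fetcher=fetch):
    """Return a (state, detail) tuple loosely modelled on the alert text."""
    try:
        body = fetcher(url, timeout)
    except urllib.error.URLError as exc:
        # Connect timeouts arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "CRITICAL", f"timeout after {timeout} seconds"
        return "CRITICAL", f"request failed: {exc.reason}"
    except socket.timeout:
        # Read timeouts are raised directly.
        return "CRITICAL", f"timeout after {timeout} seconds"
    except OSError as exc:
        return "CRITICAL", f"request failed: {exc}"
    if expected not in body:
        return "CRITICAL", "expected text not found"
    return "OK", "expected text found"
```

Calling `check()` from cron on another host overnight and logging the result would show whether the timeouts are visible from outside the monitoring server as well.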

@acozine
Contributor

acozine commented Sep 5, 2024

I checked uptime on lae-prod1 and lae-prod2. Both have been up for 8 days, so the problem is not that the servers are rebooting overnight.

@acozine
Contributor

acozine commented Sep 30, 2024

We saw multiple alerts and recoveries on this check over the weekend. Both VMs have plenty of space. In the rails logs I see a couple of entries like this:

W, [2024-09-30T00:28:36.394569 #142462]  WARN -- honeybadger: ** [Honeybadger] Error report failed: an unknown error occurred. code=error error="HTTP Error: Net::OpenTimeout" level=2 pid=142462

and a lot of entries like this:

E, [2024-09-30T00:28:38.832051 #142483] ERROR -- : [dd.env=production dd.service=dpul dd.trace_id=85276041913562946 dd.span_id=1297836789973117643 ddsource=ruby] Health check failed with: execution expired
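
To line these entries up against the alert and recovery times, one option is to tally the failures per hour. The sketch below assumes the standard Ruby Logger line prefix seen in the two samples above; the `failures_by_hour` helper and its regex are illustrative, not anything already in the repo.

```python
import re
from collections import Counter

# Matches the Ruby Logger prefix, e.g.
# "E, [2024-09-30T00:28:38.832051 #142483] ERROR -- : ..."
# and captures the timestamp truncated to the hour.
LINE_RE = re.compile(
    r"^[A-Z], \[(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+ #\d+\]"
    r"\s+(?P<severity>[A-Z]+) -- "
)


def failures_by_hour(lines, needle="Health check failed"):
    """Count log lines containing `needle`, grouped by hour of the timestamp."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in line:
            counts[match.group("hour")] += 1
    return counts
```

Passing an open file handle for the Rails log (the path depends on the deployment) as `lines` and printing `sorted(counts.items())` gives a per-hour view. Running it again with `needle="Net::OpenTimeout"` would show whether the Honeybadger failures cluster in the same windows.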
