error "No more monitors are runnable!" when ping fails but within threshold #1174

Open
jmcclelland opened this issue Mar 13, 2023 · 2 comments

Comments

@jmcclelland
Contributor

I seem to get the error "No more monitors are runnable!" when a ping fails but is within the given threshold.

Here's the sample output from journald when running simplemonitor via systemd:

Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25  WARNING (simplemonitor) monitor failed but within tolerance: network (Command '['ping', '-c1', '-W5', '8.8.8.8']' returned non-zero exit status 1.)
Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25    ERROR (simplemonitor) No more monitors are runnable!

Is this expected? If so, I can work around it, but I'm not sure I fully understand what it means (even after looking at the code).

Thanks for any suggestions.

@jamesoff
Owner

(Apologies for the delayed response)

That error comes from the code which tries to run monitors as efficiently as possible. Broadly, it loops over all the monitors, running any which have no dependencies and postponing to the next pass any which do. It then runs those whose dependencies passed, postpones those with outstanding dependencies, and repeats. The error means that it couldn't run all of them, which can happen because of failed dependencies. Do you have a monitor which depends on the one which fails?
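
To make that loop concrete, here is a minimal sketch of this kind of dependency-ordered scheduling. It is not SimpleMonitor's actual code: the function `run_in_passes`, the shape of the `monitors` mapping, and the monitor names are all hypothetical, and tolerance handling is left out entirely.

```python
def run_in_passes(monitors, run):
    """Run monitors pass by pass, respecting dependencies.

    monitors: dict mapping a monitor name to a list of names it depends on
              (hypothetical structure, for illustration only).
    run:      callable taking a monitor name and returning True if it passed.

    Returns (passed, failed, skipped), where skipped holds the monitors that
    could never be run because a dependency did not pass.
    """
    pending = dict(monitors)
    passed, failed = set(), set()
    skipped = []

    while pending:
        # A monitor is runnable once every one of its dependencies has passed.
        runnable = [
            name for name, deps in pending.items()
            if all(dep in passed for dep in deps)
        ]
        if not runnable:
            # Nothing left can run: this is the situation the
            # "No more monitors are runnable!" message describes.
            skipped = sorted(pending)
            break
        for name in runnable:
            del pending[name]
            if run(name):
                passed.add(name)
            else:
                failed.add(name)

    return passed, failed, skipped


# Hypothetical setup: two host checks depend on a "network" check.
example_monitors = {
    "network": [],
    "host-a": ["network"],
    "host-b": ["network"],
    "disk": [],
}
```

In this sketch, if `network` fails, `host-a` and `host-b` stay pending forever, so the loop stops with them in `skipped` even though nothing is wrong with the scheduler itself.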

The interplay between monitors failing and thresholds is a little messy at times, mostly because of how the code grew over the years :)

If you can reproduce with --debug, it will output which monitors it's trying to work with each time round the loop.

It's probably safe to ignore and maybe shouldn't be logged as ERROR; do you have any monitors you expect to run which seem to be not updating?

@jmcclelland
Contributor Author

Ah - that does explain it perfectly - thank you. The (still within threshold) failure is on the ping monitor that checks to see if we are online, which all other ping monitors depend on. As far as I can tell, all monitors are working as expected.

I only noticed because I am using a loose regular expression to watch the output of our logs for the word "ERROR" and escalate those, so this one popped up.
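
For anyone with a similar log watcher, a stricter match could look roughly like the sketch below. It assumes the line layout shown in the journald output above (timestamp, level, logger name in parentheses, message); the names `LOG_LINE_RE`, `BENIGN_MESSAGES` and `should_escalate` are made up for illustration and are not part of SimpleMonitor.

```python
import re

# Matches the "<date> <time>  LEVEL (logger) message" part of a log line.
# Assumes the layout seen in the journald sample; adjust if yours differs.
LOG_LINE_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<level>[A-Z]+) \((?P<logger>[^)]+)\) (?P<message>.*)$"
)

# Messages logged at ERROR level that we have decided not to escalate.
BENIGN_MESSAGES = {"No more monitors are runnable!"}


def should_escalate(line):
    """Return True only for ERROR-level lines that are not known to be benign."""
    match = LOG_LINE_RE.search(line)
    if match is None:
        return False
    if match.group("level") != "ERROR":
        return False
    return match.group("message") not in BENIGN_MESSAGES
```

Matching on the level field rather than the bare word "ERROR" also avoids escalating lines that merely mention the word in their message text.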

I think changing it to a WARNING is a good idea since we do expect it to happen and that doesn't mean we have any errors.

And, by the way, thanks for this project! I've been using it now for a few years and it's perfect for our needs.
