error "No more monitors are runnable!" when ping fails but within threshold #1174

jmcclelland · 2023-03-13T18:00:21Z

I seem to get the error "No more monitors are runnable!" when a ping fails but is within the given threshold.

Here's the sample output from journald when running simplemonitor via systemd:

Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25  WARNING (simplemonitor) monitor failed but within tolerance: network (Command '['ping', '-c1', '-W5', '8.8.8.8']' returned non-zero exit status 1.)
Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25    ERROR (simplemonitor) No more monitors are runnable!

Is this expected? If so, I can work around it, but I'm not sure I fully understand what it means (even after looking in the code).

Thanks for any suggestions.


jamesoff · 2023-03-16T17:07:19Z

(Apologies for the delayed response)

That error is from the code which tries to run monitors as efficiently as possible, which (broadly) loops over all the monitors, running any which don't have dependencies, and postponing to the next run any which do. Then it will run all those with deps which passed, postponing those with outstanding deps, and repeat. The error means that it couldn't run all of them, which can be because of failed deps. Do you have a monitor which depends on the one which fails?
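The loop described above can be sketched roughly as follows. This is a simplified illustration, not simplemonitor's actual code: the function name `run_in_dependency_order`, the monitor names, and the `failing` set are all hypothetical, and thresholds are ignored entirely.

```python
from typing import Dict, List, Set


def run_in_dependency_order(
    monitors: Dict[str, List[str]], failing: Set[str]
) -> Dict[str, List[str]]:
    """Run monitors in passes, postponing those with unmet dependencies.

    monitors maps a monitor name to the names it depends on.
    failing is the (hypothetical) set of monitors whose check fails.
    """
    remaining = dict(monitors)
    passed: Set[str] = set()
    ran: List[str] = []

    while remaining:
        # A monitor is runnable once every one of its dependencies has passed.
        runnable = [
            name
            for name, deps in remaining.items()
            if all(dep in passed for dep in deps)
        ]
        if not runnable:
            # Monitors are left over but none can run: this is the situation
            # the "No more monitors are runnable!" message reports.
            break
        for name in runnable:
            del remaining[name]
            ran.append(name)
            if name not in failing:
                passed.add(name)

    return {"ran": ran, "skipped": sorted(remaining)}


result = run_in_dependency_order(
    {"network": [], "host-a": ["network"], "host-b": ["network"]},
    failing={"network"},
)
```

In this sketch, a failing `network` monitor leaves `host-a` and `host-b` in `skipped`, because their only dependency never passes.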

The interplay between monitors failing and thresholds is a little messy at times, mostly because of how the code grew over the years :)

If you can reproduce with --debug, it will output which monitors it's trying to work with each time round the loop.

It's probably safe to ignore and maybe shouldn't be logged as ERROR; do you have any monitors you expect to run which seem to be not updating?

jmcclelland · 2023-03-16T21:09:19Z

Ah - that does explain it perfectly - thank you. The (still within threshold) failure is on the ping monitor that checks to see if we are online, which all other ping monitors depend on. As far as I can tell, all monitors are working as expected.

I only noticed because I am using a loose regular expression to watch the output of our logs for the word "ERROR" and escalate those, so this one popped up.
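For anyone with a similar log watcher, a tighter filter might look something like the sketch below. It assumes the log format shown in the journald sample above (level followed by `(simplemonitor)`); the names `ERROR_LEVEL`, `KNOWN_BENIGN`, and `should_escalate` are made up for illustration.

```python
import re

# Match ERROR only where it appears as the log level, as in the sample
# output: "... ERROR (simplemonitor) <message>". Assumes that format.
ERROR_LEVEL = re.compile(r"\bERROR \(simplemonitor\)")

# Messages logged at ERROR level that we expect and do not want to escalate.
KNOWN_BENIGN = ("No more monitors are runnable!",)


def should_escalate(line: str) -> bool:
    """Return True for ERROR-level lines that are not known to be benign."""
    if not ERROR_LEVEL.search(line):
        return False
    return not any(message in line for message in KNOWN_BENIGN)
```

Matching on the level field rather than the bare word also avoids escalating WARNING lines whose message text happens to contain "ERROR".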

I think changing it to a WARNING is a good idea since we do expect it to happen and that doesn't mean we have any errors.

And, by the way, thanks for this project! I've been using it now for a few years and it's perfect for our needs.
