I seem to get the error "No more monitors are runnable!" when a ping fails but is within the given threshold.
Here's the sample output from journald when running simplemonitor via systemd:
Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25 WARNING (simplemonitor) monitor failed but within tolerance: network (Command '['ping', '-c1', '-W5', '8.8.8.8']' returned non-zero exit status 1.)
Mar 13 17:07:25 monitor001 python3[760525]: 2023-03-13 17:07:25 ERROR (simplemonitor) No more monitors are runnable!
Is this expected? If so, I can work around it, but I'm not sure I fully understand what it means (even after looking in the code).
Thanks for any suggestions.
That error is from the code which tries to run monitors as efficiently as possible, which (broadly) loops over all the monitors, running any which don't have dependencies, and postponing to the next run any which do. Then it will run all those with deps which passed, postponing those with outstanding deps, and repeat. The error means that it couldn't run all of them, which can be because of failed deps. Do you have a monitor which depends on the one which fails?
The interplay between monitors failing and thresholds is a little messy at times, mostly because of how the code grew over the years :)
If you can reproduce with --debug, it will output which monitors it's trying to work with each time round the loop.
It's probably safe to ignore and maybe shouldn't be logged as ERROR; do you have any monitors you expect to run which seem to be not updating?
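The loop described above can be sketched roughly as follows. This is a simplified illustration and not simplemonitor's actual code: the function name `run_monitors`, the `check` callback, and the dict-of-dependencies layout are all hypothetical, and the sketch treats any failed dependency as blocking, without modelling tolerance.

```python
import logging

logger = logging.getLogger("sketch")


def run_monitors(monitors, check):
    """Run monitors in dependency order (hypothetical sketch).

    monitors: dict mapping monitor name -> list of dependency names.
    check: callable taking a monitor name, returning True if it passed.
    Returns three sets: (passed, failed, skipped).
    """
    remaining = dict(monitors)
    passed = set()
    failed = set()
    while remaining:
        # A monitor is runnable once every one of its dependencies has passed.
        runnable = [
            name
            for name, deps in remaining.items()
            if all(dep in passed for dep in deps)
        ]
        if not runnable:
            # Everything left is waiting on a dependency that did not pass.
            logger.warning("No more monitors are runnable: %s", sorted(remaining))
            break
        for name in runnable:
            del remaining[name]
            if check(name):
                passed.add(name)
            else:
                failed.add(name)
    return passed, failed, set(remaining)


# Example: "network" fails, so the monitors that depend on it are never run.
example_monitors = {
    "network": [],
    "ping-a": ["network"],
    "ping-b": ["network"],
    "disk": [],
}
passed, failed, skipped = run_monitors(example_monitors, lambda name: name != "network")
```

In this example the first pass runs `network` and `disk`. On the second pass nothing is runnable because `ping-a` and `ping-b` are still waiting on `network`, which is the situation the message reports.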
Ah - that does explain it perfectly - thank you. The (still within threshold) failure is on the ping monitor that checks whether we are online, which all other ping monitors depend on. As far as I can tell, all monitors are working as expected.
I only noticed because I am using a loose regular expression to watch the output of our logs for the word "ERROR" and escalate those, so this one popped up.
I think changing it to a WARNING is a good idea since we do expect it to happen and that doesn't mean we have any errors.
And, by the way, thanks for this project! I've been using it now for a few years and it's perfect for our needs.
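For anyone watching logs the same way, here is a minimal sketch of a stricter filter that matches the log level field rather than the bare word "ERROR", and skips messages known to be expected. The line layout is assumed from the journald sample above. The names `LOG_LINE`, `EXPECTED_MESSAGES`, and `should_escalate` are hypothetical and not part of simplemonitor.

```python
import re

# Assumed layout, taken from the sample output in this thread:
#   <date> <time> <LEVEL> (<logger>) <message>
# search() is used so any journald prefix before the timestamp is ignored.
LOG_LINE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) \((?P<logger>[^)]+)\) (?P<message>.*)"
)

# Messages we expect to see and do not want to escalate (an assumption).
EXPECTED_MESSAGES = ("No more monitors are runnable!",)


def should_escalate(line, expected=EXPECTED_MESSAGES):
    """Return True only for ERROR-level lines whose message is not expected."""
    match = LOG_LINE.search(line)
    if match is None or match.group("level") != "ERROR":
        return False
    # str.startswith accepts a tuple of prefixes.
    return not match.group("message").startswith(expected)
```

Matching on the level field keeps a WARNING line from being escalated if its message happens to contain the word "ERROR".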