Davide Madrisan edited this page Oct 9, 2015 · 17 revisions

Nagios Plugins for Linux

A suite of Nagios plugins for monitoring Linux servers and appliances

Nagios is an open source application for monitoring computer systems, networks and infrastructure. Originally created under the name NetSaint, it was written and is currently maintained by Ethan Galstad, along with a group of developers who actively maintain both the official and unofficial plugins.

The Nagios Plugins for Linux are intended to be run by NRPE, the Nagios Remote Plugin Executor, which "allows you to remotely execute Nagios plugins on other Linux/Unix machines. This allows you to monitor remote machine metrics (disk usage, CPU load, etc.)."

Available Nagios Plugins

LNX_CLOCK - returns the number of seconds elapsed between local time and Nagios time

[/etc/nrpe.d/check_clock]
command[check_clock]=/usr/lib/nagios/plugins/check_clock --refclock $ARG1$ -w 60 -c 120

where $ARG1$ is the number of seconds since the "Epoch" (1970-01-01 00:00:00 UTC), that is, the output of $(date '+%s'), provided by the Nagios poller.

Usage note

This check is intended for alerting when the clock difference, in seconds, between the Nagios poller and the monitored server exceeds a given threshold (60 seconds for the warning state and 120 seconds for the critical state, in the example above). For the result to be meaningful, the clock of the Nagios server must itself be synchronized to an NTP server.
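The threshold logic described above can be sketched in a few lines of Python. This is an illustration only, not the plugin's C implementation: the function names clock_state and format_status are hypothetical, and taking the absolute value of the difference is an assumption about how an offset in either direction is handled.

```python
import time


def clock_state(refclock, local_time=None, warning=60, critical=120):
    """Compare the poller's reference clock with the local clock.

    refclock   -- seconds since the Epoch as sent by the poller ($ARG1$)
    local_time -- seconds since the Epoch on the monitored host
                  (defaults to the current time)
    Returns a (state, delta) tuple.
    """
    if local_time is None:
        local_time = int(time.time())
    # Assumption: the sign of the offset is ignored.
    delta = abs(local_time - refclock)
    if delta > critical:
        state = "CRITICAL"
    elif delta > warning:
        state = "WARNING"
    else:
        state = "OK"
    return state, delta


def format_status(state, delta):
    """Build a status line in the style of the documented example output."""
    return f"clock {state} - time delta {delta}s | clock_delta={delta}"
```

For instance, format_status(*clock_state(1000, local_time=1039)) yields the same line as the example output shown for this plugin, with a delta of 39 seconds.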

This plugin returns the number of seconds elapsed between
the host local time and Nagios time.
Copyright (C) 2014 Davide Madrisan <[email protected]>

Usage:
  check_clock [-w COUNTER] [-c COUNTER] --refclock TIME

Options:
  -r, --refclock COUNTER  the clock reference (in seconds since the Epoch)
  -w, --warning COUNTER   warning threshold
  -c, --critical COUNTER  critical threshold
  -v, --verbose   show details for command-line debugging
                  (Nagios may truncate output)
  -h, --help      display this help and exit
  -V, --version   output version information and exit

Examples:
  check_clock -w 60 -c 120 --refclock $ARG1$

  # where $ARG1$ is the number of seconds since the Epoch: "$(date '+%s')"
  # provided by the Nagios poller

Example of output

clock OK - time delta 39s | clock_delta=39

Performance data

clock_delta


LNX_UPTIME - check how long the system has been running

[/etc/nrpe.d/check_uptime]
command[check_uptime]=/usr/lib/nagios/plugins/check_uptime
command[check_uptime_notify]=/usr/lib/nagios/plugins/check_uptime --critical 30:

Usage note

In the example above, Nagios sends a notification when the uptime of the monitored server is less than 30 minutes. This catches, for instance, an unexpected reboot of a server caused by a non-maskable interrupt (a signal of a non-recoverable hardware error).
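The trailing colon in "30:" follows the Nagios threshold range format: "30:" means "alert when the value is below 30", whereas a bare "30" would mean "alert when the value is below 0 or above 30". The sketch below shows one way such ranges could be evaluated in Python. The helper names parse_range and alerts are hypothetical, and the code is not taken from the plugin.

```python
import math


def parse_range(spec):
    """Parse a Nagios threshold range into (low, high, inside).

    Supported forms: "10", "10:", "~:10", "10:20", "@10:20".
    'inside' is True when the leading "@" inverts the test.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        # A bare number is the upper bound; the lower bound is 0.
        low_s, high_s = "0", spec
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)
    return low, high, inside


def alerts(spec, value):
    """Return True when 'value' should raise an alert for range 'spec'."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range
```

With this sketch, alerts("30:", 12) is true (the server rebooted 12 minutes ago), while alerts("30:", 1436) is false.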

A note on the implementation of "check_uptime" provided by nagios-plugins 2.0+

The check_uptime plugin provided by nagios-plugins 2.0+ is based on the POSIX function clock_gettime() used with the monotonic clock (CLOCK_MONOTONIC). According to the POSIX specification, "the value returned by clock_gettime() represents the amount of time (in seconds and nanoseconds) since an unspecified point in the past (for example, system start-time, or the Epoch)". Recent Linux kernels return a value that is related to the system start time but can differ from the output of the uptime command (procps) and from the first value of /proc/uptime:

$ /usr/bin/uptime
18:45:00 up  8:46,  7 users,  load average: 0.67, 1.79, 2.49

$ awk '{printf("%02d:%02d\n",($1/60/60%24),($1/60%60))}' /proc/uptime
08:46

$ ./clock_monotonic
4 hours 37 min

(On OpenBSD 5.0, the monotonic clock returns the same value as uptime, which confirms that this behaviour is platform dependent.)

The implementation used by nagios-plugins-linux is consistent with uptime and /proc/uptime.
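The two time sources compared above can be read side by side with a short Python sketch. The helper names are hypothetical and the code only illustrates the comparison; it is not how either plugin is implemented. On Linux, one documented difference is that CLOCK_MONOTONIC does not advance while the system is suspended, which may account for gaps like the one shown above.

```python
import time


def parse_proc_uptime(text):
    """Return the first field of /proc/uptime (seconds since boot)."""
    return float(text.split()[0])


def hhmm(seconds):
    """Format seconds as HH:MM, like the awk one-liner shown above."""
    minutes = int(seconds) // 60
    return f"{minutes // 60 % 24:02d}:{minutes % 60:02d}"


def monotonic_seconds():
    """Read CLOCK_MONOTONIC through clock_gettime() (Unix only)."""
    return time.clock_gettime(time.CLOCK_MONOTONIC)


def compare_clocks(path="/proc/uptime"):
    """Return (proc_uptime, monotonic) in seconds; Linux only."""
    with open(path) as handle:
        proc_uptime = parse_proc_uptime(handle.read())
    return proc_uptime, monotonic_seconds()
```

On a Linux host, printing hhmm() of both values returned by compare_clocks() shows whether the two sources agree on that machine.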

This plugin checks how long the system has been running.
Copyright (C) 2010,2012-2014 Davide Madrisan <[email protected]>

Usage:
  check_uptime [OPTION]

Options:
  -m, --clock-monotonic  use the monotonic clock for retrieving the time
  -w, --warning PERCENT   warning threshold
  -c, --critical PERCENT   critical threshold
  -h, --help      display this help and exit
  -V, --version   output version information and exit

Examples:
  check_uptime
  check_uptime --critical 15: --warning 30:
  check_uptime --clock-monotonic -c 15: -w 30:

See the Nagios Developer Guidelines for range format:
<https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>

Example of output

uptime OK: 23 hours 56 min | uptime=1436

Performance data

uptime (in minutes)
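Both plugins print their performance data after the "|" separator as label=value pairs. A consumer could extract those values along the lines of the Python sketch below. The function name parse_plugin_output is hypothetical, and the parser is deliberately simplified: it assumes numeric values and does not handle quoted labels containing spaces.

```python
import string


def parse_plugin_output(line):
    """Split a plugin output line into (status_text, perfdata dict).

    Only the value of each metric is kept; optional warn/crit/min/max
    fields and a trailing unit of measurement are discarded.
    """
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        value = rest.split(";", 1)[0]  # drop ;warn;crit;min;max
        number = value.rstrip(string.ascii_letters + "%")  # drop the unit
        metrics[label] = float(number)
    return text.strip(), metrics
```

Applied to the example output above, the sketch returns an uptime of 1436 minutes, and divmod(1436, 60) gives (23, 56), matching the "23 hours 56 min" in the status text.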
