Skip to content

desertwitch/sesmon

Repository files navigation

Logo

sesmon

SCSI Enclosure Services (SES)
Monitoring and Alerting Daemon

Release Go Version Go Reference Go Report License
Codecov Lint Tests Build

Overview

sesmon is a monitoring and alerting daemon for SES-capable SCSI enclosures. It periodically polls sg_ses for a JSON-format --all dump, comparing changes between previous and most recent device data. An alert is raised if changes of status-relevant fields were detected. For each SES element of the SCSI enclosure, specifically only the status descriptors and their following fields (most are common among all element status descriptors) are monitored:

  • status
  • prdfail
  • disabled
  • swap
  • temperature (if present)
  • voltage (if present)
  • current (if present)

The alerts themselves are emitted to standard error (stderr), and it is also possible to configure an external notification agent for each device. Such an agent could be a shell script or any other executable, which is then called on alert, with the relevant information passed via positional arguments (as text and JSON).

Installation

To build from source, a Makefile is included with the project's source code. Running make all will compile the application and pull in any necessary dependencies. make check runs the test suite and static analysis tools.

For convenience, precompiled static binaries for common architectures are released through GitHub. These can be installed into /usr/bin/ or respective system locations; ensure they are executable by running chmod +x before use.

All builds from source are designed to generate reproducible builds, meaning that they should compile as byte-identical to the respective released binaries and also have the exact same checksums upon integrity verification.

Dependencies

  • sg_ses (as usually a part of sg3_utils packages)

Building from source

git clone https://github.com/desertwitch/sesmon.git
cd sesmon
make all

Running a built executable

./sesmon --help

Configuration

# sesmon configuration file
# "check" and "test" commands can help verify configuration files

# Disable timestamps in log output
disable_timestamps: false

# List of devices to monitor
#
# Devices can be defined either by device path or SAS address (or both)
# Defining by SAS address is more stable across reboots (and recommended)
# SAS addresses can be obtained by e.g. using the "lsscsi" utility ("-t")
#
# If defined by SAS address, the devices are resolved to their "/dev" paths
# at the begin of the program (can be tested with "sesmon test <config.yaml>")
# SAS address resolves using: "/sys/class/scsi_generic/sg*/device/sas_address"
devices:
  # Device 1 - resolve by SAS address (recommended)
  - address: "0x500a098012345678"

    # Type of device (0 = Device, 1 = JSON file)
    # JSON file "devices" can be useful for testing
    type: 0
    
    # Human-readable description of this device
    description: "JBOD"
    
    # Enable monitoring for this device
    enabled: true
    
    # Optional: Device monitoring configuration
    # Omitted settings use defaults as shown below
    config:
      # How often to poll the target device for data
      poll_interval: "1m30s"
      
      # How often to attempt a device poll (must be > 0)
      poll_attempts: 3
      
      # How long a device poll attempt can take (multiplies with attempts)
      poll_attempt_timeout: "15s"
      
      # How long to wait between device poll attempts (in case of failure)
      poll_attempt_interval: "15s"
      
      # How many consecutive poll failures trigger back-off period
      # Note: First failure = after 3 attempts (set value of poll_attempts)
      #       So backoff after 3 failures = after total 9 failed poll attempts
      poll_backoff_after: 3
      
      # How long to pause polling the device when in back-off period
      poll_backoff_time: "3m0s"
      
      # Dispatch notification through agent when entering back-off period
      # Applies only if a notification agent is configured for the device
      poll_backoff_notify: true
      
      # Permanently stop monitoring the device when entering back-off period
      # If false, monitoring resumes normally after poll_backoff_time elapses
      poll_backoff_stopmonitor: false
      
      # Folder to write JSON files of device state and alerts to
      # Must be unique per device and creates the following files:
      #   - current.json (raw snapshot of current device state)
      #   - current_parsed.json (parsed snapshot of current device state)
      #   - change-YYYYMMDD-HHMMSS.json (single timestamped change report)
      #   - change-YYYYMMDD-HHMMSS.json (single timestamped change report)
      #   - ...
      # Default: (none)
      output_dir: "/var/lib/sesmon/JBOD"
      
      # Output also verbose operational information as part of log output
      verbose: false
    
    # Optional: Notification agent (e.g., external script for alerts)
    # If omitted, alerts are emitted only as part of regular log output
    script_notifier:
      # Path to executable notification script
      # Script receives these arguments:
      #   $1: Device path (e.g., /dev/sg25)
      #   $2: SAS address (e.g., 0x500a098012345678)
      #   $3: Device description (e.g., "JBOD")
      #   $4: Notification message in textual format
      #   $5: Change report in JSON format (where applicable)
      script: "/usr/local/bin/my-notify-script.sh"
      
      # Optional: Notification agent configuration
      # Omitted settings use defaults as shown below
      config:
        # How often to attempt a notification (must be > 0)
        notify_attempts: 3
        
        # How long a notification attempt can take (multiplies with attempts)
        notify_attempt_timeout: "15s"
        
        # How long to wait between notification attempts (in case of failure)
        notify_attempt_interval: "15s"
  
  # Device 2 - resolve by device path (not recommended)
  - device: "/dev/sg25"
    type: 0
    description: "JBOD2"
    enabled: true
    # Uses all default settings and no notification agent

  # Device 3 - JSON file for simulations and/or testing
  - device: "/tmp/device.json"
    type: 1
    description: "JBOD3"
    enabled: true
    # Uses all default settings and no notification agent

License

All code is licensed under the MIT License.

About

sesmon is a monitoring and alerting daemon for SES-capable SCSI enclosures.

Resources

License

Stars

Watchers

Forks

Packages

No packages published