Some interactive means of monitoring of dispatched jobs #19

@mikucionisaau

It would be nice to have some means of monitoring the resources used by dispatched jobs.
For example, I use the following script to dispatch UPPAAL-specific jobs (single-threaded, with lots of memory):

#!/usr/bin/env bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --sockets-per-node 1
#SBATCH --cores-per-socket 1
#SBATCH --mail-type=END
#SBATCH --mail-user=......
#SBATCH --error=slurm-%j.err
#SBATCH --output=slurm-%j.log
# # S B A T C H --time=24:00:00
# # S B A T C H --partition=rome
set -e

PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
        h)
            echo "$0 script launches a command and monitors the resources."
            echo "$0 exits if the process exits."
            echo "$0 kills the process if the machine runs out of available memory."
            echo "Synopsis: $0 [-h] [-p N] [-m N] command with arguments"
            echo " -h     prints this help screen"
            echo " -p N   samples every N seconds"
            echo " -m N   kills if available memory gets below N kB"
            exit;;
        m)  MEMAVAIL="$OPTARG";;
        p)  PERIOD="$OPTARG";;
        ?)  echo "Invalid option $option, consult -h"
            exit 2;;
    esac
done

shift $(($OPTIND - 1))

COMMAND="$@"

if [ "$#" -lt 1 ]; then
    echo -e "Error: no arguments, expecting command and its arguments."
    echo -e "Usage:\n\t$0 your_command your_arguments"
    exit 1
fi

"$@" &
pid=$!
# echo "Process statistics is in process-$pid-stats.txt"
exec hogwatch -p$PERIOD -m$MEMAVAIL $pid

Here hogwatch is a script that monitors the given processes and logs their resource usage:

#!/usr/bin/env bash
set -e

PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
        h)
            echo "$0 script monitors processes by their PIDs and logs statistics into process-PID-stats.txt."
            echo "$0 exits if all monitored processes exit or the machine runs out of available memory."
            echo "Synopsis: $0 [-h] [-p N] [-m N] PID*"
            echo " -h     prints this help screen"
            echo " -p N   samples every N seconds"
            echo " -m N   kills watched PIDs if available memory gets below N kB"
            exit;;
        m)  MEMAVAIL="$OPTARG";;
        p)  PERIOD="$OPTARG";;
        ?)  echo "Invalid option $option, consult -h"
            exit 2;;
    esac
done

shift $(($OPTIND - 1))

PIDS="$@"

function proc_status() {
    pid=$1
    f=process-$pid-stats.txt
    if [ ! -e $f ]; then
        # print the whole command line as the first line:
        ps -o pid,args -p $pid | tail -n1 > $f
        # print the table header:
        echo -ne "DATE       " >> $f
        ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | head -n1 >> $f
    fi
    # print date-timestamp:
    echo -ne "$(date +%s) " >> $f
    # print process resources:
    ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | tail -n1 >> $f
}

# monitor the free memory:
mem_avail=$(free | grep Mem | gawk '{ print $7 }')
while [ $mem_avail -gt $MEMAVAIL ] ; do
    list=""
    for pid in $PIDS ; do
        if [ -e "/proc/$pid" ]; then
            proc_status $pid
            list="$list $pid"
        fi
    done
    PIDS=$list
    if [ -z "$PIDS" ]; then
        exit 0
    fi
    sleep $PERIOD
    # refresh the variable tested by the loop condition (column 7 is "available"):
    mem_avail=$(free | grep Mem | gawk '{ print $7 }')
done

echo "hogwatch: machine is out of available memory, thus killing $PIDS"
kill -9 $PIDS
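
The available-memory check in hogwatch can also be expressed without parsing the output of free. The following is a minimal sketch in Python, assuming a Linux kernel that reports a MemAvailable line in /proc/meminfo (which, as far as I know, is where the "available" column of free comes from). The function names and the default threshold are illustrative only and are not part of the scripts above.

```python
import re


def mem_available_kb(meminfo_text):
    """Return the MemAvailable value in kB from /proc/meminfo-style text, or None."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_mem_available_kb(path="/proc/meminfo"):
    """Read the current available memory in kB (assumes a Linux /proc filesystem)."""
    with open(path) as f:
        return mem_available_kb(f.read())


def should_kill(avail_kb, threshold_kb=100000):
    """Mirror the hogwatch loop: keep watching only while available > threshold."""
    return avail_kb is not None and avail_kb <= threshold_kb
```

A monitoring loop would call should_kill(read_mem_available_kb()) once per sampling period and terminate the watched processes when it returns True.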

Then I have the following Python script to plot the memory and CPU consumption:

#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
import csv
import sys
import time
import datetime
import os

date=[]
cpu=[]
memory=[]
cputime=[]
elapsed=[]
virtual=[]
working=[]
title="Memory"
columns="???"

for arg in sys.argv[1:]:
    date.clear()
    cpu.clear()
    memory.clear()
    cputime.clear()
    elapsed.clear()
    virtual.clear()
    working.clear()
    with open(arg, 'r') as csvfile:
        title=csvfile.readline()
        columns=csvfile.readline()
        rows = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
        for row in rows:
            if len(row) == 7:
                date.append(datetime.datetime.utcfromtimestamp(int(row[0])))
                cpu.append(float(row[1]))
                memory.append(float(row[2]))
                cputime.append(row[3])
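                # ps prints etime as [[DD-]hh:]mm:ss, so handle optional days and hours:
                days, _, hms = row[4].rpartition('-')
                parts = [0, 0] + [int(p) for p in hms.split(':')]
                delta = datetime.timedelta(days=int(days or 0), hours=parts[-3], minutes=parts[-2], seconds=parts[-1]).total_seconds()
                elapsed.append(delta)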
                virtual.append(float(row[5])/1024/1024)
                working.append(float(row[6])/1024/1024)
    fig, ax = plt.subplots(2)
    x = elapsed
    #x = date
    ax[0].plot(x, virtual, color='b', label="virtual")
    ax[0].plot(x, working, color='r', label="working")
    ax[0].set(xlabel='time (s)', ylabel='memory (GB)', title=title)
    ax[0].grid()
    ax[0].legend()
    ax[1].plot(x, cpu, color='r', label="CPU")
    ax[1].plot(x, memory, color='b', label="memory")
    ax[1].set(xlabel='time (s)', ylabel='resources (%)')
    ax[1].grid()
    ax[1].legend()

    #plt.get_current_fig_manager().canvas.manager.set_window_title(arg)
    #plt.show()
    plt.gcf().set_size_inches(15, 15)
    out = arg + ".png"
    fig.savefig(out, bbox_inches='tight')
    print("Plot saved to " + out)
    os.system("display " + out)

Currently Python cannot open a window with interactive zoom widgets, because the windowing toolkit libraries (Tk, Qt, GTK, etc.) are not installed and are not available in Python virtual environments. The script therefore saves a PNG image and launches display to show it.

Example result:
image
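
Since interactive plotting is not available, a plain-text summary of the same log may be a useful complement to the PNG. The following is a rough sketch using only the standard library. It assumes the file layout written by proc_status above (command line, header line, then 7 whitespace-separated fields per sample, with vsize and rss in KiB). The function name summarize_stats and the returned keys are hypothetical.

```python
def summarize_stats(lines):
    """Summarize a process-PID-stats.txt log given as an iterable of text lines.

    Assumed layout: line 1 is the "PID command" line, line 2 is the column
    header, and every later line holds 7 fields:
    timestamp, %CPU, %MEM, cputime, etime, vsize (KiB), rss (KiB).
    """
    lines = list(lines)
    command = lines[0].strip() if lines else ""
    samples = [row.split() for row in lines[2:]]
    # Skip incomplete rows, e.g. a sample taken just as the process exited.
    samples = [row for row in samples if len(row) == 7]
    if not samples:
        return {"command": command, "samples": 0}
    kib_per_gib = 1024 * 1024
    return {
        "command": command,
        "samples": len(samples),
        "peak_cpu_percent": max(float(r[1]) for r in samples),
        "peak_vsz_gib": max(int(r[5]) for r in samples) / kib_per_gib,
        "peak_rss_gib": max(int(r[6]) for r in samples) / kib_per_gib,
        "last_etime": samples[-1][4],
    }
```

It could be applied to a log with something like summarize_stats(open("process-12345-stats.txt")), where the file name is only an example.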


Labels: enhancement (New feature or request)
