Labels
enhancement (New feature or request)
Description
It would be nice to have some means of monitoring the resources used by the dispatched jobs.
For example, I use the following script to dispatch UPPAAL-specific jobs (single-threaded, with lots of memory):
#!/usr/bin/env bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --sockets-per-node 1
#SBATCH --cores-per-socket 1
#SBATCH --mail-type=END
#SBATCH --mail-user=......
#SBATCH --error=slurm-%j.err
#SBATCH --output=slurm-%j.log
##SBATCH --time=24:00:00
##SBATCH --partition=rome
set -e
PERIOD=10
MEMAVAIL=100000
while getopts "hm:p:" option ; do
case $option in
h)
echo "$0 script launches a command and monitors its resources."
echo "$0 exits if the process exits."
echo "$0 kills the process if the machine runs out of available memory."
echo "Synopsis: $0 [-h] [-p N] [-m N] command with arguments"
echo " -h prints this help screen"
echo " -p N samples every N seconds"
echo " -m N kills if available memory gets below N kB"
exit;;
m) MEMAVAIL="$OPTARG";;
p) PERIOD="$OPTARG";;
?) echo "Invalid option $option, consult -h"
exit 2;;
esac
done
shift $(($OPTIND - 1))
COMMAND="$@"
if [ "$#" -lt 1 ]; then
echo -e "Error: no arguments, expecting command and its arguments."
echo -e "Usage:\n\t$0 your_command your_arguments"
exit 1
fi
"$@" &
pid=$!
# echo "Process statistics is in process-$pid-stats.txt"
exec hogwatch -p "$PERIOD" -m "$MEMAVAIL" "$pid"
where hogwatch is the script that monitors specific processes and logs their resource usage:
#!/usr/bin/env bash
set -e
PERIOD=10
MEMAVAIL=100000
while getopts "hm:p:" option ; do
case $option in
h)
echo "$0 script monitors processes by their PIDs and logs statistics into process-PID-stats.txt."
echo "$0 exits if all monitored processes exit or the machine runs out of available memory."
echo "Synopsis: $0 [-h] [-p N] [-m N] PID*"
echo " -h prints this help screen"
echo " -p N samples every N seconds"
echo " -m N kills watched PIDs if available memory gets below N kB"
exit;;
m) MEMAVAIL="$OPTARG";;
p) PERIOD="$OPTARG";;
?) echo "Invalid option $option, consult -h"
exit 2;;
esac
done
shift $(($OPTIND - 1))
PIDS="$@"
function proc_status() {
pid=$1
f=process-$pid-stats.txt
if [ ! -e $f ]; then
# print the whole command line as the first line:
ps -o pid,args -p $pid | tail -n1 > $f
# print the table header:
echo -ne "DATE " >> $f
ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | head -n1 >> $f
fi
# print date-timestamp:
echo -ne "$(date +%s) " >> $f
# print process resources:
ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | tail -n1 >> $f
}
# monitor the free memory:
mem_avail=$(free | grep Mem | gawk '{ print $7 }')
while [ $mem_avail -gt $MEMAVAIL ] ; do
list=""
for pid in $PIDS ; do
if [ -e "/proc/$pid" ]; then
proc_status $pid
list="$list $pid"
fi
done
PIDS=$list
if [ -z "$PIDS" ]; then
exit 0
fi
sleep $PERIOD
# refresh the available-memory sample that the loop condition checks:
mem_avail=$(free | grep Mem | gawk '{ print $7 }')
done
echo "hogwatch: machine is out of available memory, thus killing $PIDS"
kill -9 $PIDS
Then I have the following Python script to show the memory and CPU consumption:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
import csv
import sys
import time
import datetime
import os
date=[]
cpu=[]
memory=[]
cputime=[]
elapsed=[]
virtual=[]
working=[]
title="Memory"
columns="???"
for arg in sys.argv[1:]:
    # reset the series so each stats file gets its own plot
    date.clear()
    cpu.clear()
    memory.clear()
    cputime.clear()
    elapsed.clear()
    virtual.clear()
    working.clear()
    with open(arg, 'r') as csvfile:
        # first line: command line of the process, second line: table header
        title = csvfile.readline()
        columns = csvfile.readline()
        rows = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
        for row in rows:
            if len(row) == 7:
                date.append(datetime.datetime.utcfromtimestamp(int(row[0])))
                cpu.append(float(row[1]))
                memory.append(float(row[2]))
                cputime.append(row[3])
                ela = time.strptime(row[4], "%M:%S" if row[4].count(':') == 1 else "%H:%M:%S")
                delta = datetime.timedelta(hours=ela.tm_hour, minutes=ela.tm_min, seconds=ela.tm_sec).total_seconds()
                elapsed.append(delta)
                # ps reports vsize and rss in kB; convert to GB
                virtual.append(float(row[5])/1024/1024)
                working.append(float(row[6])/1024/1024)
    fig, ax = plt.subplots(2)
    x = elapsed
    #x = date
    ax[0].plot(x, virtual, color='b', label="virtual")
    ax[0].plot(x, working, color='r', label="working")
    ax[0].set(xlabel='time (s)', ylabel='memory (GB)', title=title)
    ax[0].grid()
    ax[0].legend()
    ax[1].plot(x, cpu, color='r', label="CPU")
    ax[1].plot(x, memory, color='b', label="memory")
    ax[1].set(xlabel='time (s)', ylabel='resources (%)')
    ax[1].grid()
    ax[1].legend()
    #plt.get_current_fig_manager().canvas.manager.set_window_title(arg)
    #plt.show()
    plt.gcf().set_size_inches(15, 15)
    out = arg + ".png"
    fig.savefig(out, bbox_inches='tight')
    print("Plot saved to " + out)
    os.system("display " + out)
Currently Python cannot open a window with interactive zoom widgets, because windowing toolkit libraries (such as Tk, Qt, GTK, etc.) are not installed (and are not available in virtual Python environments), so the script dumps a PNG image and then launches display to show it.
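One caveat about the elapsed-time parsing in the plotting script: the `strptime` call accepts `MM:SS` and `HH:MM:SS`, but `ps` prints `etime` as `[[DD-]hh:]mm:ss`, so samples from a job that has been running for more than a day would likely fail to parse. A minimal sketch of a replacement is below; `etime_to_seconds` is a hypothetical helper name, not part of the original script, and it assumes the field contains only digits, `:` and an optional `DD-` prefix.

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a ps elapsed-time value ([[DD-]hh:]mm:ss) to seconds."""
    etime = etime.strip()
    days = 0
    if "-" in etime:
        # A "DD-" prefix appears once the process is older than 24 hours.
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Left-pad with zeros so we always end up with [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

In the plotting loop this could stand in for the `strptime`/`timedelta` pair, for example `elapsed.append(etime_to_seconds(row[4]))`.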