Commit ca6a64a

feat: Add a script to monitor the judges
1 parent b7e13b3 commit ca6a64a

2 files changed, +102 -0 lines changed

scripts/README.md

+31
@@ -0,0 +1,31 @@
# Scripts

## monitor_judges

Despite our efforts to stop and restart the judges properly after each execution request, a judge sometimes stays alive as a zombie.

In this state, the judge behaves in one of the following ways:
- The judge seems to be idle but is still connected to the message queue. It still receives execution requests but does not handle them, so every execution request sent to it results in a timeout.
- The judge seems to be still running the student's code despite the scheduled interruption (10s of execution), so it keeps consuming resources needlessly.

The goal of this script is to monitor started containers and to restart those identified as zombies.
A judge is identified as a zombie if:
- It has been alive for longer than 30s.
- It is not waiting for an execution request (thus it is processing one).
- It is still processing the same execution request 10s later (the PID of the java command is still the same).

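The three criteria above amount to a small state machine evaluated on each poll. The following is only a minimal sketch of that rule, with the Docker queries replaced by plain arguments; the names `classify_judge`, `ALIVE_THRESHOLD` and `marked` are hypothetical and not part of the script:

```python
from datetime import timedelta

# Minimum uptime before a judge is inspected at all (30s, as in the list above).
ALIVE_THRESHOLD = timedelta(seconds=30)


def classify_judge(name, uptime, waiting, pid, marked):
    """Return "ok", "marked" or "zombie" for one poll of one judge.

    `marked` maps judge names to the PID seen on the previous poll and is
    updated in place. All names here are illustrative assumptions.
    """
    if uptime <= ALIVE_THRESHOLD:
        return "ok"  # too young to be inspected
    if waiting:
        marked.pop(name, None)  # waiting for a request: forget any earlier mark
        return "ok"
    if marked.get(name) == pid:
        del marked[name]  # same PID on two consecutive polls: needs a restart
        return "zombie"
    marked[name] = pid  # first sighting of this PID: look again on the next poll
    return "marked"
```

Keeping the decision separate from the `docker` calls makes the rule easy to exercise without running containers: a judge is only reported as a zombie on the second poll that observes the same PID.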
### Usage

This script is compatible with Python >= 2.7.

First, you may need to install this script's dependency:

```
pip install python-daemon
```

Then run the script with:

```
python monitor_judges.py
```

scripts/monitor_judges.py

+71
@@ -0,0 +1,71 @@
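import daemon
import subprocess
from datetime import date, datetime, time, timedelta
from threading import Timer

# Last log line printed by a judge that is idle and ready for work.
LOG_WAITING = "[INFO] Waiting for request...\n"
# Maps a judge name to the PID observed on the previous check.
MARKED_JUDGES = dict()
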
def mark_judge(judge_name, pid):
    print("Marking judge: {}".format(judge_name))
    MARKED_JUDGES[judge_name] = pid

def unmark_judge(judge_name):
    print("Unmarking judge: {}".format(judge_name))
    del MARKED_JUDGES[judge_name]

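def is_waiting(judge_name):
    # universal_newlines=True returns text on both Python 2 and Python 3,
    # so the comparison with LOG_WAITING behaves the same on either version.
    last_log = subprocess.check_output(
        ["docker", "logs", "--tail=1", judge_name], universal_newlines=True)
    return LOG_WAITING == last_log
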
def restart_judge(judge_name):
    print("Restarting judge: {}".format(judge_name))
    subprocess.check_output(["docker", "restart", judge_name])
    unmark_judge(judge_name)

def retrieve_start_time(judge_name):
    started_at = subprocess.check_output(
        ["docker", "inspect", "--format", "{{.State.StartedAt}}", judge_name],
        universal_newlines=True)
    # Keep only the date and time: drop the trailing newline, the fractional
    # seconds and any "Z" suffix.
    started_at = started_at.strip().split(".")[0].rstrip("Z")
    started_at = datetime.strptime(started_at, "%Y-%m-%dT%H:%M:%S")

    # To use the same timezone as now(). The offset is hard-coded to +2 hours;
    # adding a timedelta instead of adding to the hour field keeps start times
    # of 22:00 or later from raising a ValueError.
    return started_at + timedelta(hours=2)

def monitor_judge(judge_name):
    started_at = retrieve_start_time(judge_name)
    now = datetime.now()
    if (now - started_at) > timedelta(seconds=30):
        if not is_waiting(judge_name):
            # PID column of the last process listed by `docker top`.
            pid = int(subprocess.check_output(["docker top {} | tail -n 1".format(judge_name)], shell=True).split()[1])
            if judge_name in MARKED_JUDGES and pid == MARKED_JUDGES[judge_name]:
                # Same PID as on the previous check: the judge is stuck.
                restart_judge(judge_name)
            else:
                mark_judge(judge_name, pid)
        else:
            if judge_name in MARKED_JUDGES:
                unmark_judge(judge_name)

def main():
    print("Marked judges: {}".format(MARKED_JUDGES))
    nb_judges = int(subprocess.check_output(["docker ps | grep judge | wc -l"], shell=True))
    # Judges are expected to be named judge_judge_1 .. judge_judge_N.
    for i in range(1, nb_judges + 1):
        judge_name = "judge_judge_{}".format(i)
        monitor_judge(judge_name)
    # Schedule the next check in 10 seconds.
    thread = Timer(10.0, main)
    thread.start()

with daemon.DaemonContext():
    main()
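
`retrieve_start_time` relies on a fixed two-hour shift to line up Docker's timestamp with `datetime.now()`, which only holds while the host is at UTC+2. One possible alternative, sketched here under the assumption that `StartedAt` is a UTC timestamp shaped like `2016-05-12T22:30:05.123456789Z` (an illustrative value), is to keep both sides of the comparison in UTC. The helper names `parse_started_at` and `uptime` are hypothetical and not part of this commit:

```python
from datetime import datetime, timedelta


def parse_started_at(raw):
    """Parse a Docker-style UTC timestamp into a naive UTC datetime."""
    # Drop the trailing newline, the fractional seconds and the "Z" suffix.
    trimmed = raw.strip().split(".")[0].rstrip("Z")
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S")


def uptime(raw_started_at, now_utc):
    """Return how long the container has been up.

    `now_utc` must be a naive datetime expressed in UTC, supplied by the caller.
    """
    return now_utc - parse_started_at(raw_started_at)


def is_old_enough(raw_started_at, now_utc, threshold=timedelta(seconds=30)):
    """True once the container has been running longer than `threshold`."""
    return uptime(raw_started_at, now_utc) > threshold
```

With this shape, the caller passes the current UTC time instead of local time, so the check no longer depends on the host's timezone or on daylight saving changes.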
