-
Notifications
You must be signed in to change notification settings - Fork 202
Description
I try to debug a problem where my worker job is stuck for several minutes before it is run.
This only happens sporadic. Has someone an idea how to fix that?
To narrow down the problem I added 2 test job to ofelia.
I created a docker-compose.yml based on the docker compose example in the README.
In opposite to the one in the README it starts at every minute when the seconds are 0
version: "3"
services:
ofelia:
image: mcuadros/ofelia:latest
depends_on:
- nginx
command: daemon --docker
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
labels:
ofelia.job-local.my-test-job.schedule: "0 * * * * *"
ofelia.job-local.my-test-job.command: "date"
nginx:
image: nginx
labels:
ofelia.enabled: "true"
ofelia.job-exec.datecron.schedule: "0 * * * * *"
ofelia.job-exec.datecron.command: "uname -a"
When I run this as only ofelia job on a raspberry pi I can't see the problem. All jobs a finished in a couple of milliseconds.
$ docker logs ofelia-test-ofelia-1 | head
2023-10-24T09:13:19.353Z scheduler.go:34 ▶ NOTICE New job registered "datecron" - "uname -a" - "0 * * * * *"
2023-10-24T09:13:19.353Z scheduler.go:34 ▶ NOTICE New job registered "my-test-job" - "date" - "0 * * * * *"
2023-10-24T09:13:19.354Z scheduler.go:54 ▶ DEBUG Starting scheduler with 2 jobs
2023-10-24T09:14:00.013Z common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] Started - uname -a
2023-10-24T09:14:00.014Z common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] Started - date
2023-10-24T09:14:00.026Z common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] StdOut: Tue Oct 24 09:14:00 UTC 2023
2023-10-24T09:14:00.026Z common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] Finished in "11.981455ms", failed: false, skipped: false, error: none
2023-10-24T09:14:00.177Z common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] StdOut: Linux 2a8971447761 6.1.57-v8+ #1688 SMP PREEMPT Thu Oct 12 15:14:26 BST 2023 aarch64 GNU/Linux
2023-10-24T09:14:00.177Z common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] Finished in "164.080199ms", failed: false, skipped: false, error: none
2023-10-24T09:15:00.004Z common.go:125 ▶ NOTICE [Job "datecron" (13a5cf1e8a53)] Started - uname -a
We have 5430 finished jobs
$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | wc -l
5430
All of them are in the range of ms
$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | head
"11.981455ms",
"164.080199ms",
"5.26668ms",
"113.323193ms",
"5.984081ms",
"172.045716ms",
"5.973062ms",
"135.727973ms",
"11.564462ms",
"119.189199ms",
$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | grep -v ms
$
Now to the target system. I have 3 jobs running on a v-server. The 2 test jobs from before and 1 real work job.
The work job runs a python script every full 15 minutes (e.g. 8:00, 8:15, 8:30, 8:45, 9:00, ...).
$ docker logs ofelia_test-ofelia-1 | head
2023-10-23T08:09:58.02Z scheduler.go:34 ▶ NOTICE New job registered "smart_cistern_backend" - "/bin/sh -c \"python -m cProfile -o logs/cProfile/$(date --iso-8601=seconds|sed 's/:/./g')_profiling_all.prof src/main.py\"" - "0 */15 * * * *"
2023-10-23T08:09:58.02Z scheduler.go:34 ▶ NOTICE New job registered "datecron" - "uname -a" - "0 * * * * *"
2023-10-23T08:09:58.02Z scheduler.go:34 ▶ NOTICE New job registered "my-test-job" - "date" - "0 * * * * *"
2023-10-23T08:09:58.02Z scheduler.go:54 ▶ DEBUG Starting scheduler with 3 jobs
2023-10-23T08:10:00.003Z common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] Started - uname -a
2023-10-23T08:10:00.006Z common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] Started - date
2023-10-23T08:10:00.024Z common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] StdOut: Mon Oct 23 08:10:00 UTC 2023
2023-10-23T08:10:00.024Z common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] Finished in "17.84595ms", failed: false, skipped: false, error: none
2023-10-23T08:10:00.089Z common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-23T08:10:00.089Z common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] Finished in "86.147247ms", failed: false, skipped: false, error: none
my-test-job still looks good but datecron sometimes needs a couple of minutes to succeed?
Same problem with my worker job but I think it's easier to just show the 2 test jobs.
2023-10-24T23:15:00.003Z common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] Started - uname -a
2023-10-24T23:15:00.003Z common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] Started - date
2023-10-24T23:15:00.005Z common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] StdOut: Tue Oct 24 23:15:00 UTC 2023
2023-10-24T23:15:00.005Z common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] Finished in "1.322497ms", failed: false, skipped: false, error: none
2023-10-24T23:15:00.005Z common.go:125 ▶ NOTICE [Job "smart_cistern_backend" (133cc0a8505c)] Started - /bin/sh -c "python -m cProfile -o logs/cProfile/$(date --iso-8601=seconds|sed 's/:/./g')_profiling_all.prof
src/main.py"
2023-10-24T23:16:00.021Z common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] Started - uname -a
2023-10-24T23:16:00.03Z common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] Started - date
2023-10-24T23:16:00.033Z common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] StdOut: Tue Oct 24 23:16:00 UTC 2023
2023-10-24T23:16:00.033Z common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] Finished in "2.433805ms", failed: false, skipped: false, error: none
2023-10-24T23:17:00.003Z common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] Started - uname -a
2023-10-24T23:17:00.004Z common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] Started - date
2023-10-24T23:17:00.005Z common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] StdOut: Tue Oct 24 23:17:00 UTC 2023
2023-10-24T23:17:00.005Z common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] Finished in "1.005371ms", failed: false, skipped: false, error: none
2023-10-24T23:17:17.915Z common.go:125 ▶ NOTICE [Job "datecron" (4a241d2b3ab6)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:17.915Z common.go:125 ▶ NOTICE [Job "datecron" (4a241d2b3ab6)] Finished in "9m17.911256973s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (888c7db1f6c6)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (888c7db1f6c6)] Finished in "8m18.36881784s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (88fa518982de)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (88fa518982de)] Finished in "4m18.361351038s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (6460bd79aaec)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z common.go:125 ▶ NOTICE [Job "datecron" (6460bd79aaec)] Finished in "7m18.368275647s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (ac5fb5b8ef93)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (ac5fb5b8ef93)] Finished in "5m18.363476051s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (e26fa12bf686)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (e26fa12bf686)] Finished in "6m18.362506677s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] Finished in "1m18.353145657s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] Finished in "2m18.370844487s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (d120d13b0299)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z common.go:125 ▶ NOTICE [Job "datecron" (d120d13b0299)] Finished in "3m18.360959649s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.375Z common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.375Z common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] Finished in "18.37143932s", failed: false, skipped: false, error: none
$ docker logs ofelia_test-ofelia-1 | grep datecron | grep Finished | awk '{ print $11 }' | grep -v ms | grep -v µs | grep m | sort | tail
"6m7.816589961s",
"6m9.582320826s",
"7m18.135762657s",
"7m18.368275647s",
"7m4.584998566s",
"7m49.063024497s",
"7m7.084763573s",
"8m18.36881784s",
"8m4.585050287s",
"9m17.911256973s",