-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Summary
There is an issue with the logging/parsing strategy of nginx access logs.
Details
Since the conversion of CanarieAPI with separate cron/nginx/api definitions, the cron command that was invoked (https://github.com/Ouranosinc/CanarieAPI/blob/39c32e7ee2b21b5a3a6ba671ab41aa6195fde5b7/Dockerfile#L39) that led to some log rotation to occur (since logrotate config is available by default, but not cron or logrotate utilities, see below), is not being done anymore (current proxy only calls nginx -g daemon off;).
> docker exec -ti proxy bash
> service --status-all
[ ? ] hwclock.sh
[ + ] nginx
[ + ] nginx-debug
> cat /etc/logrotate.d/nginx
/var/log/nginx/*.log {
daily
missingok
rotate 52
compress
delaycompress
notifempty
create 640 nginx adm
sharedscripts
postrotate
if [ -f /var/run/nginx.pid ]; then
kill -USR1 `cat /var/run/nginx.pid`
fi
endscript
}This leads to, eventually, very large logs to be collected.
3.9G Oct 2 06:26 access_file.logThis was not "that much a big deal" before.
However, since the fix of the log parser (Ouranosinc/CanarieAPI#53) and its integration (#588), the operation ran by CanarieAPI actually runs rather than preemptively failing. With this big file, it then proceeds to parse the log, which can take ~4min to parse based on the above size:
[2025-10-02 06:24:44,766] [13] [INFO] canarieapi.app_object : Loading log file : /var/logs/access_file.log
[2025-10-02 06:28:43,812] [13] [INFO] canarieapi.app_object : Compiling stats from 27483724 records
Provided that this command is launched every minute (https://github.com/Ouranosinc/CanarieAPI/blob/master/docker/canarieapi-cron), this leads to an ever-growing amount of duplicate tasks in parallel.
I have received memory monitoring alarms from our server, and found that dozens of cron jobs were hogging the memory and bringing the server to a halt.
Potential Strategies to Fix
Note
I can think of a few strategies to fix the problem.
Maybe others are possible, suggestions/concerns welcomed.
- use a docker with
nginx+logrotatepreinstalled and actually run it- mostly drop-in replacement, but one must be defined
- maybe it could be considered to provide template/vars to override the default
logrotateconfig provided bynginx
- discard the
log-filemethod (https://github.com/search?q=repo%3Abird-house%2Fbirdhouse-deploy+%28PROXY_LOG_FILE+OR+proxy-logs%29&type=code) and usestdout(default of nginx) to leverage docker's log rotation- a separate config than our current
x-logging: &default-loggingin composes would be nedded to have a dedicated log - prometheus and canarieapi would need remapping to that resulting file
- a separate config than our current
- define a cron on the host
- since the file is mapped to the host, the logrotate config could be on the server directly, making it transparent to the stack
- however, that is not very portable, and will most likely break on new servers where we will forget to set it up
To Reproduce
Steps to reproduce the behavior:
- Let the server run for a while with nginx + canarieapi.
- Trigger a lot of requests to force the creation of many access logs.
- Observe the logs reaching a big size.
- Inspect running commands on the host (eg:
ps faux), there should be a lot ofcroncommands withcanarie.logparsercalls.