- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15.8k
Description
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.1
What happened?
When using CloudWatch logging there seems to be a 60 second time delay between the logging output updating. Please see the video below and observe:
- Initially Airflow can't find the remote logs (as there are none).
- Airflow detects local logs.
- Nothing happens in the UI for 60 seconds.
- After 60 seconds logging appears, and the top of the printout states it is from CloudWatch logs.
- It then periodically updates the logs every 60 seconds after until the task is completed.
Please skip ahead in the video as most of it is static!
Recording.2025-01-10.134419.mp4
As a second minor point, you can also see that grouping now longer works with the logs read from CloudWatch. However, this doesn't concern me as much.
What you think should happen instead?
I'm not sure if I have configured something incorrectly, but I expected the same behaviour as local logging, i.e. tailing of the log file.
I struggled to find the default behaviour documented, but what I expect to happen was that Airflow would use the local logs if they were available and only use the remote logs if no local logs were found.
I found this logic in previous documentation (<2.0), although this may now be outdated:
In the Airflow Web UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs can not be found or accessed, local logs will be displayed. Note that logs are only sent to remote storage once a task is complete (including failure); In other words, remote logs for running tasks are unavailable (but local logs are available).
Can you confirm that this is the expected behaviour and what is in the video above is a bug?
Alternatively, I can see in the browser that a request is made every second to update the logs, and I can confirm that the logs are only being written to CloudWatch every 60 seconds or when the task is complete.
Is this expected behaviour? or should logs be written to cloudwatch at a higher rate?
If the behaviour in the video is actually what is expected, I would like to suggest one of the following options as we need a <60 second refresh window in our logging setup:
- A configuration variable to use local logging first if available (e.g. local_logging_prefer = True).
- A configuration variable for the update frequency of the logging when using remote logging (e.g. remote_logging_refresh_period = 60).
Let me know your thoughts.
How to reproduce
The remote logging config was copied from here:
[logging]
# Airflow can store logs remotely in AWS Cloudwatch. Users must supply a log group
# ARN (starting with 'cloudwatch://...') and an Airflow connection
# id that provides write and read access to the log location.
remote_logging = True
remote_base_log_folder = cloudwatch://arn:aws:logs:<region name>:<account id>:log-group:<group name>
remote_log_conn_id = MyCloudwatchConn
The demo DAG used in the video above prints to logging every 10 seconds and is as follows:
import logging
from datetime import datetime
from time import sleep
from airflow.models import DAG
from airflow.operators.python import task
with DAG(
    dag_id="dev__cloudwatch_logging_testing",
    start_date=datetime(2024, 1, 1),
    schedule=None,
):
    @task
    def task1():
        sleeptime = 10
        for i in range(0, 300, sleeptime):
            logging.info(i)
            sleep(sleeptime)
    task1()Operating System
NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.6 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal
Versions of Apache Airflow Providers
apache-airflow==2.10.1
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-celery==3.8.1
apache-airflow-providers-cncf-kubernetes==8.4.1
apache-airflow-providers-common-compat==1.2.0
apache-airflow-providers-common-io==1.4.0
apache-airflow-providers-common-sql==1.16.0
apache-airflow-providers-docker==3.13.0
apache-airflow-providers-elasticsearch==5.5.0
apache-airflow-providers-fab==1.3.0
apache-airflow-providers-ftp==3.11.0
apache-airflow-providers-google==10.22.0
apache-airflow-providers-grpc==3.6.0
apache-airflow-providers-hashicorp==3.8.0
apache-airflow-providers-http==4.13.0
apache-airflow-providers-imap==3.7.0
apache-airflow-providers-microsoft-azure==10.4.0
apache-airflow-providers-mysql==5.7.0
apache-airflow-providers-odbc==4.7.0
apache-airflow-providers-openlineage==1.11.0
apache-airflow-providers-postgres==5.12.0
apache-airflow-providers-redis==3.8.0
apache-airflow-providers-sendgrid==3.6.0
apache-airflow-providers-sftp==4.11.0
apache-airflow-providers-slack==8.9.0
apache-airflow-providers-smtp==1.8.0
apache-airflow-providers-snowflake==5.7.0
apache-airflow-providers-sqlite==3.9.0
apache-airflow-providers-ssh==3.13.1
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct