Integration tests: Green assessment due to false green DAG #654

@Maleware

When the logging or smoke test is run on current main, the test turns green even though the DAG has thrown an error.

Given:
current main before #653
and make run-dev running for the airflow operator,

running ./scripts/run-tests --test logging_airflow-3.0.1_openshift-false_executor-celery --skip-release --skip-delete

produces a cluster state in which Airflow cannot connect to the worker and throws an error in the activated DAG:

Log message source details: sources=["Could not read served logs: HTTPConnectionPool(host='airflow-worker-custom-log-config-0', port=8793): Max retries exceeded with url: /log/dag_id=example_trigger_target_dag/run_id=manual__2025-07-09T17:32:56.030514+00:00/task_id=run_this/attempt=1.log (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0xffff603eaf30>: Failed to resolve 'airflow-worker-custom-log-config-0' ([Errno -2] Name or service not known)\"))"]
::group::Log message source details: sources=["Could not read served logs: HTTPConnectionPool(host='airflow-worker-custom-log-config-0', port=8793): Max retries exceeded with url: /log/dag_id=example_trigger_target_dag/run_id=manual__2025-07-09T17:32:56.030514+00:00/task_id=run_this/attempt=1.log (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0xffff603eaf30>: Failed to resolve 'airflow-worker-custom-log-config-0' ([Errno -2] Name or service not known)\"))"]
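
The `[Errno -2] Name or service not known` part suggests a plain DNS lookup failure for the worker pod name. A minimal sketch for checking this by hand, assuming it is run from the pod that tries to fetch the logs (the helper name `can_resolve` is made up for illustration and is not part of Airflow or the test suite):

```python
import socket


def can_resolve(host: str, port: int) -> bool:
    """Return True if the host name resolves, False on a DNS lookup failure."""
    try:
        # getaddrinfo performs the same kind of name lookup that fails in the log above.
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Raised for resolution errors such as "Name or service not known".
        return False
    return True


if __name__ == "__main__":
    # Host and port are taken from the log message quoted above.
    print(can_resolve("airflow-worker-custom-log-config-0", 8793))
```

If this prints `False` inside the cluster, the worker name is not resolvable from that pod, which would match the error in the DAG log.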

However, the DAG run is reported with the following state:


State | success  <----- Success despite error
-- | --
Run ID | manual__2025-07-09T17:32:56.030514+00:00
Run Type | manual
Run Duration | 2.87s
Last Scheduling Decision | 2025-07-09, 19:32:59
Queued at | 2025-07-09, 19:32:56
Start Date | 2025-07-09, 19:32:56
End Date | 2025-07-09, 19:32:59
Data Interval Start | 2025-07-09, 19:32:44
Data Interval End | 2025-07-09, 19:32:44
Trigger Source | rest_api

This in turn leads to

    logger.go:42: 19:34:13 | logging_airflow-3.0.1_openshift-false_executor-celery | skipping kubernetes event logging
=== NAME  kuttl
    harness.go:403: run tests finished
    harness.go:510: cleaning up
    harness.go:567: removing temp folder: ""
--- PASS: kuttl (294.10s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/logging_airflow-3.0.1_openshift-false_executor-celery (294.07s)
PASS

a positive result in the integration test. The same is true for at least the smoke test.
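
As a starting point for the investigation, a stricter assessment could look beyond the DAG run state. The following is only a sketch under assumptions: `find_false_green` is a hypothetical helper, the dict keys `state` and `task_id` are assumed to mirror the fields shown in the UI, and fetching the run, the task instances and the log lines from Airflow is deliberately left out.

```python
from typing import Iterable

# Marker taken from the log output quoted in this issue.
LOG_ERROR_MARKER = "Could not read served logs"


def find_false_green(
    dag_run: dict,
    task_instances: Iterable[dict],
    log_lines: Iterable[str],
) -> list[str]:
    """Return the reasons why a DAG run should not be counted as green.

    Hypothetical helper: the keys "state" and "task_id" are assumptions
    about how the run and its task instances are represented.
    """
    problems = []

    # The check that effectively exists today: the overall run state.
    if dag_run.get("state") != "success":
        problems.append(f"DAG run state is {dag_run.get('state')!r}")

    # Every task instance has to be successful as well.
    for ti in task_instances:
        if ti.get("state") != "success":
            problems.append(
                f"task {ti.get('task_id')!r} is in state {ti.get('state')!r}"
            )

    # A run whose task logs cannot be read is treated as suspicious.
    if any(LOG_ERROR_MARKER in line for line in log_lines):
        problems.append("task log could not be read from the worker")

    return problems


if __name__ == "__main__":
    # Mirrors the situation above: state "success", but the log shows an error.
    run = {"state": "success"}
    tasks = [{"task_id": "run_this", "state": "success"}]
    logs = ["Could not read served logs: HTTPConnectionPool(...)"]
    for problem in find_false_green(run, tasks, logs):
        print(problem)
```

With the data from this issue the run state alone is "success", so only the log check would flag the run. A test assertion would then fail whenever the returned list is not empty.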

I consider this to be quite frightening and we should investigate further.

Metadata

Status

Development: In Progress
