Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Apache Airflow Monitoring #12072

Closed
wants to merge 0 commits into from
Closed

Conversation

songzhendong
Copy link
Contributor

@songzhendong songzhendong commented Apr 4, 2024

Add Apache Airflow Monitoring

  • Update the documentation to include this new feature.
    - Documentation has been updated in the docs directory.
  • Tests(including UT, IT, E2E) are added to verify the new feature.
    - Unit tests and integration tests have been added under test directory.
  • UI the screenshots below.
    1

xf
xs
2

@wu-sheng wu-sheng requested a review from wankai123 April 4, 2024 10:38
@wu-sheng wu-sheng added this to the 10.0.0 milestone Apr 4, 2024
@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels Apr 4, 2024
@wu-sheng
Copy link
Member

wu-sheng commented Apr 6, 2024

With #12076 gets merged, you should have the expected menu.

Then, you should fix all CI tasks. UI submodule should be reverted.

@wu-sheng
Copy link
Member

wu-sheng commented Apr 6, 2024

SWIP and relative updates are missing.

@wu-sheng
Copy link
Member

wu-sheng commented Apr 6, 2024

image

You still don't fix the conflicts, CI would not run until you fix it.

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc and code styles usually have history reason, only few are typos.
You should not randomly change them so randomly.

And if there's typo, it should not be fixed in a feature PR, it will be very confused when review git changelogs in the future.

Don't hurry in changing.

@@ -130,5 +135,5 @@
* Remove `OpenTelemetry Exporter` support from meter doc, as this has been flagged as unmaintained on OTEL upstream.
* Add doc of one-line quick start script for different storage types.
* Add FAQ for `Why is Clickhouse or Loki or xxx not supported as a storage option?`.

* Add Airflow monitoring.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Add Airflow monitoring.
* Add Airflow monitoring docs.

@wu-sheng
Copy link
Member

wu-sheng commented Apr 6, 2024

And, you still don't resolve conflicts. Learn git more please.

@wu-sheng
Copy link
Member

wu-sheng commented Apr 7, 2024

Don't close a PR, as you only update things.

@wu-sheng wu-sheng reopened this Apr 7, 2024
Comment on lines 109 to 111
* Update tabs of the Kubernetes service page.
* Support Airflow monitoring.
* Update tabs of the Kubernetes service page.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Update tabs of the Kubernetes service page.
* Support Airflow monitoring.
* Update tabs of the Kubernetes service page.
* Update tabs of the Kubernetes service page.
* Support Airflow monitoring.

Duplicated

@wu-sheng
Copy link
Member

wu-sheng commented Apr 7, 2024

Please recheck UI submodule, you are changing that unexpectedly.

Comment on lines 122 to 123
"aggregate_labels(meter_airflow_dag_processing_file_path_queue_size,sum))",
"aggregate_labels(meter_airflow_dag_processing_processes,sum))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"aggregate_labels(meter_airflow_dag_processing_file_path_queue_size,sum))",
"aggregate_labels(meter_airflow_dag_processing_processes,sum))"
"aggregate_labels(meter_airflow_dag_processing_file_path_queue_size,sum)",
"aggregate_labels(meter_airflow_dag_processing_processes,sum)"

- SW_STORAGE=h2
# OpenTelemetry collector 1
otel-collector:
image: otel/opentelemetry-collector-contrib:0.96.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
image: otel/opentelemetry-collector-contrib:0.96.0
image: otel/opentelemetry-collector:${OTEL_COLLECTOR_VERSION}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better use a unified version.

operations:
- action: add_label
new_label: service_instance_id # Add 'service_instance_id' label to metrics
new_value: "instance1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When airflow metrics use the push mode, do the metrics have any labels to identify which service and instance?
You add labels manually here, I'm not sure when 1 otel-collector facing 2 airflow instances, how to transform the metrics and add labels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is impossible(if there is no value to have metadata in OTLP push), I think we should recommend using K8s sidecar as recommended deployment to add this label. In there, we could use pod name as the instance name, and k8s service name as the service name.

Meanwhile, is host_name representing the service name?

# This file is used to show how to write configuration files and can be used to test.

setup:
env: compose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure whether compose is a good way to run metadata clear deployment. If it isn't, airflow instance + otel collector as sidecar deployment on k8s may be a better way.

e2e is not just for testing, you could see on the docs(you wrote), all configurations here are a kind of guidance and reference for users to deploy this.

Comment on lines 55 to 58
operations:
- action: add_label
new_label: host_name # Add 'host_name' label indicating the host name
new_value: "airflow-webserver.airflow"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the service name, and hard coded instance name, I doubt the service and instance hierarchy would work.
I saw your screenshot, but that seems not match your e2e.

@wu-sheng
Copy link
Member

wu-sheng commented Apr 9, 2024

FYI @songzhendong We will update the MAL implementation for #11992. Then UI side will update to support multiple-label tags as well.

You will need to update this PR accordingly, most about UI dashboard setup and e2e.

@songzhendong
Copy link
Contributor Author

FYI @songzhendong We will update the MAL implementation for #11992. Then UI side will update to support multiple-label tags as well.

You will need to update this PR accordingly, most about UI dashboard setup and e2e.

Get

@wu-sheng wu-sheng requested a review from wankai123 April 13, 2024 13:19
@wu-sheng wu-sheng removed this from the 10.0.0 milestone Apr 13, 2024
@wu-sheng wu-sheng added the no update The owner doesn't provide further feedback. label Apr 18, 2024
@wu-sheng
Copy link
Member

No update for a week. @songzhendong If you want to continue to make this ready to merge, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. feature New feature no update The owner doesn't provide further feedback.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants