Metrics Improvement Project #42881

### Description

From @ferruzzi:

> Currently, when you add a new metric to the codebase, you must also manually update the docs page. The docs page inevitably gets out of date and misses some details. We want an automated system that generates the docs page from the actual metrics. There are also known instances where the same metric is created and emitted in more than one place, causing duplicate data. These will have to be fixed manually, and an automated check could (stretch goal?) look for identical or "too similar" names while collecting the names for the docs page.

> Phase 1
>
> Situation:
> We support multiple Metrics backends [0]. The two main ones are StatsD and OpenTelemetry. This is managed through an interface class [1] which is implemented for each backend (examples: StatsD [2] and OTel [3]). StatsD was the only supported backend well into Airflow 2.x and the entire codebase was designed with StatsD in mind, so abstracting it out was a good chunk of work and a few tasks remain to perfect the new implementation.
>
> Task 1:
> StatsD has a name length limit of around 300 characters. OTel limits names to 34 characters, but allows tagging. Our temporary solution was to emit almost everything twice: once in the long format for StatsD and again in the short format with tags for OTel. We also had to add code [4] to make sure the name is safe for OTel, plus other hacks to make it work.
> The first task in this project is to understand how the two implementations handle their names differently and then add a "get_name" method to the interface: `def get_name(metric_name: str, tags: dict[str, str])`. In the statsd_logger [2] implementation it will concatenate the tags onto the name, and in the OTel implementation it will just return the name.
> Once that is implemented, it can be used in the various emit methods (incr, decr, etc.) in place of all the name validation code. Then search the code for places where we emit the same thing more than once and clean them up.
>
> Example:
> You can see an example in local_task_job_runner [5]. We emit `local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>` for StatsD, but that results in a name too long for OTel, so we also emit `local_task_job.task_exit`. The name validation method [4] in the OTel implementation catches the one that is too long and just swallows it. What we should do instead is pass incr() the name and the tags and let StatsD and OTel handle them accordingly.
>
> [0] https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#metric-descriptions
> [1] https://github.com/apache/airflow/blob/main/airflow/metrics/base_stats_logger.py
> [2] https://github.com/apache/airflow/blob/main/airflow/metrics/statsd_logger.py
> [3] https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py
> [4] https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py#L128
> [5] https://github.com/apache/airflow/blob/main/airflow/jobs/local_task_job_runner.py#L352
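
To make Task 1 concrete, here is a minimal sketch of the proposed `get_name` idea. It is not the actual Airflow code: `SketchLogger`, `SketchStatsdLogger`, `SketchOtelLogger`, and the `emitted` list are hypothetical stand-ins for the real interface and backends, and joining tag values with `.` in insertion order is an assumption about how the StatsD name would be built.

```python
from __future__ import annotations


class SketchLogger:
    """Hypothetical stand-in for the shared metrics interface."""

    def __init__(self) -> None:
        # Records (name, tags) pairs instead of sending them to a real backend.
        self.emitted: list[tuple[str, dict[str, str]]] = []

    def get_name(self, metric_name: str, tags: dict[str, str] | None = None) -> str:
        raise NotImplementedError

    def incr(self, metric_name: str, tags: dict[str, str] | None = None) -> None:
        raise NotImplementedError


class SketchStatsdLogger(SketchLogger):
    def get_name(self, metric_name: str, tags: dict[str, str] | None = None) -> str:
        # Assumption: tag values are appended to the name in insertion order.
        return ".".join([metric_name, *(tags or {}).values()])

    def incr(self, metric_name: str, tags: dict[str, str] | None = None) -> None:
        # The tags are folded into the name, so none are sent separately.
        self.emitted.append((self.get_name(metric_name, tags), {}))


class SketchOtelLogger(SketchLogger):
    def get_name(self, metric_name: str, tags: dict[str, str] | None = None) -> str:
        # The short name is used as-is; tags travel alongside it.
        return metric_name

    def incr(self, metric_name: str, tags: dict[str, str] | None = None) -> None:
        self.emitted.append((self.get_name(metric_name, tags), dict(tags or {})))


# One call site, one emit per backend (illustrative tag values).
tags = {"job_id": "42", "dag_id": "my_dag", "task_id": "my_task", "return_code": "0"}
statsd, otel = SketchStatsdLogger(), SketchOtelLogger()
for logger in (statsd, otel):
    logger.incr("local_task_job.task_exit", tags=tags)

print(statsd.emitted)
print(otel.emitted)
```

With this shape, the caller in the `local_task_job.task_exit` example would only need a single `incr()` call, and each backend decides how the tags end up in the emitted metric.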


### Use case/motivation

From @ferruzzi:

> Currently, when you add a new metric to the codebase, you must also manually update the docs page. The docs page inevitably gets out of date and misses some details. We want an automated system that generates the docs page from the actual metrics. There are also known instances where the same metric is created and emitted in more than one place, causing duplicate data. These will have to be fixed manually, and an automated check could (stretch goal?) look for identical or "too similar" names while collecting the names for the docs page.
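
The stretch-goal check for identical or "too similar" names could look roughly like the sketch below. It assumes the metric names have already been collected into a list; `find_similar_names`, the `0.9` threshold, and the example names are all hypothetical, and `difflib.SequenceMatcher` is just one possible similarity measure.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_names(names, threshold=0.9):
    """Return (exact duplicates, pairs of distinct names at or above threshold)."""
    seen, duplicates = set(), set()
    for name in names:
        if name in seen:
            duplicates.add(name)
        seen.add(name)

    # Compare every pair of distinct names; fine for a few hundred metrics.
    similar = [
        (a, b)
        for a, b in combinations(sorted(seen), 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]
    return sorted(duplicates), similar


# Made-up names for illustration only.
collected = [
    "example.pool.size",
    "example.pool.size",
    "example.tasks.started",
    "example.task.started",
]
duplicates, similar = find_similar_names(collected)
print("duplicates:", duplicates)
print("similar:", similar)
```

The threshold would need tuning against the real metric list, since templated names that share a long common prefix can score as similar without being duplicates.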

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

