Support distribution as a MetricValue in ExecutionPlan #16044

Closed
@sfluor

Description

Is your feature request related to a problem or challenge?

The MetricValue enum currently exposes only single-value statistics: counts, gauges, timers, timestamps, and a few hard-coded variants such as SpillCount or OutputRows.
For many operational questions we care about the shape of a metric’s distribution, e.g. "What is the p99 elapsed-compute time?" or "How skewed is memory usage across partitions?".

This is especially true when the ExecutionPlan is dispatched to multiple nodes / workers in a distributed system as part of multiple requests.

Because there is no “distribution” metric type right now, we can only track simple aggregates such as avg / min / max.

This makes it hard to pinpoint outliers in latency or memory usage.
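To make the limitation concrete, here is a small worked example with made-up numbers (the latencies and the `quantile` and `sample_latencies` helpers are illustrative, not part of DataFusion). With 98 fast requests and 2 slow ones, min / avg / max say little about how many requests are slow, while p50 and p99 show that the tail is confined to a couple of requests.

```rust
/// Nearest-rank quantile over an already sorted, non-empty slice (q in [0, 1]).
fn quantile(sorted: &[f64], q: f64) -> f64 {
    assert!(!sorted.is_empty());
    let rank = (q * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Hypothetical per-request latencies in milliseconds: 98 fast, 2 slow.
fn sample_latencies() -> Vec<f64> {
    let mut v = vec![10.0; 98];
    v.extend([900.0, 1000.0]);
    v
}

fn main() {
    let mut v = sample_latencies();
    v.sort_by(|a, b| a.total_cmp(b));

    let avg = v.iter().sum::<f64>() / v.len() as f64;
    // The simple aggregates that single-value metrics can express today.
    println!("min={} avg={} max={}", v[0], avg, v[v.len() - 1]);
    // Quantiles describe the shape: the median is fast, only the tail is slow.
    println!("p50={} p99={}", quantile(&v, 0.5), quantile(&v, 0.99));
}
```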

Describe the solution you'd like

Add a new Distribution variant to the MetricValue enum.

That would look like:

    Distribution {
        /// The provided name of this metric
        name: Cow<'static, str>,
        /// The digest that accumulates the observed values.
        value: Arc<Mutex<TDigest>>,
    },
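As a rough sketch of how such a metric might be recorded and queried, the example below mirrors the proposed variant as a standalone struct. Everything here is an assumption for illustration: `ExactDigest` is a stand-in that stores every sample (a real t-digest would compress samples to bound memory), and the `record` and `quantile` methods are hypothetical names, not an existing DataFusion API.

```rust
use std::borrow::Cow;
use std::sync::{Arc, Mutex};

/// Stand-in for a real t-digest: keeps every sample, so quantiles are exact.
#[derive(Debug, Default)]
struct ExactDigest {
    samples: Vec<f64>,
}

impl ExactDigest {
    fn add(&mut self, v: f64) {
        self.samples.push(v);
    }

    /// Nearest-rank quantile; `None` when nothing has been recorded.
    fn quantile(&self, q: f64) -> Option<f64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut s = self.samples.clone();
        s.sort_by(|a, b| a.total_cmp(b));
        let rank = ((q * s.len() as f64).ceil() as usize).clamp(1, s.len());
        Some(s[rank - 1])
    }
}

/// Hypothetical mirror of the proposed variant, reduced to a struct.
#[derive(Debug, Clone)]
struct Distribution {
    name: Cow<'static, str>,
    value: Arc<Mutex<ExactDigest>>,
}

impl Distribution {
    fn new(name: impl Into<Cow<'static, str>>) -> Self {
        Self { name: name.into(), value: Arc::new(Mutex::new(ExactDigest::default())) }
    }

    fn record(&self, v: f64) {
        self.value.lock().unwrap().add(v);
    }

    fn quantile(&self, q: f64) -> Option<f64> {
        self.value.lock().unwrap().quantile(q)
    }
}

fn main() {
    let metric = Distribution::new("elapsed_compute_ms");
    // Each simulated partition clones the handle and records its own timings.
    let handles: Vec<_> = (0..4)
        .map(|p| {
            let m = metric.clone();
            std::thread::spawn(move || {
                for i in 0..25 {
                    m.record((p * 25 + i) as f64);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{} p50={:?} p99={:?}", metric.name, metric.quantile(0.5), metric.quantile(0.99));
}
```

The `Arc<Mutex<...>>` wrapper is what lets all partitions feed one shared digest; a real implementation might instead keep one digest per partition and merge them at aggregation time.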

Describe alternatives you've considered

An alternative would be to expose something more generic, so that users can define their own way of accumulating metrics during plan execution:

    enum MetricValue {
        // ... existing variants ...
        Custom {
            /// The provided name of this metric
            name: Cow<'static, str>,
            /// A custom implementation of the metric value.
            value: Arc<dyn CustomMetricValue>,
        },
    }

    trait CustomMetricValue: Debug + Send + Sync {
        fn new_empty(self: Arc<Self>) -> Arc<dyn CustomMetricValue>;

        fn aggregate(
            self: Arc<Self>,
            other: &dyn CustomMetricValue,
        ) -> Arc<dyn CustomMetricValue>;
    }

This would allow more complex aggregations of metrics. For instance, in the context of an execution plan issuing multiple requests, we could track the 5 slowest requests along with their metadata.
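A minimal sketch of the "5 slowest requests" idea, assuming the trait above plus one extra method. The `as_any` method, the `SlowRequest` and `SlowestRequests` types, and the `LIMIT` constant are all hypothetical and not part of the proposal; `as_any` is added only because `aggregate` receives a `&dyn CustomMetricValue` and needs some way to downcast it.

```rust
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;

trait CustomMetricValue: Debug + Send + Sync {
    fn new_empty(self: Arc<Self>) -> Arc<dyn CustomMetricValue>;
    fn aggregate(self: Arc<Self>, other: &dyn CustomMetricValue) -> Arc<dyn CustomMetricValue>;
    /// Assumption, not in the proposal: lets `aggregate` downcast `other`.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical per-request record; `url` stands in for arbitrary metadata.
#[derive(Debug, Clone, PartialEq)]
struct SlowRequest {
    url: String,
    elapsed_ms: u64,
}

/// Keeps at most `LIMIT` requests, slowest first.
#[derive(Debug, Default)]
struct SlowestRequests {
    top: Vec<SlowRequest>,
}

const LIMIT: usize = 5;

impl SlowestRequests {
    fn from_requests(reqs: impl IntoIterator<Item = SlowRequest>) -> Self {
        let mut top: Vec<_> = reqs.into_iter().collect();
        top.sort_by(|a, b| b.elapsed_ms.cmp(&a.elapsed_ms));
        top.truncate(LIMIT);
        Self { top }
    }
}

impl CustomMetricValue for SlowestRequests {
    fn new_empty(self: Arc<Self>) -> Arc<dyn CustomMetricValue> {
        Arc::new(SlowestRequests::default())
    }

    fn aggregate(self: Arc<Self>, other: &dyn CustomMetricValue) -> Arc<dyn CustomMetricValue> {
        let mut merged = self.top.clone();
        // Values of a different concrete type are ignored in this sketch.
        if let Some(o) = other.as_any().downcast_ref::<SlowestRequests>() {
            merged.extend(o.top.iter().cloned());
        }
        Arc::new(SlowestRequests::from_requests(merged))
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let req = |url: &str, elapsed_ms| SlowRequest { url: url.to_string(), elapsed_ms };
    // Two partitions, each with its own locally tracked requests.
    let a = Arc::new(SlowestRequests::from_requests([req("/a", 120), req("/b", 15), req("/c", 900)]));
    let b = SlowestRequests::from_requests([req("/d", 40), req("/e", 300), req("/f", 5), req("/g", 700)]);
    let merged = a.aggregate(&b);
    println!("{merged:?}");
}
```

Because each side is already truncated to `LIMIT` entries, merging stays cheap regardless of how many requests each partition observed.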

Additional context

Happy to draft a PR if you think this would fit the Metric model and would be a nice addition.

Labels

enhancement (New feature or request)
