approx_percentile_cont_with_weight produces incorrect results due to wrong weight handling in TDigest

### Describe the bug

The approx_percentile_cont_with_weight function produces incorrect weighted percentile results. The bug is in the TDigest implementation where new_with_centroid() sets count: 1 regardless of the actual centroid weight, while the weight is used elsewhere in centroid merging. This mismatch corrupts the percentile calculation.

**Root cause**

In datafusion/functions-aggregate-common/src/tdigest.rs, the new_with_centroid function incorrectly hardcodes count: 1:
pub fn new_with_centroid(max_size: usize, centroid: Centroid) -> Self {
    TDigest {
        centroids: vec![centroid.clone()],
        max_size,
        sum: centroid.mean * centroid.weight,
        count: 1,  // BUG: should be centroid.weight
        max: centroid.mean,
        min: centroid.mean,
    }
}
The count field should reflect the actual weight of the centroid, as it's used in estimate_quantile() to compute the rank: let rank = q * self.count. Since the weight is correctly used in sum but not in count, the algorithm produces inconsistent results.
Additionally, count should be f64 (not u64) to properly support fractional weights, consistent with the ClickHouse TDigest implementation (https://github.com/ClickHouse/ClickHouse/blob/927af1255adb37ace1b95cc3ec4316553b4cb4b4/src/AggregateFunctions/QuantileTDigest.h#L71-L87).

### To Reproduce

-- Weighted percentile with increasing weights (1,2,3,...,9)
-- Higher values have higher weights, so weighted median should be > 5
SELECT approx_percentile_cont_with_weight(value, weight, 0.5) as weighted
FROM (VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6,6), (7,7), (8,8), (9,9)) AS t(value, weight);
-- Compare with unweighted percentile
SELECT approx_percentile_cont(value, 0.5) as unweighted
FROM (VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6,6), (7,7), (8,8), (9,9)) AS t(value, weight);

**Actual Behavior**

weighted:   3
unweighted: 5
The weighted result (3) is lower than the unweighted median (5), even though higher values have proportionally higher weights. The result is neither correct weighted percentile nor correct unweighted percentile.

### Expected behavior

weighted:   6 or 7
unweighted: 5
With weights (1,2,3,4,5,6,7,8,9):
- Total weight = 45
- At 0.5 quantile, rank = 22.5
- Cumulative weights: 1→1, 2→3, 3→6, 4→10, 5→15, 6→21, 7→28
- Rank 22.5 falls between value 6 (cumulative 21) and value 7 (cumulative 28)
- Expected weighted median ≈ 6-7

The weighted median should be pulled toward higher values since they have higher weights.

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

approx_percentile_cont_with_weight produces incorrect results due to wrong weight handling in TDigest #19940

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

approx_percentile_cont_with_weight produces incorrect results due to wrong weight handling in TDigest #19940

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions