fix(big number with trendline): running 2 identical queries for no good reason #34296

Open
wants to merge 10 commits into
base: master

mistercrunch
Member

@mistercrunch mistercrunch commented Jul 24, 2025

Summary

Fixes duplicate query issue in BigNumber with Trendline charts and improves the View Query modal UX. PR #33407 introduced duplicate identical queries for most aggregation types, causing unnecessary database load and confusing query inspection.

The tooltip for "Aggregation Method" is now clearer:
Aggregation method applied across the values in the timeseries to compute the Big Number. Note that "None" uses server-side aggregation over the entire time period and is preferred for non-additive metrics like ratios, averages, distinct counts, etc.

before

Screenshot 2025-07-23 at 8 53 17 PM

after

Screenshot 2025-07-23 at 10 14 19 PM

Changes

Query Optimization:

  • All aggregations: Now use 1 query with client-side computation
  • None only: Preserves 2 queries (trendline + raw server aggregation)
  • Added client-side aggregation functions for sum, mean, min, max, median, LAST_VALUE
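
As a rough illustration of the client-side aggregation listed above, here is a minimal sketch. The names `AggregationMethod` and `aggregateTrendline` are hypothetical and not taken from the PR; the sketch also assumes the trendline values arrive ordered oldest to newest and that missing points are represented as `null`.

```typescript
// Hypothetical names for illustration; not the identifiers used in the PR.
type AggregationMethod = 'sum' | 'mean' | 'min' | 'max' | 'median' | 'LAST_VALUE';

function aggregateTrendline(
  values: Array<number | null>,
  method: AggregationMethod,
): number | null {
  // Drop missing points so gaps in the timeseries do not skew the result.
  const nums = values.filter((v): v is number => v !== null && !isNaN(v));
  if (nums.length === 0) return null;

  switch (method) {
    case 'sum':
      return nums.reduce((acc, v) => acc + v, 0);
    case 'mean':
      return nums.reduce((acc, v) => acc + v, 0) / nums.length;
    case 'min':
      return Math.min(...nums);
    case 'max':
      return Math.max(...nums);
    case 'median': {
      // Sort a copy so the caller's array is left untouched.
      const sorted = nums.slice().sort((a, b) => a - b);
      const mid = Math.floor(sorted.length / 2);
      return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
    }
    case 'LAST_VALUE':
      // Assumption: values are ordered oldest to newest, so this is the
      // most recent non-missing point.
      return nums[nums.length - 1];
    default:
      return null;
  }
}
```

With the daily values `[100, 120, 95]` used later in this description, `aggregateTrendline(values, 'sum')` gives 315 and `'LAST_VALUE'` gives 95.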

UI Improvements:

  • Wrapped each query in AntD Cards for better visual separation
  • Moved action buttons below SQL code with secondary styling
  • Improved spacing and layout using theme units
  • Enhanced aggregation tooltip explaining cross-timeseries behavior

Background

PR #33407 by @LevisNgigi always generated 2 queries regardless of aggregation type. Most aggregations produced identical or functionally equivalent queries since they can be computed from trendline data.

Query Logic Now

  • sum/mean/min/max/median/LAST_VALUE: 1 query with client-side aggregation from trendline
  • None only: 2 queries for trendline + raw metric over entire time period
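
The branching described above could be sketched roughly as follows. `SketchQuery` and `planQueries` are hypothetical, heavily simplified stand-ins for the real query objects, and the `'raw'` key is assumed from the `aggregationChoices` snippet quoted in the review thread.

```typescript
// Hypothetical, simplified shape; real query objects carry many more fields.
interface SketchQuery {
  metrics: string[];
  isTimeseries: boolean;
}

function planQueries(metric: string, aggregation: string): SketchQuery[] {
  // The trendline query is always needed.
  const trendline: SketchQuery = { metrics: [metric], isTimeseries: true };

  if (aggregation === 'raw') {
    // "None": add a second query without the time dimension so the
    // database aggregates the metric over the entire period.
    return [trendline, { metrics: [metric], isTimeseries: false }];
  }

  // Every other aggregation is computed client-side from the trendline rows.
  return [trendline];
}
```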

Why None Still Needs 2 Queries

For non-additive metrics like COUNT(DISTINCT user_id):

  • Query 1: Daily counts [100, 120, 95] for trendline
  • Query 2: Total distinct users 250 across period (cannot be computed from daily counts)
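
A toy example of why the second query is needed. The user ids below are made up for illustration, and `countDistinct` is a hypothetical helper; the point is only that summing daily distinct counts double-counts returning users.

```typescript
// Illustrative data only: three days of user ids, with overlap between days.
const dailyUsers: string[][] = [
  ['ann', 'bob', 'cy'],
  ['bob', 'cy', 'dee'],
  ['ann', 'dee'],
];

// Hypothetical helper: counts unique ids in a list.
function countDistinct(ids: string[]): number {
  const seen: Record<string, true> = {};
  ids.forEach(id => {
    seen[id] = true;
  });
  return Object.keys(seen).length;
}

// Query 1 equivalent: one distinct count per day (the trendline).
const dailyDistinct = dailyUsers.map(countDistinct); // [3, 3, 2]

// Summing the daily counts client-side counts returning users repeatedly.
const summedDaily = dailyDistinct.reduce((acc, n) => acc + n, 0); // 8

// Query 2 equivalent: distinct users across the whole period.
const allUsers = dailyUsers.reduce((all, day) => all.concat(day), [] as string[]);
const overallDistinct = countDistinct(allUsers); // 4
```

No aggregation over `dailyDistinct` alone can recover `overallDistinct`, because the per-day counts no longer record which users overlap.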

Testing

  • Updated tests to match new single-query behavior
  • All aggregation methods produce correct results
  • View Query modal shows clean card layout

Significantly reduces database load while preserving correct behavior for the one case that legitimately needs 2 queries.

@dosubot bot added the change:frontend (Requires changing the frontend) and viz:charts:bignumber (Related to BigNumber charts) labels Jul 24, 2025

@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Status
Functionality | Inconsistent Raw Metric Check | Incorrect
Functionality | Sum Aggregation Overflow Risk | Not in scope
Performance | Inefficient Median Calculation | Not in scope
Design | Inconsistent Styling Pattern | Not in scope
Documentation | Missing purpose in client-side aggregation function | Incorrect
Functionality | Median Calculation Overflow Risk | Not in standard
Files scanned
File Path Reviewed
superset-frontend/src/explore/components/controls/ViewQueryModal.tsx
superset-frontend/plugins/plugin-chart-echarts/src/BigNumber/BigNumberWithTrendline/buildQuery.ts
superset-frontend/packages/superset-ui-core/src/components/UnsavedChangesModal/index.tsx
superset-frontend/packages/superset-ui-chart-controls/src/shared-controls/customControls.tsx
superset-frontend/src/explore/components/controls/ViewQuery.tsx
superset-frontend/plugins/plugin-chart-echarts/src/BigNumber/BigNumberWithTrendline/transformProps.ts


@LevisNgigi
Contributor

Thanks for the detailed breakdown! I hadn’t noticed that the previous implementation was generating two queries for all aggregations as I was working to achieve that for the None option. Good catch on optimizing that.

For client-side aggregation, I had initially gone that route as well, but later shifted to backend post-processing based on earlier feedback. That said, this new approach makes a lot of sense, especially with the performance improvements and reduced query load. Appreciate the clarity.

@mistercrunch
Member Author

The two queries were identical, and from my understanding aggregations for everything but None were done client-side. Is that what we intended? Is this PR right for sure?

@LevisNgigi
Contributor

LevisNgigi commented Jul 24, 2025

The two queries were identical, and from my understanding aggregations for everything but None were done client-side. Is that what we intended? Is this PR right for sure?

Aggregations for everything were done server-side, but for None we would just skip the aggregation, similar to last value! I agree with the duplicate query issue.

@mistercrunch
Copy link
Member Author

Aggregations for everything was done server-side

but the query was wrong it seems, as one of the queries should have NOT included the time dimension (?)

@mistercrunch
Copy link
Member Author

I know there's a bit of history going back and forth between client side and server side; it seems single-timeseries aggregations are well handled on the client side. I can see it either way on my side...

Also, making None more descriptive by calling it "Force server-side aggregation".

@LevisNgigi
Copy link
Contributor

Aggregations for everything was done server-side

but the query was wrong it seems, as one of the queries should have NOT included the time dimension (?)

Yes, there was no need for duplicate queries. Previously, for something like sum, this was the query:

    [{'operation': 'pivot',
      'options': {'aggregates': {'count': {'operator': 'mean'}},
                  'columns': [],
                  'drop_missing_columns': True,
                  'index': ['order_date']}},
     {'operation': 'aggregate',
      'options': {'aggregates': {'count': {'column': 'count', 'operator': 'sum'}},
                  'groupby': []}}]

but with client side it is this:

    [{'operation': 'pivot',
      'options': {'aggregates': {'count': {'operator': 'mean'}},
                  'columns': [],
                  'drop_missing_columns': True,
                  'index': ['order_date']}},
     {'operation': 'flatten'}]

but even client processing works fine as well.
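
For readers comparing the two lists above, a rough sketch of how they differ is shown here. `PostProcessingRule` and `sketchPostProcessing` are hypothetical names; the sketch only mirrors the structures quoted in this comment and is not the plugin's actual `buildQuery` code.

```typescript
// Hypothetical type and helper, mirroring only the structures quoted above.
type PostProcessingRule = {
  operation: string;
  options?: Record<string, unknown>;
};

function sketchPostProcessing(
  metric: string,
  timeColumn: string,
  serverSideOperator?: string,
): PostProcessingRule[] {
  // Both variants start by pivoting the metric on the time column.
  const pivot: PostProcessingRule = {
    operation: 'pivot',
    options: {
      aggregates: { [metric]: { operator: 'mean' } },
      columns: [],
      drop_missing_columns: true,
      index: [timeColumn],
    },
  };

  if (serverSideOperator) {
    // Previous behavior: collapse the timeseries on the backend.
    return [
      pivot,
      {
        operation: 'aggregate',
        options: {
          aggregates: { [metric]: { column: metric, operator: serverSideOperator } },
          groupby: [],
        },
      },
    ];
  }

  // Client-side behavior: return the flattened timeseries and aggregate in the browser.
  return [pivot, { operation: 'flatten' }];
}
```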

Contributor

@mistercrunch Processing your ephemeral environment request here. Action: up. More information on how to use or configure ephemeral environments

Contributor

@mistercrunch Ephemeral environment spinning up at http://54.191.189.47:8080. Credentials are 'admin'/'admin'. Please allow several minutes for bootstrapping and startup.

// Aggregation choices with computation methods for plugins and controls
export const aggregationChoices = {
raw: {
label: 'Force server-side aggregation',
Member

I'm not sure if this is clearer to the user... "server-side" is an implementation detail.

My understanding is that when this option is selected we ignore the time grain when computing the big number. If the grain is monthly, and the metric is AVG(price), we would have:

  • The "trendline" (which is not a trendline) would show a timeseries of AVG(price) per month.
  • The big number would show the overall AVG(price), from start to end of the data.

So I think renaming this to "Overall value" would make it clearer.
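
To make the AVG(price) case concrete, here is a small worked sketch. The prices are invented for illustration and `avg` is a hypothetical helper; it shows that averaging the monthly averages generally differs from the overall average when months have different row counts.

```typescript
// Illustrative prices only, grouped by month.
const pricesByMonth: number[][] = [
  [10, 20], // month 1: AVG = 15
  [30], // month 2: AVG = 30
  [40, 50, 60, 70], // month 3: AVG = 55
];

// Hypothetical helper: arithmetic mean of a non-empty list.
const avg = (xs: number[]): number =>
  xs.reduce((acc, x) => acc + x, 0) / xs.length;

// What the timeseries shows: AVG(price) per month.
const monthlyAvg = pricesByMonth.map(avg); // [15, 30, 55]

// Client-side "mean" over the timeseries: (15 + 30 + 55) / 3.
const meanOfMonthlyAvg = avg(monthlyAvg);

// Overall AVG(price) over all rows: 280 / 7 = 40.
const allPrices = pricesByMonth.reduce((all, m) => all.concat(m), [] as number[]);
const overallAvg = avg(allPrices);
```

Here the mean of the monthly averages is about 33.3 while the overall average is 40, which is the gap this option is meant to close.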

Member Author

Yeah, tough call on naming this properly... clearly we can expand on the tooltip, put much more information there, and be a bit more technical too. The feature is funky for a variety of reasons; one is that you have to pick 2 aggregations (one on the time grain, and one across time, sometimes across the raw data, sometimes on the time series itself), and many combinations might not make sense. LAST_VALUE is the most intuitive to me (and the default).


3 participants