-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print cost metrics as data size #11443
Conversation
More or less cost is the amount of data to be processed by cpu, sent through network, stored in memory. Displaying them in data size units makes it more readable. Especially when comparing big numbers. |
In case of CPU that would rather be something like number of cycles (e.g: artificial unit of measure representing how heavy computation is). That it is currently more or less amount of data processed is a limitation of the model. We should put different weights on different operations (e.g: hash computation, hash lookup, etc).. @rschlussel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with the change, but mind @sopel39 's comment.
} | ||
|
||
return "?"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
Output example:
|
Currently (without using units) the above example looks like:
Thanks to this PR the EXPLAIN output is much more readable. |
if (value == Double.NEGATIVE_INFINITY) { | ||
return "-INF"; | ||
} | ||
else if (value == Double.POSITIVE_INFINITY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant else
(here & below)
private static String formatDoubleAsCpuCost(double value) | ||
{ | ||
if (value == Double.NEGATIVE_INFINITY) { | ||
return "-INF"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why uppercase?
FWIW, Double#toString outputs -Infinity
/ Infinity
else if (value == Double.POSITIVE_INFINITY) { | ||
return "+INF"; | ||
} | ||
else if(!isNaN(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert if condition -- NaN is yet another special case, like +inf, -inf, so layout the conditions for these cases similarily
else if(!isNaN(value)) { | ||
String formattedValue = DataSize.succinctDataSize(value, BYTE).toString(); | ||
// strip last character `B` to not to bound cpu cost with data size | ||
return formattedValue.substring(0, formattedValue.length() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to say "strip trailing be", formattedValue.replaceAll("b$", "")
would be more direct way of saying that.
return "?"; | ||
} | ||
|
||
private static String formatDoubleAsDataSize(double value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all comments from formatDoubleAsCpuCost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- % prev comments
- squash commits
I don’t think using data size is correct for CPU cost. Billions should be “B” not “G”. See how we print row count in the CLI. |
Why not use a Duration for cpu cost. It’s a measurement of time. |
Estimated CPU cost is not time. Currently, it's (very roughly) the amount of data being processed. |
That’s very misleading, then. CPU cost should, ideally, be a an estimation of the amount of CPU (as measured by cpu timers) the query will use. If the current metric is an indication of something else, we should come up with a different name for it. |
@findepi, CPU time is no longer an estimate. With the changes @arhimondr and I made, we get an actual measurement per operator. So, I think time is the right measure. |
@dain i understood this as being about "CPU cost estimate" computed by CBO during planning rather than "CPU time measurement" as measured (or perviously approximated) during execution. |
@dain which changes you refer to? |
@sopel39 I think this is the core PR #11408, but there were a few more followup ones. @findepi I see, I thought we were talking about the actual measurements. What is the CBO CPU part actually estimating? Specifically, is it estimating the actual CPU time in the current cluster, or is it more of a estimate in a "model" cluster? |
@dain currently this is an "abstract CPU cost". Hence the units are not time/ticks. I am for the change @kokosing is proposing. It's trivial and should help reading EXPLAINS today. Can we merge this as is then? |
My comment about the prefix has not been addressed. It’s only a number, so we should use “B” for billion, not “G”. Metric prefixes only make sense for a unit of some type. |
@electrum I am going to address your comment, by fixing airlift/units#7 |
Related (dependency) PR: airlift/units#8 |
@kokosing Grzegorz, I can't look right now, because I'm over-booked. I'm on-call and I'm also finishing the FB-specific parts of the 0.216 release. I'll look into this next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
- nit: how about we update the commit title as
Update the format of costs in plan output to have units
to be more specific about the change? - Is this blocked on Introduce Count unit airlift/units#8 as commented in Print cost metrics in units trinodb/trino#68? Looks like it's not.
return "?"; | ||
} | ||
|
||
return DataSize.succinctDataSize(value, BYTE).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- static import
succinctDataSize
- you can also use
succinctBytes((long) value)
@@ -476,6 +478,23 @@ private void printWindowOperatorStats(int indent, WindowOperatorStats stats) | |||
output.append('\n'); | |||
} | |||
|
|||
private static String formatDoubleAsCpuCost(double value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simply rename these methods as formatCpuCost
and formatDataSize
, because the parameters are double
so we don't need to repeat that we are formatting double values.
if (!isFinite(value)) { | ||
return Double.toString(value); | ||
} | ||
else if (isNaN(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary else
Print cost metrics as data size