Print cost metrics as data size #11443

kokosing · 2018-09-07T06:02:19Z

Print cost metrics as data size

kokosing · 2018-09-07T06:03:38Z

Cost is, more or less, the amount of data to be processed by the CPU, sent over the network, or stored in memory. Displaying these metrics in data size units makes them more readable, especially when comparing big numbers.

sopel39 · 2018-09-07T08:03:43Z

> More or less cost is the amount of data to be processed by cpu

For CPU, the cost would rather be something like the number of cycles (i.e., an artificial unit of measure representing how heavy the computation is). That it is currently more or less the amount of data processed is a limitation of the model. We should put different weights on different operations (e.g., hash computation, hash lookup, etc.). @rschlussel

findepi

I am OK with the change, but mind @sopel39's comment.

findepi · 2018-09-07T10:01:57Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        }
+
+        return "?";
+


kokosing · 2018-10-29T05:37:53Z

Output example:

     - CrossJoin => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152), orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:i
             Distribution: REPLICATED
             Cost: {rows: 150030375 (33.53GB), cpu: 34.99G, memory: 747.67MB, network: 747.67MB}
         - TableScan[tpch:tpch:nation:sf1.0, grouped = false] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
                 Cost: {rows: 25 (2.67kB), cpu: 2.67k, memory: 0B, network: 0B}
                 nationkey := tpch:nationkey
                 regionkey := tpch:regionkey
                 name := tpch:name
                 comment := tpch:comment
         - LocalExchange[SINGLE] () => orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:do
                 Cost: {rows: 6001215 (747.67MB), cpu: 747.67M, memory: 0B, network: 747.67MB}
             - RemoteSource[2] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double
                     Cost: {rows: 6001215 (747.67MB), cpu: 747.67M, memory: 0B, network: 747.67MB}

kokosing · 2018-10-29T05:40:26Z

Currently (without using units) the above example looks like:

     - CrossJoin => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152), orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:i
             Distribution: REPLICATED
             Cost: {rows: 150030375 (33.53GB), cpu: 37575027902.00, memory: 783988912.00, network: 783988912.00}
         - TableScan[tpch:tpch:nation:sf1.0, grouped = false] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
                 Cost: {rows: 25 (2.67kB), cpu: 2734.00, memory: 0.00, network: 0.00}
                 nationkey := tpch:nationkey
                 regionkey := tpch:regionkey
                 name := tpch:name
                 comment := tpch:comment
         - LocalExchange[SINGLE] () => orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:do
                 Cost: {rows: 6001215 (747.67MB), cpu: 783988912.00, memory: 0.00, network: 783988912.00}
             - RemoteSource[2] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double
                     Cost: {rows: 6001215 (747.67MB), cpu: 783988912.00, memory: 0.00, network: 783988912.00}

Thanks to this PR the EXPLAIN output is much more readable.

findepi · 2018-10-29T08:16:32Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        if (value == Double.NEGATIVE_INFINITY) {
+            return "-INF";
+        }
+        else if (value == Double.POSITIVE_INFINITY) {


redundant else (here & below)

findepi · 2018-10-29T08:16:37Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+    private static String formatDoubleAsCpuCost(double value)
+    {
+        if (value == Double.NEGATIVE_INFINITY) {
+            return "-INF";


why uppercase?

FWIW, Double#toString outputs -Infinity / Infinity

findepi · 2018-10-29T08:19:39Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        else if (value == Double.POSITIVE_INFINITY) {
+            return "+INF";
+        }
+        else if(!isNaN(value)) {


Invert the if condition: NaN is yet another special case, like +inf and -inf, so lay out the conditions for these cases similarly.

findepi · 2018-10-29T08:20:52Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        else if(!isNaN(value)) {
+            String formattedValue = DataSize.succinctDataSize(value, BYTE).toString();
+            // strip last character `B` to not to bound cpu cost with data size
+            return formattedValue.substring(0, formattedValue.length() - 1);


If you want to say "strip trailing B", formattedValue.replaceAll("B$", "") would be a more direct way of saying that.
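A minimal sketch of that suggestion, assuming the input is an already formatted data size string such as the ones in the example output above. The class and method names are hypothetical and this is not the PR's code. Because the formatted strings in this thread end in an uppercase `B` (for example `747.67MB`) and Java regular expressions are case-sensitive by default, the pattern is written as `B$`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from PlanPrinter.
final class CpuCostSuffix
{
    // Matches a single uppercase "B" at the very end of the string.
    private static final Pattern TRAILING_B = Pattern.compile("B$");

    private CpuCostSuffix() {}

    // Drops the trailing "B" so a CPU cost is not presented as a data size.
    // Strings that do not end in "B" are returned unchanged.
    static String stripTrailingB(String formattedDataSize)
    {
        return TRAILING_B.matcher(formattedDataSize).replaceAll("");
    }

    public static void main(String[] args)
    {
        System.out.println(stripTrailingB("34.99GB")); // 34.99G
        System.out.println(stripTrailingB("2.67kB"));  // 2.67k
        System.out.println(stripTrailingB("0B"));      // 0
    }
}
```

Compared with `substring(0, length - 1)`, the anchored pattern only removes the character when it really is a `B`, so a value such as `?` passes through untouched.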

findepi · 2018-10-29T08:21:25Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        return "?";
+    }
+
+    private static String formatDoubleAsDataSize(double value)


All comments from formatDoubleAsCpuCost apply here as well.

findepi

% prev comments
squash commits

electrum · 2018-10-29T14:18:44Z

I don’t think using data size is correct for CPU cost. Billions should be “B” not “G”. See how we print row count in the CLI.

martint · 2018-10-29T14:27:57Z

Why not use a Duration for CPU cost? It’s a measurement of time.

findepi · 2018-10-29T16:23:40Z

Estimated CPU cost is not time. Currently, it's (very roughly) the amount of data being processed.

martint · 2018-10-29T16:36:29Z

That’s very misleading, then. CPU cost should, ideally, be an estimate of the amount of CPU (as measured by CPU timers) the query will use.

If the current metric is an indication of something else, we should come up with a different name for it.

dain · 2018-10-29T18:52:02Z

@findepi, CPU time is no longer an estimate. With the changes @arhimondr and I made, we get an actual measurement per operator. So, I think time is the right measure.

findepi · 2018-10-29T18:59:25Z

@dain I understood this as being about the "CPU cost estimate" computed by the CBO during planning rather than the "CPU time measurement" as measured (or previously approximated) during execution.

sopel39 · 2018-10-29T19:01:18Z

@dain which changes are you referring to?

dain · 2018-10-29T19:25:56Z

@sopel39 I think this is the core PR #11408, but there were a few more followup ones.

@findepi I see, I thought we were talking about the actual measurements. What is the CBO CPU part actually estimating? Specifically, is it estimating the actual CPU time in the current cluster, or is it more of an estimate in a "model" cluster?

findepi · 2018-10-29T19:38:53Z

@dain currently this is an "abstract CPU cost". Hence the units are not time/ticks.
There is a plan to make it more "material" (#11615 > "Adjusting model").
(This is not that trivial, since then it becomes hard (if not impossible) to compare different cost dimensions.)

I am for the change @kokosing is proposing. It's trivial and should help reading EXPLAIN output today.
As soon as we make CPU cost something different, closer to the actual thing it's estimating, we can very easily take back this change or replace it with something that is appropriate (e.g., Duration).

Can we merge this as is then?

dain · 2018-10-29T20:29:32Z

@findepi, I'm OK with whatever you all agree on. @martint or @electrum, can you follow up on this one?

electrum · 2018-10-30T05:41:22Z

My comment about the prefix has not been addressed. It’s only a number, so we should use “B” for billion, not “G”. Metric prefixes only make sense for a unit of some type.
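To make the distinction concrete, here is a minimal sketch of formatting a bare number with count-style suffixes (K, M, B, T) and decimal steps of 1000. The class name, method name, and suffix table are assumptions made for illustration; this is neither the CLI's row count formatter nor the Count unit proposed for airlift/units.

```java
import java.util.Locale;

// Hypothetical count formatter for illustration only.
final class CountFormat
{
    // Thousand, million, billion, trillion: suffixes for plain numbers, not units.
    private static final String[] SUFFIXES = {"", "K", "M", "B", "T"};

    private CountFormat() {}

    static String formatCount(double value)
    {
        int index = 0;
        double scaled = value;
        // Scale by powers of 1000 (not 1024) until the value drops below 1000.
        while (Math.abs(scaled) >= 1000 && index < SUFFIXES.length - 1) {
            scaled /= 1000;
            index++;
        }
        return String.format(Locale.ROOT, "%.2f%s", scaled, SUFFIXES[index]);
    }

    public static void main(String[] args)
    {
        System.out.println(formatCount(37575027902.0)); // 37.58B
        System.out.println(formatCount(783988912.0));   // 783.99M
        System.out.println(formatCount(2734.0));        // 2.73K
    }
}
```

Note that the CrossJoin CPU cost of 37575027902 comes out as `37.58B` here, while the data-size based output earlier in the thread shows `34.99G`, because data sizes are scaled by 1024 rather than 1000.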

kokosing · 2018-10-30T05:52:33Z

@electrum I am going to address your comment, by fixing airlift/units#7

kokosing · 2018-11-05T06:21:00Z

Related (dependency) PR: airlift/units#8

kokosing · 2019-01-25T12:29:04Z

@nezihyigitbasi @mbasmanova ping

mbasmanova · 2019-01-25T15:32:15Z

@kokosing Grzegorz, I can't look right now, because I'm over-booked. I'm on-call and I'm also finishing the FB-specific parts of the 0.216 release. I'll look into this next week.

nezihyigitbasi

LGTM.

nit: how about we update the commit title to "Update the format of costs in plan output to have units", to be more specific about the change?
Is this blocked on "Introduce Count unit" (airlift/units#8), as commented in "Print cost metrics in units" (trinodb/trino#68)? Looks like it's not.

nezihyigitbasi · 2019-01-27T05:25:58Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+            return "?";
+        }
+
+        return DataSize.succinctDataSize(value, BYTE).toString();


static import succinctDataSize

you can also use succinctBytes((long) value)

nezihyigitbasi · 2019-01-27T05:34:50Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

@@ -476,6 +478,23 @@ private void printWindowOperatorStats(int indent, WindowOperatorStats stats)
        output.append('\n');
    }

+    private static String formatDoubleAsCpuCost(double value)


I think we can simply rename these methods as formatCpuCost and formatDataSize, because the parameters are double so we don't need to repeat that we are formatting double values.

nezihyigitbasi · 2019-01-27T05:35:40Z

presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java

+        if (!isFinite(value)) {
+            return Double.toString(value);
+        }
+        else if (isNaN(value)) {


unnecessary else
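Pulling the review comments together, one possible layout of the special cases is sketched below. It assumes that NaN maps to `?` and that infinities are printed via `Double.toString`, as in the snippets quoted above. The class name and the hand-rolled unit scaling are hypothetical stand-ins for the airlift `DataSize` call, so treat this as an illustration rather than the merged code. Since `Double.isFinite` returns false for NaN as well as for infinities, the NaN check is placed first here so that it stays reachable.

```java
import java.util.Locale;

// Hypothetical stand-in for the PlanPrinter helper; not the PR's implementation.
final class CostValueFormat
{
    private static final String[] UNITS = {"B", "kB", "MB", "GB", "TB", "PB"};

    private CostValueFormat() {}

    static String formatDataSize(double value)
    {
        // Each special case returns early, so no "else" is needed.
        if (Double.isNaN(value)) {
            return "?";
        }
        if (Double.isInfinite(value)) {
            return Double.toString(value); // "Infinity" or "-Infinity"
        }

        // Scale by powers of 1024 to the largest unit that keeps the value >= 1.
        int unit = 0;
        double scaled = value;
        while (Math.abs(scaled) >= 1024 && unit < UNITS.length - 1) {
            scaled /= 1024;
            unit++;
        }
        // Whole numbers are printed without decimals, for example "0B".
        if (Math.floor(scaled) == scaled) {
            return String.format(Locale.ROOT, "%d%s", (long) scaled, UNITS[unit]);
        }
        return String.format(Locale.ROOT, "%.2f%s", scaled, UNITS[unit]);
    }

    public static void main(String[] args)
    {
        System.out.println(formatDataSize(783988912.0));              // 747.67MB
        System.out.println(formatDataSize(2734.0));                   // 2.67kB
        System.out.println(formatDataSize(Double.NaN));               // ?
        System.out.println(formatDataSize(Double.NEGATIVE_INFINITY)); // -Infinity
    }
}
```

The inputs in `main` are the raw values from the "without units" EXPLAIN output above, and the results line up with the formatted values shown in the first example.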

facebook-github-bot added the CLA Signed label Sep 7, 2018

findepi reviewed Sep 7, 2018


kokosing mentioned this pull request Oct 29, 2018

Make units extensible airlift/units#7

Closed

findepi reviewed Oct 29, 2018


findepi approved these changes Oct 29, 2018


Print cost metrics in units

f10ca41

findepi mentioned this pull request Nov 18, 2018

Optionally use default filter factor to estimate filter node #11904

Merged

kokosing mentioned this pull request Jan 4, 2019

Introduce Count unit airlift/units#8

Open

kokosing mentioned this pull request Jan 25, 2019

Print cost metrics in units trinodb/trino#68

Merged

nezihyigitbasi approved these changes Jan 27, 2019


kokosing closed this Jul 24, 2019
