
feat: add metadata to literal expressions #16170


Merged
merged 10 commits into from
Jun 6, 2025


timsaucer
Contributor

Which issue does this PR close?

Rationale for this change

This is an alternative to #16053

In this version, we change the Literal enum variant in both logical and physical expressions to carry optional metadata. This is a deeper change than #16053, but it makes the data flow easier to follow. Rather than relying on complex schema checks during simplification and physical plan creation, it attaches the metadata directly to the logical expression.

What changes are included in this PR?

  • Add optional metadata to literal variant of logical expression
  • Add optional metadata to literal variant of physical expression
  • Update optimizer to combine metadata on aliases

Are these changes tested?

Tested with the existing unit tests, plus an additional unit test that specifically checks for metadata on the input and output of scalar values.

Are there any user-facing changes?

The biggest change is that users who match on or construct literal values will need to update their code to take and pass the additional metadata field on the Literal variant.
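As a rough illustration of the migration, here is a std-only sketch. The `ScalarValue` and `Expr` types below are simplified, hypothetical stand-ins rather than DataFusion's real definitions; only the two-field shape of the `Literal` variant follows this PR.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for DataFusion's `ScalarValue` and `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int32(i32),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A constant value along with optional metadata.
    Literal(ScalarValue, Option<BTreeMap<String, String>>),
    Column(String),
}

/// Before the change a match arm would read `Expr::Literal(value)`;
/// afterwards it must also bind or ignore the second field.
fn literal_value(expr: &Expr) -> Option<&ScalarValue> {
    match expr {
        Expr::Literal(value, _metadata) => Some(value),
        Expr::Column(_) => None,
    }
}

fn main() {
    // Constructing a literal now fills the metadata slot explicitly.
    let plain = Expr::Literal(ScalarValue::Int32(42), None);
    assert_eq!(literal_value(&plain), Some(&ScalarValue::Int32(42)));
    assert_eq!(literal_value(&Expr::Column("a".to_string())), None);
}
```

Code that does not care about metadata can ignore the second field with `_`, so most call sites only need a mechanical update.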

@timsaucer timsaucer self-assigned this May 23, 2025
@timsaucer timsaucer added the api change Changes the API exposed to users of the crate label May 23, 2025
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate catalog Related to the catalog crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate labels May 23, 2025
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) documentation Improvements or additions to documentation labels May 27, 2025
@timsaucer timsaucer force-pushed the feat/metadata-on-logical-literal branch from 273038e to 124ea41 Compare May 29, 2025 11:27
@timsaucer timsaucer marked this pull request as ready for review June 3, 2025 16:48
Member

@paleolimbot paleolimbot left a comment


Thank you! I have a few questions but in general I like this approach. I'm relatively new to the internals, but elevating the metadata into the enum seems to have done a better job of exposing the places it might have otherwise been ignored.

Comment on lines +286 to +288
/// A constant value along with associated metadata
Literal(ScalarValue, Option<BTreeMap<String, String>>),
Member

Just curious on the choice of a BTreeMap here (isn't Field using a HashMap?)

Contributor Author

It's for the programmer's convenience. BTreeMap implements traits like Hash and PartialOrd, which HashMap does not because its iteration order is not guaranteed.
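To make the trait point concrete, here is a std-only sketch with a hypothetical, simplified enum (not the real `Expr`). `BTreeMap` implements `Hash`, `PartialOrd`, and `Eq` when its keys and values do, so the derive list compiles; swapping in `HashMap` would fail on `Hash` and `PartialOrd`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified enum. With `HashMap` in place of `BTreeMap`
// this derive list would not compile, because `HashMap` implements
// neither `Hash` nor `PartialOrd`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
enum Expr {
    Literal(i64, Option<BTreeMap<String, String>>),
}

/// Hash an expression with the standard library's default hasher.
fn hash_of(expr: &Expr) -> u64 {
    let mut hasher = DefaultHasher::new();
    expr.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Insert the same entries in different orders.
    let mut a = BTreeMap::new();
    a.insert("k1".to_string(), "v1".to_string());
    a.insert("k2".to_string(), "v2".to_string());
    let mut b = BTreeMap::new();
    b.insert("k2".to_string(), "v2".to_string());
    b.insert("k1".to_string(), "v1".to_string());

    // The sorted iteration order makes equality and hashing agree.
    let (ea, eb) = (Expr::Literal(1, Some(a)), Expr::Literal(1, Some(b)));
    assert_eq!(ea, eb);
    assert_eq!(hash_of(&ea), hash_of(&eb));
}
```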

Member

I was personally confused by it (and I believe it's why the conversions I pointed out below can't just be .clone()). If the HashMap is not convenient for the metadata perhaps that's better done consistently?

Member

Ah I see...if you make this a HashMap the derives on the enum (e.g. #[derive(Hash)]) no longer work out of the box. I don't have a strong opinion about this one!

Contributor

I recommend:

  1. Adding a comment about why BTreeMap is used, to capture the context of this PR
  2. Changing the metadata to be Arc'd so that cloning literals with metadata is not expensive.

I don't think this is a blocker for this PR, but do think it is important for the long term. Thus if we merge this API in for 48.0.0 I think we should be prepared to make a breaking API change for 49.0.0

I will try and whip up a prototype of what I am talking about
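A minimal sketch of what the Arc'd variant could look like, using a hypothetical simplified enum (the PR as reviewed here stores the map directly, and the eventual prototype may differ). Cloning the expression then only bumps a reference count instead of copying every key and value.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical variant shape for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(f64, Option<Arc<BTreeMap<String, String>>>),
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("key1".to_string(), "value1".to_string());
    let shared = Arc::new(map);

    let original = Expr::Literal(4.0, Some(Arc::clone(&shared)));
    // Cloning the expression clones the `Arc`, not the map it points to.
    let copy = original.clone();

    // Three handles now point at one allocation: `shared`, `original`, `copy`.
    assert_eq!(Arc::strong_count(&shared), 3);
    assert_eq!(original, copy);
}
```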

Comment on lines +427 to +430
metadata
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
Member

Does metadata.clone() or metadata.into() not work here?

Contributor Author

Described in the comment below.
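For context, one likely reason is that the standard library has no `From<HashMap<K, V>>` impl for `BTreeMap<K, V>`, so `.into()` is unavailable and `.clone()` only yields another `HashMap`. A std-only sketch of the two usual spellings (the function names are hypothetical):

```rust
use std::collections::{BTreeMap, HashMap};

/// Rebuild a `BTreeMap` from a borrowed `HashMap`, cloning each entry.
fn to_btree(metadata: &HashMap<String, String>) -> BTreeMap<String, String> {
    metadata
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// When the map is owned, the per-entry clones can be skipped.
fn into_btree(metadata: HashMap<String, String>) -> BTreeMap<String, String> {
    metadata.into_iter().collect()
}

fn main() {
    let mut field_metadata = HashMap::new();
    field_metadata.insert("key1".to_string(), "value1".to_string());

    let borrowed = to_btree(&field_metadata);
    let owned = into_btree(field_metadata);
    assert_eq!(borrowed, owned);
    assert_eq!(borrowed.get("key1").map(String::as_str), Some("value1"));
}
```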

Comment on lines +29 to +32
pub fn lit_with_metadata<T: Literal>(
n: T,
metadata: impl Into<Option<HashMap<String, String>>>,
) -> Expr {
Member

🙏
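The `impl Into<Option<...>>` parameter lets callers pass a map, `Some(map)`, or `None` without extra wrapping. A std-only sketch of that calling convention, where `resolve_metadata` is a hypothetical stand-in that returns the resolved option instead of building an `Expr`:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the metadata parameter of `lit_with_metadata`.
fn resolve_metadata(
    metadata: impl Into<Option<HashMap<String, String>>>,
) -> Option<HashMap<String, String>> {
    metadata.into()
}

fn main() {
    let map: HashMap<String, String> =
        HashMap::from([("key1".to_string(), "value1".to_string())]);

    // A bare map is wrapped in `Some` by the `From<T> for Option<T>` impl.
    assert_eq!(resolve_metadata(map.clone()), Some(map.clone()));
    // An `Option` passes through unchanged.
    assert_eq!(resolve_metadata(Some(map.clone())), Some(map));
    assert_eq!(resolve_metadata(None), None);
}
```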

@@ -342,15 +342,15 @@ mod test {
)
.unwrap();
let snap = dynamic_filter_1.snapshot().unwrap().unwrap();
-insta::assert_snapshot!(format!("{snap:?}"), @r#"BinaryExpr { left: Column { name: "a", index: 0 }, op: Eq, right: Literal { value: Int32(42) }, fail_on_overflow: false }"#);
+insta::assert_snapshot!(format!("{snap:?}"), @r#"BinaryExpr { left: Column { name: "a", index: 0 }, op: Eq, right: Literal { value: Int32(42), field: Field { name: "42", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, fail_on_overflow: false }"#);
Member

Not sure if this is Display or Debug, but would it be possible to omit the Field if it isn't carrying extra metadata?

Member

This is Debug output from format!("{snap:?}")

Comment on lines +117 to +124
let mut new_metadata = prior_metadata
.as_ref()
.map(|m| {
m.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect::<HashMap<String, String>>()
})
.unwrap_or_default();
Member

I assume there's some reason that .cloned() doesn't work here?

Contributor Author

When you try that, a trait bound is left unsatisfied, and I didn't spend any more time digging into it.
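A std-only sketch of the pattern in the snippet above, extended with a merge step. Note that `.cloned()` on an `Option<&BTreeMap<_, _>>` would produce an `Option<BTreeMap<_, _>>`, not the `HashMap` wanted here, which may be the mismatch in question. The function name `combine_metadata` and the rule that later entries overwrite earlier ones are assumptions for illustration, not taken from the PR.

```rust
use std::collections::{BTreeMap, HashMap};

/// Start from the literal's prior metadata (if any), then layer the
/// alias metadata on top. Assumption: alias entries win on key conflicts.
fn combine_metadata(
    prior_metadata: &Option<BTreeMap<String, String>>,
    alias_metadata: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut new_metadata = prior_metadata
        .as_ref()
        .map(|m| {
            m.iter()
                .map(|(k, v)| (k.clone(), v.clone()))
                .collect::<HashMap<String, String>>()
        })
        .unwrap_or_default();
    new_metadata.extend(alias_metadata.iter().map(|(k, v)| (k.clone(), v.clone())));
    new_metadata
}

fn main() {
    let prior = Some(BTreeMap::from([
        ("key1".to_string(), "old".to_string()),
        ("key2".to_string(), "kept".to_string()),
    ]));
    let alias = HashMap::from([("key1".to_string(), "new".to_string())]);

    let merged = combine_metadata(&prior, &alias);
    assert_eq!(merged.get("key1").map(String::as_str), Some("new"));
    assert_eq!(merged.get("key2").map(String::as_str), Some("kept"));
    // With no prior metadata the result is just the alias metadata.
    assert_eq!(combine_metadata(&None, &alias), alias);
}
```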

@@ -6061,7 +6061,7 @@ physical_plan
04)------AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
05)--------ProjectionExec: expr=[]
06)----------CoalesceBatchesExec: target_batch_size=8192
-07)------------FilterExec: substr(md5(CAST(value@0 AS Utf8)), 1, 32) IN ([Literal { value: Utf8View("7f4b18de3cfeb9b4ac78c381ee2ad278") }, Literal { value: Utf8View("a") }, Literal { value: Utf8View("b") }, Literal { value: Utf8View("c") }])
+07)------------FilterExec: substr(md5(CAST(value@0 AS Utf8)), 1, 32) IN ([Literal { value: Utf8View("7f4b18de3cfeb9b4ac78c381ee2ad278"), field: Field { name: "7f4b18de3cfeb9b4ac78c381ee2ad278", data_type: Utf8View, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8View("a"), field: Field { name: "a", data_type: Utf8View, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8View("b"), field: Field { name: "b", data_type: Utf8View, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8View("c"), field: Field { name: "c", data_type: Utf8View, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }])
Member

For those of us looking at physical plans all the time, I do think it might be worth the effort to make the Literal output more succinct (perhaps just the metadata part?)

Contributor Author

Do you want this on Display or on Debug? My gut reaction is that we just want Debug as is. Right now on Display I didn't change anything, but I could enhance it to include metadata when it exists. Since you're the first user, what are you looking for here?

Member

I'm not sure which one shows up when you EXPLAIN, but this does seem to affect the readability of that output? (Totally optional, can leave as a follow up)

Contributor Author

I can add this in right now. How would you like this to look? Suppose I have

    let metadata: HashMap<_, _> = [("key1", "value1")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let expr = lit_with_metadata(4.0, Some(metadata));

This expression shows up in the logical plan of explain as Float64(4) and in the physical plan as 4 as Float64(4). How would you like that metadata shown?

Contributor Author

Would this look ok?

Float64(4) {"key1": "value1"}
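One way to get that rendering while leaving the common no-metadata case untouched is sketched below. The `Literal` struct is a hypothetical stand-in, not DataFusion's type; it relies on `BTreeMap`'s `Debug` output for the braces and quoting.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Hypothetical literal type used only to illustrate the formatting.
struct Literal {
    value: f64,
    metadata: Option<BTreeMap<String, String>>,
}

impl fmt::Display for Literal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Float64({})", self.value)?;
        // Append the metadata only when there is something to show.
        match &self.metadata {
            Some(m) if !m.is_empty() => write!(f, " {:?}", m),
            _ => Ok(()),
        }
    }
}

fn main() {
    let metadata = BTreeMap::from([("key1".to_string(), "value1".to_string())]);
    let with_metadata = Literal { value: 4.0, metadata: Some(metadata) };
    let plain = Literal { value: 4.0, metadata: None };

    assert_eq!(with_metadata.to_string(), r#"Float64(4) {"key1": "value1"}"#);
    assert_eq!(plain.to_string(), "Float64(4)");
}
```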

Contributor Author

It may play havoc with conversion to SQL but honestly I don't know how we'd handle metadata as SQL myself

Member

That looks great! (I don't know what the implications are here, just a passing comment on impacting readability for the ubiquitous non-metadata case)

I don't know how we'd handle metadata as SQL myself

Not today, but rendering it as a function call may work (i.e., WithMetadata('...'))?

Member

I did a check locally to make sure that EXPLAIN output for normal queries isn't affected by this change and it isn't. Thanks!

@timsaucer timsaucer force-pushed the feat/metadata-on-logical-literal branch from 124ea41 to fb1e0ac Compare June 4, 2025 19:47
Member

@andygrove andygrove left a comment


Thanks @timsaucer.

Note for other reviewers - although this looks like a large PR, most of the changes are one of:

  • updating pattern matching to accommodate the extra field
  • propagating the new metadata
  • updating test assertions because the Debug output includes the new field

@andygrove
Member

@alamb @xudong963 I think that we can include this in the next DF 48 rc

@alamb
Contributor

alamb commented Jun 6, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/metadata-on-logical-literal (5541492) to 85f6621 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

Contributor

@alamb alamb left a comment


As long as this change doesn't cause any performance regressions (I started some benchmarks running), I am also OK with merging this PR.

If we want to include it in the 48.0.0 I recommend we:

  1. Merge this PR to main
  2. Create a backport PR that cherry-picks the change to the branch-48 branch


@alamb
Contributor

alamb commented Jun 6, 2025

🤖: Benchmark completed


Comparing HEAD and feat_metadata-on-logical-literal
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ feat_metadata-on-logical-literal ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1947.84 ms │                       1977.34 ms │    no change │
│ QQuery 1     │   697.51 ms │                        732.40 ms │ 1.05x slower │
│ QQuery 2     │  1435.42 ms │                       1473.46 ms │    no change │
│ QQuery 3     │   686.36 ms │                        680.00 ms │    no change │
│ QQuery 4     │  1460.58 ms │                       1491.70 ms │    no change │
│ QQuery 5     │ 15570.48 ms │                      15441.63 ms │    no change │
│ QQuery 6     │  1995.58 ms │                       2100.00 ms │ 1.05x slower │
│ QQuery 7     │  2063.07 ms │                       2023.71 ms │    no change │
│ QQuery 8     │   828.62 ms │                        838.92 ms │    no change │
└──────────────┴─────────────┴──────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 26685.45ms │
│ Total Time (feat_metadata-on-logical-literal)   │ 26759.16ms │
│ Average Time (HEAD)                             │  2965.05ms │
│ Average Time (feat_metadata-on-logical-literal) │  2973.24ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          2 │
│ Queries with No Change                          │          7 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ feat_metadata-on-logical-literal ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │    15.27 ms │                         15.17 ms │    no change │
│ QQuery 1     │    31.75 ms │                         32.97 ms │    no change │
│ QQuery 2     │    79.99 ms │                         80.41 ms │    no change │
│ QQuery 3     │    95.96 ms │                         97.03 ms │    no change │
│ QQuery 4     │   581.96 ms │                        586.26 ms │    no change │
│ QQuery 5     │   815.93 ms │                        835.78 ms │    no change │
│ QQuery 6     │    23.44 ms │                         23.19 ms │    no change │
│ QQuery 7     │    38.16 ms │                         36.92 ms │    no change │
│ QQuery 8     │   898.57 ms │                        900.58 ms │    no change │
│ QQuery 9     │  1169.91 ms │                       1173.72 ms │    no change │
│ QQuery 10    │   262.73 ms │                        272.99 ms │    no change │
│ QQuery 11    │   297.50 ms │                        305.42 ms │    no change │
│ QQuery 12    │   875.76 ms │                        908.23 ms │    no change │
│ QQuery 13    │  1226.00 ms │                       1271.09 ms │    no change │
│ QQuery 14    │   810.51 ms │                        846.11 ms │    no change │
│ QQuery 15    │   802.86 ms │                        818.68 ms │    no change │
│ QQuery 16    │  1729.26 ms │                       1726.44 ms │    no change │
│ QQuery 17    │  1660.24 ms │                       1609.25 ms │    no change │
│ QQuery 18    │  3100.18 ms │                       3034.41 ms │    no change │
│ QQuery 19    │    83.56 ms │                         83.33 ms │    no change │
│ QQuery 20    │  1108.98 ms │                       1159.42 ms │    no change │
│ QQuery 21    │  1297.61 ms │                       1364.13 ms │ 1.05x slower │
│ QQuery 22    │  2128.22 ms │                       2254.26 ms │ 1.06x slower │
│ QQuery 23    │  7925.60 ms │                       8277.82 ms │    no change │
│ QQuery 24    │   458.42 ms │                        477.99 ms │    no change │
│ QQuery 25    │   384.51 ms │                        410.40 ms │ 1.07x slower │
│ QQuery 26    │   556.90 ms │                        540.30 ms │    no change │
│ QQuery 27    │  1545.90 ms │                       1631.50 ms │ 1.06x slower │
│ QQuery 28    │ 12411.77 ms │                      12611.76 ms │    no change │
│ QQuery 29    │   532.50 ms │                        523.77 ms │    no change │
│ QQuery 30    │   782.06 ms │                        819.37 ms │    no change │
│ QQuery 31    │   826.92 ms │                        880.32 ms │ 1.06x slower │
│ QQuery 32    │  2694.78 ms │                       2675.31 ms │    no change │
│ QQuery 33    │  3332.02 ms │                       3358.09 ms │    no change │
│ QQuery 34    │  3367.47 ms │                       3372.69 ms │    no change │
│ QQuery 35    │  1270.93 ms │                       1281.59 ms │    no change │
│ QQuery 36    │   125.65 ms │                        123.75 ms │    no change │
│ QQuery 37    │    55.14 ms │                         57.00 ms │    no change │
│ QQuery 38    │   126.59 ms │                        128.04 ms │    no change │
│ QQuery 39    │   197.26 ms │                        205.22 ms │    no change │
│ QQuery 40    │    45.76 ms │                         46.50 ms │    no change │
│ QQuery 41    │    43.23 ms │                         47.35 ms │ 1.10x slower │
│ QQuery 42    │    37.80 ms │                         37.72 ms │    no change │
└──────────────┴─────────────┴──────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 55855.54ms │
│ Total Time (feat_metadata-on-logical-literal)   │ 56942.28ms │
│ Average Time (HEAD)                             │  1298.97ms │
│ Average Time (feat_metadata-on-logical-literal) │  1324.24ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          6 │
│ Queries with No Change                          │         37 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ feat_metadata-on-logical-literal ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 118.98 ms │                        121.08 ms │    no change │
│ QQuery 2     │  22.23 ms │                         21.31 ms │    no change │
│ QQuery 3     │  33.70 ms │                         32.96 ms │    no change │
│ QQuery 4     │  19.92 ms │                         19.64 ms │    no change │
│ QQuery 5     │  52.01 ms │                         51.56 ms │    no change │
│ QQuery 6     │  11.93 ms │                         11.86 ms │    no change │
│ QQuery 7     │  96.23 ms │                         98.36 ms │    no change │
│ QQuery 8     │  25.64 ms │                         25.53 ms │    no change │
│ QQuery 9     │  58.07 ms │                         61.09 ms │ 1.05x slower │
│ QQuery 10    │  47.96 ms │                         48.36 ms │    no change │
│ QQuery 11    │  11.11 ms │                         11.36 ms │    no change │
│ QQuery 12    │  40.83 ms │                         41.79 ms │    no change │
│ QQuery 13    │  27.43 ms │                         26.76 ms │    no change │
│ QQuery 14    │   9.87 ms │                          9.52 ms │    no change │
│ QQuery 15    │  22.73 ms │                         22.64 ms │    no change │
│ QQuery 16    │  21.77 ms │                         20.75 ms │    no change │
│ QQuery 17    │  95.11 ms │                         95.56 ms │    no change │
│ QQuery 18    │ 211.93 ms │                        207.00 ms │    no change │
│ QQuery 19    │  25.64 ms │                         28.03 ms │ 1.09x slower │
│ QQuery 20    │  34.67 ms │                         35.15 ms │    no change │
│ QQuery 21    │ 157.01 ms │                        157.52 ms │    no change │
│ QQuery 22    │  16.28 ms │                         16.59 ms │    no change │
└──────────────┴───────────┴──────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 1161.05ms │
│ Total Time (feat_metadata-on-logical-literal)   │ 1164.40ms │
│ Average Time (HEAD)                             │   52.77ms │
│ Average Time (feat_metadata-on-logical-literal) │   52.93ms │
│ Queries Faster                                  │         0 │
│ Queries Slower                                  │         2 │
│ Queries with No Change                          │        20 │
│ Queries with Failure                            │         0 │
└─────────────────────────────────────────────────┴───────────┘
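The Change column in the tables above is consistent with a rule that reports a slowdown only when the branch takes more than about 5% longer than HEAD. That threshold is inferred from the rows shown, not taken from the benchmark script, so treat this as a sketch of how the labels could be reproduced; `classify` is a hypothetical helper.

```rust
/// Label a HEAD/branch timing pair.
/// Assumption: differences within 5% are treated as noise.
fn classify(head_ms: f64, branch_ms: f64) -> String {
    let ratio = branch_ms / head_ms;
    if ratio > 1.05 {
        format!("{:.2}x slower", ratio)
    } else if ratio < 1.0 / 1.05 {
        format!("{:.2}x faster", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // clickbench_partitioned, query 21: just over the assumed threshold.
    assert_eq!(classify(1297.61, 1364.13), "1.05x slower");
    // clickbench_partitioned, query 20: about 4.5% slower, within noise.
    assert_eq!(classify(1108.98, 1159.42), "no change");
    // clickbench_partitioned, query 41.
    assert_eq!(classify(43.23, 47.35), "1.10x slower");
}
```

Under this reading, the flagged queries are all within roughly 5 to 10% of HEAD, and the total times differ by about 2% or less in each suite.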

@alamb
Contributor

alamb commented Jun 6, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/metadata-on-logical-literal (5541492) to 85f6621 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_metadata-on-logical-literal
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 6, 2025

I will try and whip up a prototype of what I am talking about

Here is a proposed PR:

@alamb
Contributor

alamb commented Jun 6, 2025

🤖: Benchmark completed


group                                         feat_metadata-on-logical-literal       main
-----                                         --------------------------------       ----
logical_aggregate_with_join                   1.00    730.3±6.95µs        ? ?/sec    1.00    728.3±3.68µs        ? ?/sec
logical_select_all_from_1000                  1.00    122.6±0.46ms        ? ?/sec    1.02    125.4±0.21ms        ? ?/sec
logical_select_one_from_700                   1.00    417.1±1.80µs        ? ?/sec    1.00    416.6±1.50µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    378.9±4.31µs        ? ?/sec    1.00    380.3±7.66µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    364.8±2.74µs        ? ?/sec    1.00    365.5±1.72µs        ? ?/sec
physical_intersection                         1.00    835.0±3.73µs        ? ?/sec    1.00    836.0±4.55µs        ? ?/sec
physical_join_consider_sort                   1.00   1357.9±6.79µs        ? ?/sec    1.00   1357.5±5.13µs        ? ?/sec
physical_join_distinct                        1.00    353.6±1.27µs        ? ?/sec    1.01    356.6±2.00µs        ? ?/sec
physical_many_self_joins                      1.00     10.2±0.03ms        ? ?/sec    1.00     10.2±0.04ms        ? ?/sec
physical_plan_clickbench_all                  1.01    142.6±2.03ms        ? ?/sec    1.00    141.1±0.93ms        ? ?/sec
physical_plan_clickbench_q1                   1.00  1671.3±17.97µs        ? ?/sec    1.01  1687.6±16.43µs        ? ?/sec
physical_plan_clickbench_q10                  1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      2.5±0.02ms        ? ?/sec    1.01      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      2.6±0.02ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      2.3±0.03ms        ? ?/sec    1.01      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      2.4±0.03ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      2.2±0.02ms        ? ?/sec    1.01      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      2.4±0.03ms        ? ?/sec    1.00      2.3±0.06ms        ? ?/sec
physical_plan_clickbench_q18                  1.00  1947.4±14.92µs        ? ?/sec    1.00  1955.3±20.57µs        ? ?/sec
physical_plan_clickbench_q19                  1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q2                   1.00  1915.6±17.81µs        ? ?/sec    1.00  1907.8±21.68µs        ? ?/sec
physical_plan_clickbench_q20                  1.00  1664.7±21.83µs        ? ?/sec    1.00  1668.2±17.44µs        ? ?/sec
physical_plan_clickbench_q21                  1.00  1950.3±16.04µs        ? ?/sec    1.01  1960.1±25.19µs        ? ?/sec
physical_plan_clickbench_q22                  1.01      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q23                  1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      4.5±0.02ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q25                  1.01  1989.9±21.81µs        ? ?/sec    1.00  1973.2±22.74µs        ? ?/sec
physical_plan_clickbench_q26                  1.00  1783.9±16.95µs        ? ?/sec    1.00  1786.7±29.07µs        ? ?/sec
physical_plan_clickbench_q27                  1.01      2.0±0.02ms        ? ?/sec    1.00  1990.2±13.93µs        ? ?/sec
physical_plan_clickbench_q28                  1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q3                   1.00  1876.2±19.82µs        ? ?/sec    1.00  1882.1±15.76µs        ? ?/sec
physical_plan_clickbench_q30                  1.01     14.2±0.09ms        ? ?/sec    1.00     14.1±0.10ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q32                  1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q34                  1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      2.1±0.01ms        ? ?/sec    1.01      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q36                  1.00      2.9±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q37                  1.01      2.9±0.04ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      2.9±0.02ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q4                   1.00  1639.3±19.16µs        ? ?/sec    1.00  1641.1±15.76µs        ? ?/sec
physical_plan_clickbench_q40                  1.00      3.3±0.03ms        ? ?/sec    1.01      3.3±0.02ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      2.9±0.02ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q42                  1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q43                  1.00      3.0±0.02ms        ? ?/sec    1.01      3.0±0.01ms        ? ?/sec
physical_plan_clickbench_q44                  1.01  1801.1±14.13µs        ? ?/sec    1.00  1790.9±18.90µs        ? ?/sec
physical_plan_clickbench_q45                  1.00  1801.2±22.34µs        ? ?/sec    1.00  1809.1±24.19µs        ? ?/sec
physical_plan_clickbench_q46                  1.00      2.2±0.02ms        ? ?/sec    1.01      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q47                  1.01      2.7±0.02ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      3.3±0.03ms        ? ?/sec    1.00      3.3±0.03ms        ? ?/sec
physical_plan_clickbench_q49                  1.01      3.8±0.04ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
physical_plan_clickbench_q5                   1.00  1833.3±21.95µs        ? ?/sec    1.00  1828.4±21.34µs        ? ?/sec
physical_plan_clickbench_q50                  1.00      3.3±0.03ms        ? ?/sec    1.00      3.3±0.03ms        ? ?/sec
physical_plan_clickbench_q51                  1.00      2.3±0.03ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q52                  1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.04ms        ? ?/sec
physical_plan_clickbench_q6                   1.01  1847.2±24.71µs        ? ?/sec    1.00  1835.4±21.27µs        ? ?/sec
physical_plan_clickbench_q7                   1.00  1694.1±17.88µs        ? ?/sec    1.00  1697.6±17.38µs        ? ?/sec
physical_plan_clickbench_q8                   1.00      2.3±0.02ms        ? ?/sec    1.01      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q9                   1.01      2.2±0.03ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1028.4±3.89ms        ? ?/sec    1.00   1027.8±3.29ms        ? ?/sec
physical_plan_tpch_all                        1.00     61.5±0.25ms        ? ?/sec    1.00     61.2±0.29ms        ? ?/sec
physical_plan_tpch_q1                         1.01      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.7±0.01ms        ? ?/sec    1.00      3.7±0.01ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.2±0.01ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpch_q12                        1.01   1806.8±8.21µs        ? ?/sec    1.00   1791.7±8.36µs        ? ?/sec
physical_plan_tpch_q13                        1.01  1409.1±10.08µs        ? ?/sec    1.00   1391.3±8.55µs        ? ?/sec
physical_plan_tpch_q14                        1.01  1897.9±11.85µs        ? ?/sec    1.00   1885.0±8.58µs        ? ?/sec
physical_plan_tpch_q16                        1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q18                        1.00      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q19                        1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
physical_plan_tpch_q2                         1.00      5.3±0.03ms        ? ?/sec    1.00      5.3±0.02ms        ? ?/sec
physical_plan_tpch_q20                        1.01      3.1±0.01ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
physical_plan_tpch_q21                        1.00      4.1±0.02ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
physical_plan_tpch_q22                        1.01      2.7±0.02ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q3                         1.01      2.5±0.01ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q4                         1.01  1559.4±11.02µs        ? ?/sec    1.00   1537.7±7.81µs        ? ?/sec
physical_plan_tpch_q5                         1.00      3.1±0.01ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.02    871.3±6.43µs        ? ?/sec    1.00    855.9±3.93µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.1±0.02ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q8                         1.01      5.0±0.02ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.0±0.02ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
physical_select_aggregates_from_200           1.01     25.3±0.08ms        ? ?/sec    1.00     25.1±0.07ms        ? ?/sec
physical_select_all_from_1000                 1.00    136.4±0.30ms        ? ?/sec    1.02    139.6±0.28ms        ? ?/sec
physical_select_one_from_700                  1.00   1067.3±5.27µs        ? ?/sec    1.01  1075.9±10.28µs        ? ?/sec
physical_sorted_union_orderby                 1.01     60.7±0.39ms        ? ?/sec    1.00     60.4±0.35ms        ? ?/sec
physical_theta_join_consider_sort             1.00   1732.2±6.12µs        ? ?/sec    1.00   1728.4±7.76µs        ? ?/sec
physical_unnest_to_join                       1.00   1307.0±7.40µs        ? ?/sec    1.00   1303.9±6.63µs        ? ?/sec
with_param_values_many_columns                1.02    145.0±0.99µs        ? ?/sec    1.00    141.9±0.78µs        ? ?/sec

@timsaucer
Contributor Author

I'm a bit under the weather at the moment, but it sounds like people are leaning towards

  • merging this to main, cherry picking into rc48 branch
  • adding Andrew's metadata struct in a quick follow up to 49

Is that right? I took a quick look at Andrew's proposal and it looks like a good idea. I just wanted to avoid too much API churn.

@andygrove
Member

> Is that right? I took a quick look at Andrew's proposal and it looks like a good idea. I just wanted to avoid too much API churn.

That is also my understanding. I will go ahead and merge this PR and create a PR to cherry pick to branch-48.

I hope you feel better soon!

@andygrove andygrove merged commit 0f83c1d into apache:main Jun 6, 2025
29 checks passed
andygrove pushed a commit to andygrove/datafusion that referenced this pull request Jun 6, 2025
* Adding metadata to literal enum type

* Correct processing of metadata during simplification

* Switch to btreemap instead of hashmap for carrying around metadata

* Documentation update

* Update unit test

* Additional unit tests needed fields in the debug output

* Updates after rebase

* Add metadata to display when it exists
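
The commit list above mentions switching from a hash map to a B-tree map for carrying metadata, but the thread does not state the reason. One plausible motivation, offered here only as an assumption, is that Rust's `BTreeMap` implements `Hash` and iterates in key order, whereas `HashMap` does neither, which matters if the enclosing expression type has to derive `Hash`. A minimal sketch using a hypothetical `LiteralSketch` type (not DataFusion's actual `Expr::Literal` definition) and hypothetical helper names:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a literal expression with optional metadata.
/// Deriving `Hash` compiles because `BTreeMap<String, String>` implements `Hash`;
/// the same derive would fail with `HashMap<String, String>`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct LiteralSketch {
    pub value: i64,
    pub metadata: Option<BTreeMap<String, String>>,
}

/// Hash a literal with the standard library's default hasher.
pub fn hash_of(lit: &LiteralSketch) -> u64 {
    let mut hasher = DefaultHasher::new();
    lit.hash(&mut hasher);
    hasher.finish()
}

/// Build a literal from key/value pairs supplied in any order.
pub fn literal_with(value: i64, pairs: &[(&str, &str)]) -> LiteralSketch {
    let metadata: BTreeMap<String, String> = pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    LiteralSketch {
        value,
        metadata: Some(metadata),
    }
}
```

With this sketch, two literals built from the same pairs in a different insertion order compare equal and hash identically, and iterating over the metadata keys always yields them in sorted order, which also keeps debug and display output stable.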
@alamb
Contributor

alamb commented Jun 7, 2025

Thanks @andygrove and @timsaucer -- the plan of merging this PR and then a quick follow on in 49 seems great to me

The planning benchmarks also looked good (no regressions)

Thanks for helping make this happen

@alamb
Contributor

alamb commented Jun 7, 2025

I made a PR to main here (no rush on review):

Labels
api change Changes the API exposed to users of the crate catalog Related to the catalog crate core Core DataFusion crate datasource Changes to the datasource crate documentation Improvements or additions to documentation ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate
Development

Successfully merging this pull request may close these issues.

Support metadata on literal values
4 participants