-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add comments to ClickBench queries about setting binary_as_string #16605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds informative comments at the top of each ClickBench query SQL file to remind users to set the configuration datafusion.execution.parquet.binary_as_string for the hits_partitioned dataset. The goal is to help avoid inconsistent performance measurements when running the benchmarks locally.
- Added comments explaining the need for setting binary_as_string in every ClickBench query.
- Linked the comment to the relevant GitHub issue for additional context.
Reviewed Changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
benchmarks/queries/clickbench/queries/q9.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q8.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q7.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q6.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q5.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q42.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q41.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q40.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q4.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q39.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q38.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q37.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q36.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q35.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q34.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q33.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q32.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q31.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q30.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q3.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q29.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q28.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q27.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q26.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q25.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q24.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q23.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q22.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q21.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q20.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q2.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q19.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q18.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q17.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q16.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q15.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q14.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q13.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q12.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q11.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q10.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q1.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
benchmarks/queries/clickbench/queries/q0.sql | Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this is a clear PR, thank you @alamb !
Which issue does this PR close?
Rationale for this change
I forgot to set an important setting when running the clickbench queries locally with
datafusion-cli
resulting in strange performance results.We can't just change the default value of this setting as it isn't correct in general (see #16599 (comment)) it is pretty specific to the ClickBench datafiles
@zhuqi-lucas proposed changing the default in
datafusion-cli
but I fear that may also cause confusion -- see #16599 for more detailsWhat changes are included in this PR?
Add comments to the ClickBench queries so future readers will not be (as) likely to make the same mistake I did
Are these changes tested?
I tested locally to make sure the comments are ignored by the benchmark runner and they are
And
To verify that the queries still run correctly with the comments
Are there any user-facing changes?