Skip to content

Add comments to ClickBench queries about setting binary_as_string #16605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 28, 2025

Which issue does this PR close?

Rationale for this change

I forgot to set an important setting when running the clickbench queries locally with datafusion-cli resulting in strange performance results.

We can't just change the default value of this setting as it isn't correct in general (see #16599 (comment)) it is pretty specific to the ClickBench datafiles

@zhuqi-lucas proposed changing the default in datafusion-cli but I fear that may also cause confusion -- see #16599 for more details

What changes are included in this PR?

Add comments to the ClickBench queries so future readers will not be (as) likely to make the same mistake I did

Are these changes tested?

I tested locally to make sure the comments are ignored by the benchmark runner and they are

./benchmarks/bench.sh run clickbench_partitioned 41

And

./benchmarks/bench.sh run clickbench_partitioned

To verify that the queries still run correctly with the comments

Are there any user-facing changes?

@alamb alamb marked this pull request as ready for review June 28, 2025 14:58
@alamb alamb requested a review from Copilot June 28, 2025 15:00
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds informative comments at the top of each ClickBench query SQL file to remind users to set the configuration datafusion.execution.parquet.binary_as_string for the hits_partitioned dataset. The goal is to help avoid inconsistent performance measurements when running the benchmarks locally.

  • Added comments explaining the need for setting binary_as_string in every ClickBench query.
  • Linked the comment to the relevant GitHub issue for additional context.

Reviewed Changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated no comments.

Show a summary per file
File Description
benchmarks/queries/clickbench/queries/q9.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q8.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q7.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q6.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q5.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q42.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q41.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q40.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q4.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q39.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q38.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q37.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q36.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q35.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q34.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q33.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q32.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q31.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q30.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q3.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q29.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q28.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q27.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q26.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q25.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q24.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q23.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q22.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q21.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q20.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q2.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q19.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q18.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q17.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q16.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q15.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q14.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q13.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q12.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q11.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q10.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q1.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage
benchmarks/queries/clickbench/queries/q0.sql Added comment to advise setting datafusion.execution.parquet.binary_as_string for ClickBench usage

Copy link
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is a clear PR, thank you @alamb !

@alamb alamb added the documentation Improvements or additions to documentation label Jun 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of ClickBench Q21 by removing the cast
2 participants