-
Notifications
You must be signed in to change notification settings - Fork 2.9k
OpenAPI: Add min-rows-requested field to PlanTableScanRequest #14565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4402,6 +4402,11 @@ components: | |
| description: | ||
| Expression used to filter the table data | ||
| $ref: '#/components/schemas/Expression' | ||
| min-rows-requested: | ||
| description: | ||
| The minimum number of rows requested for the scan | ||
|
||
| type: integer | ||
| format: int64 | ||
| case-sensitive: | ||
| description: Enables case sensitive field matching for filter and select | ||
| type: boolean | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name and description a bit confusing. For me, it is obvious that it might not return exactly the number of rows that are requested. For example, the table might be empty. I would much rather go with something that's analogue to SQL:
Of course, it might still return more rows since the Parquet file contains more rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I look at things the other way since I feel like
limitconnotes a strict upper-bound (that is if there are sufficient matching rows) as how it means in SQL. Another idea:limit-hint?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Fokko , what about
soft_limit?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to jump in on bike shedding how about
max_rows_requiredI think it lacks the ambiguity if someone isn't familiar with SQL and I think my issue with and hopefully makes it clearer that this is in fact an upper bound?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to pile on.
target-scan-task-numberUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it actually data row count or scan task count? I thought it is later. but if it is former, this config can be
target-row-count.This is similar to the table properties for
write.target-file-size-bytesorread.split.target-size.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are talking about fine grained control, maybe
target-plan-size-bytesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let @nastra confirm but I think it is a row count (as the original name suggests).
target-row-countseems more ambiguous to me in terms of semantics.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for rows and we're looking for a lower bound, hence why we're having the
minin the name. We're open to better naming suggestions, but using something withlimitin the name rather implies an upper bound, which wouldn't be correct hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra lower bound may not always be valid. e.g. maybe the table just don't have enough number of rows to satisfy the lower bound.
targetis probably a good name for the desired plan size. Wondering why not usingbytes(instead ofrows) to express the target, similar toread.split.target-size?Anyway, I would suggest
target-plan-size-rowsortarget-plan-size-bytes.