forked from duckdb/duckdb
V1.4.2 merge and fixes to CTE #211
Open
Dtenwolde wants to merge 3,818 commits into v1.4-andium from v1.4-rework-cte
Conversation
duckdblabs/duckdb-internal#3220 Handle the edge case of `CREATE TABLE '' AS `... Also for the Relational API
…an" caching into the shell instead of into the base query result
…to figure out whether or not to use the pager
…duckdb#19985) This PR adds support for dark / light mode to the CLI. The mode is automatically detected on start-up by looking at the terminal background color. The old highlight scheme is still available as `.highlight_mode mixed`, and is selected if the background color cannot be auto-detected. The highlight mode can also be manually set using the `.highlight_mode` setting, e.g.:

```
.highlight_mode [light|dark|mixed|auto]
```

### Dark Background

###### Dark Mode
<img width="1638" height="1104" alt="Screenshot 2025-11-29 at 08 30 00" src="https://github.com/user-attachments/assets/4b7874d3-202e-4caf-89fd-9474800abd76" />

###### Mixed Mode
<img width="1658" height="1104" alt="Screenshot 2025-11-29 at 08 31 47" src="https://github.com/user-attachments/assets/1068ef44-d07f-4114-b16b-fe7b7c0dca08" />

### Light Background

###### Light Mode
<img width="1640" height="1043" alt="Screenshot 2025-11-29 at 08 31 00" src="https://github.com/user-attachments/assets/a72f5e55-344b-4a05-a140-ae026d2bcbea" />

###### Mixed Mode
<img width="1645" height="1047" alt="Screenshot 2025-11-29 at 08 31 17" src="https://github.com/user-attachments/assets/c83659e2-20bc-4bf2-acc3-800c3d1295ec" />
…or` (duckdb#19992) Since this function's return type is bool, we should return false instead of -1 when something goes wrong.
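A minimal illustration (not the DuckDB function in question, just the conversion pitfall) of why returning -1 hides the failure: in a function declared to return `bool`, -1 is implicitly converted to `true`, so the caller sees success.

```cpp
#include <iostream>

// Toy example only: `return -1;` from a bool-returning function is implicitly
// converted to true, so the caller never observes the error.
static bool broken_check() {
	return -1; // looks like an error code, but becomes `true`
}

int main() {
	std::cout << std::boolalpha << broken_check() << '\n'; // prints "true"
	return 0;
}
```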
…db#19988) This PR fully unifies the result rendering code, and uses the newly unified result rendering to fix up the automatic pager mode. Instead of guessing whether or not a result is wide by the number of columns, we now actually measure how wide the result set will be and enable the pager based on that. We also do some additional clean-up, e.g. removing `QueryResult::MoreRowsThan` again and moving the iteration / caching to the shell renderers.
* Convert task system to look like AsOf.
This change adds a test to see if extensions can be loaded with their normal name and aliases. It also adds an extension alias lookup in GetDatabaseType, similar to how it is done in CreateAttachedDatabase.
* Fix a rendering bug with auto-complete that would cause it to continuously clear upwards if the buffer did not start at the top of the terminal
* Differentiate between `fuzzy suggestions` and `exact suggestions`, where exact suggestions have an exact prefix match with the text, and fuzzy suggestions do not. Then show a smaller maximum number of fuzzy suggestions. This prevents polluting the auto-complete output with (likely) unrelated files (see the sketch after this list).
* Make Ctrl+C cancel auto-complete, instead of clearing the entire buffer
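A toy sketch of the exact-versus-fuzzy split described above, assuming a plain prefix test stands in for the shell's real matching logic; `ClassifySuggestions` and its types are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration: suggestions whose text starts with what
// the user typed are "exact"; everything else is "fuzzy" and is capped at a
// smaller maximum so unrelated files do not flood the auto-complete output.
struct SuggestionBuckets {
	std::vector<std::string> exact;
	std::vector<std::string> fuzzy;
};

static SuggestionBuckets ClassifySuggestions(const std::string &typed, const std::vector<std::string> &candidates,
                                             std::size_t max_fuzzy) {
	SuggestionBuckets result;
	for (const auto &candidate : candidates) {
		if (candidate.rfind(typed, 0) == 0) { // exact prefix match
			result.exact.push_back(candidate);
		} else if (result.fuzzy.size() < max_fuzzy) { // cap fuzzy suggestions
			result.fuzzy.push_back(candidate);
		}
	}
	return result;
}

int main() {
	auto buckets = ClassifySuggestions("my_", {"my_dir/", "my_file.parquet", "other.csv"}, 2);
	return buckets.exact.size() == 2 && buckets.fuzzy.size() == 1 ? 0 : 1;
}
```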
Signed-off-by: Dmitry Nikolaev <[email protected]>
…hout extra modifiers
If we don't call `SetError()`, users can't know why the ingest failed.
…kdb#20074)
* When shortening wide columns, prefer to shorten the widest columns first before shortening less wide columns, instead of shortening everything by an equal percentage. This means that if we have a table that is e.g. `name, description` - where `name` is slightly above the "wide" threshold (20 bytes) and `description` is significantly above this threshold - we will only shorten the description field, keeping `name` readable (see the sketch after this list).
* When stretching out values across rows, give all rows the same amount of space for stretching out, instead of greedily stretching out the first rows.
* Don't stretch out columns across rows if there are hidden columns in the result set.
* Always add a separator between rows if we are stretching out columns.
* For auto-complete, use only the match score instead of the "adjusted sort score" when determining if there are ties or not. This way if we have a directory that has e.g. `my_dir/` and `my_file.parquet` as auto-complete options, `my_` will not instantly auto-complete to `my_dir/` but instead consider this a tie. `my_dir/` will still appear first in the sorted suggestion list, however.
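A minimal sketch of the "shorten the widest column first" idea from the first bullet; the function name, thresholds, and width model are invented for illustration and are not the shell renderer's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical illustration: repeatedly trim the currently widest column until
// the total width fits, so a narrow column like `name` stays readable while a
// very wide `description` column absorbs all of the shortening.
static void ShrinkWidestFirst(std::vector<std::size_t> &widths, std::size_t max_total, std::size_t min_width) {
	std::size_t total = 0;
	for (auto width : widths) {
		total += width;
	}
	while (total > max_total) {
		auto widest = std::max_element(widths.begin(), widths.end());
		if (*widest <= min_width) {
			break; // nothing left that we are willing to shrink further
		}
		--(*widest);
		--total;
	}
}

int main() {
	std::vector<std::size_t> widths {22, 120}; // e.g. name, description
	ShrinkWidestFirst(widths, 80, 20);
	std::cout << widths[0] << " " << widths[1] << '\n'; // prints "22 58": only description shrank
	return 0;
}
```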
…he vacuum does not change the rowids (duckdb#20073)

Currently vacuuming is disabled for indexes entirely because vacuum has the potential of changing row-ids. This prevents deleted rows from being cleaned up, always leaving deletion markers in place. This restriction exists because the index relies on row-ids, and if we change the row-id of a row the index no longer points to the correct row.

The current behavior with vacuuming is unnecessarily restrictive, however. In certain cases, vacuuming does not change row-ids of existing rows. For example (see the sketch after this list):

* We can merge adjacent row groups if they do not have deletes, e.g. if we have two row groups:

```
[RowGroup 1] start: 0 count: 10
[RowGroup 2] start: 10 count: 10
```

We can merge them into one row group:

```
[RowGroup 1] start: 0 count: 20
```

This does not change the row ids of any rows, so it is valid also if we have indexes.

* We can remove row groups at the tail of a table. For example:

```
[RowGroup 1] start: 0 count: 10
[RowGroup 2] start: 10 count: 10 deleted: 10
```

If we remove `RowGroup 2`, no existing rows have their row-ids changed, so we can just delete it. This solves the issue that truncating a table with an index on it does not delete any of the actual rows.

This doesn't fully solve the vacuuming issue with indexes present, but solves it in some common scenarios.
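A simplified sketch of the two row-id-preserving cases above; the `RowGroupInfo` struct and helper names are invented for this illustration and are not DuckDB's actual vacuum code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical row group summary used only for this illustration.
struct RowGroupInfo {
	uint64_t start;   // row-id of the first row in the group
	uint64_t count;   // number of rows in the group
	uint64_t deleted; // number of deleted rows in the group
};

// Two adjacent row groups can be merged without changing any row-id
// as long as neither of them contains deleted rows.
static bool CanMergeWithoutRowIdChange(const RowGroupInfo &left, const RowGroupInfo &right) {
	return left.deleted == 0 && right.deleted == 0 && left.start + left.count == right.start;
}

// The row group at the tail of the table can be dropped without changing any
// row-id if every row in it has already been deleted.
static bool CanDropTailGroup(const RowGroupInfo &tail) {
	return tail.deleted == tail.count;
}

int main() {
	std::vector<RowGroupInfo> groups {{0, 10, 0}, {10, 10, 10}};
	bool merge_ok = CanMergeWithoutRowIdChange(groups[0], groups[1]); // false: second group has deletes
	bool drop_ok = CanDropTailGroup(groups.back());                   // true: fully deleted tail group
	return (!merge_ok && drop_ok) ? 0 : 1;
}
```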
This PR introduces min/max aggregate execution in the statistics propagation analogous to duckdb#15301 and is a follow-up to duckdb#19806. With that, the optimizer can directly execute simple min/max aggregates on top of the row group statistics instead of scanning and aggregating a whole table.

**Performance**

Data set: Time-series Benchmark Suite, cpu-only, 100M rows
Query: `SELECT max(time) FROM cpu`

| | Nightly | This PR | Improvement |
| ------- | ------- | ------- | ----------- |
| Cold run | 0.10s | 0.016s | **6x** |
| Hot run | 0.035s | 0.002s | **18x** |
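A toy sketch of the idea: answer `max(col)` from per-row-group statistics instead of scanning, which is only valid when those statistics are exact. The `ColumnStats` struct and function below are assumptions for illustration, not DuckDB's statistics classes.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <vector>

// Hypothetical per-row-group statistics entry used only for this sketch.
struct ColumnStats {
	int64_t min_value;
	int64_t max_value;
	bool all_null; // true if the row group contains no non-NULL values
};

// Compute max(col) directly from row group statistics, without scanning rows.
static std::optional<int64_t> MaxFromStats(const std::vector<ColumnStats> &stats) {
	std::optional<int64_t> result;
	for (const auto &entry : stats) {
		if (entry.all_null) {
			continue;
		}
		if (!result || entry.max_value > *result) {
			result = entry.max_value;
		}
	}
	return result;
}

int main() {
	std::vector<ColumnStats> stats {{0, 100, false}, {50, 250, false}, {0, 0, true}};
	auto max_value = MaxFromStats(stats);
	std::cout << (max_value ? *max_value : -1) << '\n'; // prints 250
	return 0;
}
```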
This PR changes the build system for DuckDB in such a way that any extensions are linked "late" in the build process. This allows caching `libduckdb_static.a` between build processes on the same platform (follow-up PR). We also now package all the required third-party dependencies statically. We also refactor extension builds into a separate CMake file. Finally, we remove the `tpce` extension due to non-use.
…sabled (duckdb#20077) Without this change you'll get this very confusing error message when the `LocalFileSystem` is disabled and you try to query from a table that does not exist:

```
duckdb -c "set disabled_filesystems='LocalFileSystem'; select * from nonexistentdb.main.t1;"
Permission Error: File system LocalFileSystem has been disabled by configuration
```

This changes that to the more sensible:

```
Binder Error: Catalog "nonexistentdb" does not exist!
```
…#19968) This PR adds a new scalar function `struct_values` that complements the existing `struct_keys` function. While `struct_keys` returns the names of a struct's fields, `struct_values` returns the corresponding values as an unnamed struct.

## Example

```sql
SELECT struct_values({'a': 1, 'b': 2, 'c': 3});
-- Result: (1, 2, 3)
-- Type: ROW(INTEGER, INTEGER, INTEGER)
```
The `TupleDataCollection` was always considered to be a `HASH_TABLE` for legacy reasons. This PR adds a `MemoryTag` to the constructor, so it can properly be used for `MemoryTag::ORDER_BY` like the previous sort. I have also added `MemoryTag::WINDOW` and used that where appropriate.
…ritten by Arrow C++ API (duckdb#20046)

This PR enables reading _uniformly encrypted parquet files_ written by the Arrow C++ API, as addressed in apache/arrow#47254. Fixes duckdb#18524 and duckdblabs/duckdb-internal#5511.

Previously, we were not able to read uniformly encrypted parquet files written with Arrow since we did not use the additional authenticated data (aad). The aad consists of a unique prefix and a suffix. The suffix consists of the module type, row group ordinal (present in metadata), column ordinal (column index of the _original_ schema of a parquet file) and a page ordinal (page index). More details can be found in the [Parquet Encryption Spec](https://parquet.apache.org/docs/file-format/data-pages/encryption/).

For testing I've added the uniformly encrypted file from the [parquet-testing](https://github.com/apache/parquet-testing/tree/master/data) repo and used the [gist](https://gist.github.com/rok/8a68066c51801458a3746772a2c736c5) of @rok. On top of this, I've added an Arrow file containing multiple row groups based on the previous gist ([code](https://gist.github.com/ccfelius/3def798ade5de006e6d0073c1c457990)).

As also mentioned in apache/arrow#47254, PyArrow uses a KMS which creates randomly generated keys. This means that DuckDB is not able to read files written with PyArrow, since keys are used directly in DuckDB. The fix ensures backwards compatibility with our own encrypted parquet files. In a follow-up PR I'll address the parquet writer.
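A rough sketch of assembling a per-module AAD from the components listed above (prefix, module type, row group ordinal, column ordinal, page ordinal). The field widths and little-endian encoding here are assumptions for illustration only; the Parquet Encryption Spec defines the normative layout.

```cpp
#include <cstdint>
#include <vector>

// Assumed encoding for this sketch only: 1-byte module type and 2-byte
// little-endian ordinals; consult the Parquet Encryption Spec for the real layout.
static void AppendLE16(std::vector<uint8_t> &buffer, uint16_t value) {
	buffer.push_back(static_cast<uint8_t>(value & 0xFF));
	buffer.push_back(static_cast<uint8_t>(value >> 8));
}

static std::vector<uint8_t> BuildModuleAAD(const std::vector<uint8_t> &aad_prefix, uint8_t module_type,
                                           uint16_t row_group_ordinal, uint16_t column_ordinal,
                                           uint16_t page_ordinal) {
	std::vector<uint8_t> aad = aad_prefix;  // unique per-file prefix
	aad.push_back(module_type);             // e.g. data page vs. dictionary page
	AppendLE16(aad, row_group_ordinal);     // taken from the file metadata
	AppendLE16(aad, column_ordinal);        // column index in the original schema
	AppendLE16(aad, page_ordinal);          // page index within the column chunk
	return aad;
}

int main() {
	std::vector<uint8_t> prefix {0x01, 0x02, 0x03, 0x04};
	auto aad = BuildModuleAAD(prefix, /*module_type=*/2, /*row_group=*/0, /*column=*/3, /*page=*/7);
	return aad.size() == prefix.size() + 7 ? 0 : 1;
}
```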