Conversation

@Dtenwolde

No description provided.

Mytherin and others added 30 commits November 28, 2025 21:38
duckdblabs/duckdb-internal#3220
Handle the edge case of `CREATE TABLE '' AS ...`
Also for the Relational API
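
For context, a minimal sketch of the statement shape the two items above
refer to (a hypothetical example; the resulting behavior after the fix is
not described here):

```sql
-- Edge case: an empty string literal used as the table name
CREATE TABLE '' AS SELECT 42 AS answer;
```
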
…an" caching into the shell instead of into the base query result
…to figure out whether or not to use the pager
…duckdb#19985)

This PR adds support for dark / light mode to the CLI. The mode is
automatically detected on start-up by looking at the terminal background
color. The old highlight scheme is still available as `.highlight_mode mixed`,
and is selected if the background color cannot be auto-detected. The
highlight mode can also be manually set using the `.highlight_mode`
setting, e.g.:

```
.highlight_mode [light|dark|mixed|auto]
```

### Dark Background

###### Dark Mode
<img width="1638" height="1104" alt="Screenshot 2025-11-29 at 08 30 00"
src="https://github.com/user-attachments/assets/4b7874d3-202e-4caf-89fd-9474800abd76"
/>

###### Mixed Mode
<img width="1658" height="1104" alt="Screenshot 2025-11-29 at 08 31 47"
src="https://github.com/user-attachments/assets/1068ef44-d07f-4114-b16b-fe7b7c0dca08"
/>

### Light Background

###### Light Mode
<img width="1640" height="1043" alt="Screenshot 2025-11-29 at 08 31 00"
src="https://github.com/user-attachments/assets/a72f5e55-344b-4a05-a140-ae026d2bcbea"
/>

###### Mixed Mode
<img width="1645" height="1047" alt="Screenshot 2025-11-29 at 08 31 17"
src="https://github.com/user-attachments/assets/c83659e2-20bc-4bf2-acc3-800c3d1295ec"
/>
…or` (duckdb#19992)

Since this function's return type is bool, we should return false
instead of -1 when something goes wrong.
…db#19988)

This PR fully unifies the result rendering code, and uses the newly
unified result rendering to fix up the automatic pager mode. Instead of
guessing whether or not a result is wide by the number of columns, we
now actually measure how wide the result set will be and enable the
pager based on that.

We also do some random clean-up, e.g. removing `QueryResult::MoreRowsThan`
again and moving the iteration / caching to the shell
renderers.
* Convert task system to look like AsOf.
This change adds a test to see if extensions can be loaded
with their normal name and aliases.

It also adds an extension alias lookup in GetDatabaseType,
similar to how it is done in CreateAttachedDatabase.
* Fix a rendering bug with auto-complete that would cause it to
continuously clear upwards if the buffer did not start at the top of the
terminal
* Differentiate between `fuzzy suggestions` and `exact suggestions`,
where exact suggestions have an exact prefix match with the text, and
fuzzy suggestions do not. Then show a smaller maximum number of fuzzy
suggestions. This prevents polluting the auto-complete output with
(likely) unrelated files.
* Make Ctrl+C cancel auto-complete, instead of clearing the entire
buffer
hannes and others added 29 commits December 5, 2025 15:58
If we don't call `SetError()`, users can't know why the ingest
failed.
…kdb#20074)

* When shortening wide columns, prefer to shorten the widest columns
first before shortening less wide columns, instead of shortening
everything by an equal percentage. This means that if we have a table
that has e.g. `name, description` columns, where `name` is slightly above the
"wide" threshold (20 bytes) and `description` is significantly above
this threshold, we will only shorten the `description` field, keeping
`name` readable.
* When stretching out values across rows, give all rows the same amount
of space for stretching out, instead of greedily stretching out the
first rows
* Don't stretch out columns across rows if there are hidden columns in
the result set
* Always add a separator between rows if we are stretching out columns
* For auto-complete, use only the match score instead of the "adjusted
sort score" when determining if there are ties or not. This way if we
have a directory that has e.g. `my_dir/` and `my_file.parquet` as
auto-complete options, `my_` will not instantly auto-complete to
`my_dir/` but instead consider this a tie. `my_dir/` will still appear
first in the sorted suggestion list, however.
…he vacuum does not change the rowids (duckdb#20073)

Currently vacuuming is disabled for indexes entirely because vacuum has
the potential of changing row-ids. This prevents deleted rows from being
cleaned up, always leaving deletion markers in place. This is necessary
because the index relies on row-ids, and if we change the row-id of a
row the index no longer points to the correct row.

The current behavior with vacuuming is unnecessarily restrictive,
however. In certain cases, vacuuming does not change the row-ids of existing
rows. For example:

* We can merge adjacent row groups if they do not have deletes, e.g. if
we have two row groups:
```
[RowGroup 1]
    start: 0
    count: 10
[RowGroup 2]
    start: 10
    count: 10
```

We can merge them into one row group:

```
[RowGroup 1]
    start: 0
    count: 20
```

This does not change the row-ids of any rows, so it is valid even when
indexes are present.

* We can remove row groups at the tail of a table. For example:

```
[RowGroup 1]
    start: 0
    count: 10
[RowGroup 2]
    start: 10
    count: 10
    deleted: 10
```

If we remove `RowGroup 2`, no existing rows have their row-ids changed,
so we can just delete it. This solves the issue where truncating a table
with an index on it did not actually delete any rows.


This doesn't fully solve the vacuuming issue with indexes present but
solves it in some common scenarios.
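
As a hedged illustration of the tail-removal case, here is a sketch with a
hypothetical table (the table name, row counts and the exact point at which
the vacuum runs are illustrative assumptions, not part of this PR):

```sql
-- Hypothetical table with an index (PRIMARY KEY creates an ART index)
CREATE TABLE t (id INTEGER PRIMARY KEY, payload VARCHAR);
INSERT INTO t SELECT range, 'x' FROM range(1000000);

-- Delete only the tail of the table: surviving rows keep their row-ids,
-- so the emptied trailing row groups can be dropped rather than leaving
-- deletion markers behind
DELETE FROM t WHERE id >= 500000;
CHECKPOINT;
```
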
This PR introduces min/max aggregate execution during statistics
propagation, analogous to duckdb#15301, and is a follow-up to duckdb#19806. With that,
the optimizer can directly execute simple min/max aggregates on top of
the row group statistics instead of scanning and aggregating a whole
table.

**Performance**
Data Set: Time-series Benchmark Suite, cpu-only, 100M rows
Query: `SELECT max(time) FROM cpu`

|               | Nightly | This PR | Improvement |
| ------- | ------- | -------- | ------------- |
| Cold run | 0.10s  |  0.016s | **6x** |
| Hot run | 0.035s | 0.002s | **18x** |
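
As a hedged illustration of which queries qualify (the `cpu` columns below
follow the benchmark's naming, and the remark about filtered aggregates is
an assumption, not something this PR states):

```sql
-- A bare min/max over a base table column can be answered from the
-- row group statistics without scanning the table
SELECT min(time), max(time) FROM cpu;

-- Assumption: with a per-row predicate the statistics alone no longer
-- suffice, so a query like this still scans the table
SELECT max(time) FROM cpu WHERE hostname = 'host_0';
```
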
This PR changes the build system for DuckDB in such a way that any
extensions are linked "late" in the build process. This allows caching
`libduckdb_static.a` between build processes on the same platform
(follow-up PR). We also now package all the required third party
dependencies statically. We also refactor extension builds into a
separate CMake file. Finally, we remove the `tpce` extension due to
non-use.
…sabled (duckdb#20077)

Without this change you'll get this very confusing error message when
the `LocalFileSystem` is disabled and you try to query from a table that
does not exist:

```
duckdb -c "set disabled_filesystems='LocalFileSystem'; select * from nonexistentdb.main.t1;"
Permission Error:
File system LocalFileSystem has been disabled by configuration
```

This changes that to the more sensible:
```
Binder Error:
Catalog "nonexistentdb" does not exist!
```
…#19968)

This PR adds a new scalar function `struct_values` that complements the
existing `struct_keys` function. While `struct_keys` returns the names
of a struct's fields, `struct_values` returns the corresponding values
as an unnamed struct.

## Example
```sql
SELECT struct_values({'a': 1, 'b': 2, 'c': 3});
-- Result: (1, 2, 3) 
-- Type: ROW(INTEGER, INTEGER, INTEGER)
```
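
A small usage sketch pairing the two functions (result shapes paraphrased
from the description above):

```sql
-- struct_keys returns the field names, struct_values the corresponding
-- values as an unnamed struct
SELECT struct_keys({'a': 1, 'b': 2}) AS keys,
       struct_values({'a': 1, 'b': 2}) AS vals;
```
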
The `TupleDataCollection` was always considered to be a `HASH_TABLE` for
legacy reasons. This PR adds a `MemoryTag` to the constructor, so it can
properly be used for `MemoryTag::ORDER_BY` like the previous sort. I
have also added `MemoryTag::WINDOW` and used that where appropriate.
…ritten by Arrow C++ API (duckdb#20046)

This PR enables reading _uniformly encrypted parquet files_ written by
the Arrow C++ API as addressed in
apache/arrow#47254. Fixes
duckdb#18524 and
duckdblabs/duckdb-internal#5511.

Previously, we were not able to read uniformly encrypted parquet files
written with Arrow since we did not use the additional authenticated
data (AAD). The AAD consists of a unique prefix and a suffix. The suffix
consists of the module type, the row group ordinal (present in the metadata),
the column ordinal (the column index in the _original_ schema of a parquet file)
and a page ordinal (the page index). More details can be found in the
[Parquet Encryption
Spec](https://parquet.apache.org/docs/file-format/data-pages/encryption/).

For testing I've added the uniformly encrypted file from the
[parquet-testing](https://github.com/apache/parquet-testing/tree/master/data)
repo and used the
[gist](https://gist.github.com/rok/8a68066c51801458a3746772a2c736c5) of
@rok. On top of this, I've added an Arrow file containing multiple row
groups based on the previous gist
([code](https://gist.github.com/ccfelius/3def798ade5de006e6d0073c1c457990)).

As also mentioned in apache/arrow#47254,
PyArrow uses a KMS which creates randomly generated keys. This means
that DuckDB is not able to read files written with PyArrow, since keys
are used directly in DuckDB.

The fix ensures backwards compatibility with our own encrypted parquet
files. In a follow-up PR I'll address the parquet writer.
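
For reference, a minimal sketch of reading such a file with a directly
registered key, using DuckDB's existing Parquet encryption interface (the
file name and key below are placeholders):

```sql
-- Register a 128-bit key directly (DuckDB uses keys directly, not a KMS)
PRAGMA add_parquet_key('key128', '0123456789112345');

-- Read a uniformly encrypted file written via the Arrow C++ API
SELECT *
FROM read_parquet('uniform_encryption.parquet.encrypted',
                  encryption_config = {footer_key: 'key128'});
```
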
@Dtenwolde changed the base branch from main to v1.4-andium on December 9, 2025 12:47