Conversation

@Dtenwolde

No description provided.

Mytherin and others added 30 commits November 28, 2025 21:38
duckdblabs/duckdb-internal#3220
Handle the edge case of `CREATE TABLE '' AS ...`
Also for the Relational API
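
For context, a minimal sketch of the statement shape the two items above
refer to (a hypothetical example; the resulting behavior after the fix is
not described here):

```sql
-- Edge case: an empty string literal used as the table name
CREATE TABLE '' AS SELECT 42 AS answer;
```
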
…an" caching into the shell instead of into the base query result
…to figure out whether or not to use the pager
…duckdb#19985)

This PR adds support for dark / light mode to the CLI. The mode is
automatically detected on start-up by looking at the terminal background
color. The old highlight scheme is still available as `.highlight_mode mixed`,
and is selected if the background color cannot be auto-detected. The
highlight mode can also be manually set using the `.highlight_mode`
setting, e.g.:

```
.highlight_mode [light|dark|mixed|auto]
```

### Dark Background

###### Dark Mode
<img width="1638" height="1104" alt="Screenshot 2025-11-29 at 08 30 00"
src="https://github.com/user-attachments/assets/4b7874d3-202e-4caf-89fd-9474800abd76"
/>

###### Mixed Mode
<img width="1658" height="1104" alt="Screenshot 2025-11-29 at 08 31 47"
src="https://github.com/user-attachments/assets/1068ef44-d07f-4114-b16b-fe7b7c0dca08"
/>

### Light Background

###### Light Mode
<img width="1640" height="1043" alt="Screenshot 2025-11-29 at 08 31 00"
src="https://github.com/user-attachments/assets/a72f5e55-344b-4a05-a140-ae026d2bcbea"
/>

###### Mixed Mode
<img width="1645" height="1047" alt="Screenshot 2025-11-29 at 08 31 17"
src="https://github.com/user-attachments/assets/c83659e2-20bc-4bf2-acc3-800c3d1295ec"
/>
…or` (duckdb#19992)

Since this function's return type is bool, we should return false
instead of -1 when something goes wrong.
…db#19988)

This PR fully unifies the result rendering code, and uses the newly
unified result rendering to fix up the automatic pager mode. Instead of
guessing whether or not a result is wide by the number of columns, we
now actually measure how wide the result set will be and enable the
pager based on that.

We also do some random clean-up, e.g. removing `QueryResult::MoreRowsThan`
again and moving the iteration / caching to the shell
renderers.
* Convert task system to look like AsOf.
This change adds a test to see if extensions can be loaded
with their normal name and aliases.

It also adds an extension alias lookup in GetDatabaseType,
similar to how it is done in CreateAttachedDatabase.
* Fix a rendering bug with auto-complete that would cause it to
continuously clear upwards if the buffer did not start at the top of the
terminal
* Differentiate between `fuzzy suggestions` and `exact suggestions`,
where exact suggestions have an exact prefix match with the text, and
fuzzy suggestions do not. Then show a smaller maximum number of fuzzy
suggestions. This prevents polluting the auto-complete output with
(likely) unrelated files.
* Make Ctrl+C cancel auto-complete, instead of clearing the entire
buffer
hannes and others added 29 commits December 5, 2025 15:58
If we don't call `SetError()`, users can't know why the ingest
failed.
…kdb#20074)

* When shortening wide columns, prefer to shorten the widest columns
first before shortening less wide columns, instead of shortening
everything by an equal percentage. This means that if we have a table
that has e.g. `name, description` columns, where `name` is slightly above the
"wide" threshold (20 bytes) and `description` is significantly above
this threshold, we will only shorten the `description` field, keeping
`name` readable.
* When stretching out values across rows, give all rows the same amount
of space for stretching out, instead of greedily stretching out the
first rows
* Don't stretch out columns across rows if there are hidden columns in
the result set
* Always add a separator between rows if we are stretching out columns
* For auto-complete, use only the match score instead of the "adjusted
sort score" when determining if there are ties or not. This way if we
have a directory that has e.g. `my_dir/` and `my_file.parquet` as
auto-complete options, `my_` will not instantly auto-complete to
`my_dir/` but instead consider this a tie. `my_dir/` will still appear
first in the sorted suggestion list, however.
…he vacuum does not change the rowids (duckdb#20073)

Currently vacuuming is disabled for indexes entirely because vacuum has
the potential of changing row-ids. This prevents deleted rows from being
cleaned up, always leaving deletion markers in place. This is necessary
because the index relies on row-ids, and if we change the row-id of a
row the index no longer points to the correct row.

The current behavior with vacuuming is unnecessarily restrictive,
however. In certain cases, vacuuming does not change the row-ids of existing
rows. For example:

* We can merge adjacent row groups if they do not have deletes, e.g. if
we have two row groups:
```
[RowGroup 1]
    start: 0
    count: 10
[RowGroup 2]
    start: 10
    count: 10
```

We can merge them into one row group:

```
[RowGroup 1]
    start: 0
    count: 20
```

This does not change the row-ids of any rows, so it is valid even when
indexes are present.

* We can remove row groups at the tail of a table. For example:

```
[RowGroup 1]
    start: 0
    count: 10
[RowGroup 2]
    start: 10
    count: 10
    deleted: 10
```

If we remove `RowGroup 2`, no existing rows have their row-ids changed,
so we can just delete it. This solves the issue where truncating a table
with an index on it did not actually delete any rows.


This doesn't fully solve the vacuuming issue with indexes present but
solves it in some common scenarios.
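
As a hedged illustration of the tail-removal case, here is a sketch with a
hypothetical table (the table name, row counts and the exact point at which
the vacuum runs are illustrative assumptions, not part of this PR):

```sql
-- Hypothetical table with an index (PRIMARY KEY creates an ART index)
CREATE TABLE t (id INTEGER PRIMARY KEY, payload VARCHAR);
INSERT INTO t SELECT range, 'x' FROM range(1000000);

-- Delete only the tail of the table: surviving rows keep their row-ids,
-- so the emptied trailing row groups can be dropped rather than leaving
-- deletion markers behind
DELETE FROM t WHERE id >= 500000;
CHECKPOINT;
```
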
This PR introduces min/max aggregate execution during statistics
propagation, analogous to duckdb#15301, and is a follow-up to duckdb#19806. With that,
the optimizer can directly execute simple min/max aggregates on top of
the row group statistics instead of scanning and aggregating a whole
table.

**Performance**
Data Set: Time-series Benchmark Suite, cpu-only, 100M rows
Query: `SELECT max(time) FROM cpu`

|               | Nightly | This PR | Improvement |
| ------- | ------- | -------- | ------------- |
| Cold run | 0.10s  |  0.016s | **6x** |
| Hot run | 0.035s | 0.002s | **18x** |
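
As a hedged illustration of which queries qualify (the `cpu` columns below
follow the benchmark's naming, and the remark about filtered aggregates is
an assumption, not something this PR states):

```sql
-- A bare min/max over a base table column can be answered from the
-- row group statistics without scanning the table
SELECT min(time), max(time) FROM cpu;

-- Assumption: with a per-row predicate the statistics alone no longer
-- suffice, so a query like this still scans the table
SELECT max(time) FROM cpu WHERE hostname = 'host_0';
```
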
This PR changes the build system for DuckDB in such a way that any
extensions are linked "late" in the build process. This allows caching
`libduckdb_static.a` between build processes on the same platform
(follow-up PR). We also now package all the required third party
dependencies statically. We also refactor extension builds into a
separate CMake file. Finally, we remove the `tpce` extension due to
non-use.
…sabled (duckdb#20077)

Without this change you'll get this very confusing error message when
the `LocalFileSystem` is disabled and you try to query from a table that
does not exist:

```
duckdb -c "set disabled_filesystems='LocalFileSystem'; select * from nonexistentdb.main.t1;"
Permission Error:
File system LocalFileSystem has been disabled by configuration
```

This changes that to the more sensible:
```
Binder Error:
Catalog "nonexistentdb" does not exist!
```
…#19968)

This PR adds a new scalar function `struct_values` that complements the
existing `struct_keys` function. While `struct_keys` returns the names
of a struct's fields, `struct_values` returns the corresponding values
as an unnamed struct.

## Example
```sql
SELECT struct_values({'a': 1, 'b': 2, 'c': 3});
-- Result: (1, 2, 3) 
-- Type: ROW(INTEGER, INTEGER, INTEGER)
```
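
A small usage sketch pairing the two functions (result shapes paraphrased
from the description above):

```sql
-- struct_keys returns the field names, struct_values the corresponding
-- values as an unnamed struct
SELECT struct_keys({'a': 1, 'b': 2}) AS keys,
       struct_values({'a': 1, 'b': 2}) AS vals;
```
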
The `TupleDataCollection` was always considered to be a `HASH_TABLE` for
legacy reasons. This PR adds a `MemoryTag` to the constructor, so it can
properly be used for `MemoryTag::ORDER_BY` like the previous sort. I
have also added `MemoryTag::WINDOW` and used that where appropriate.
…ritten by Arrow C++ API (duckdb#20046)

This PR enables reading _uniformly encrypted parquet files_ written by
the Arrow C++ API as addressed in
apache/arrow#47254. Fixes
duckdb#18524 and
duckdblabs/duckdb-internal#5511.

Previously, we were not able to read uniformly encrypted parquet files
written with Arrow since we did not use the additional authenticated
data (AAD). The AAD consists of a unique prefix and a suffix. The suffix
consists of the module type, the row group ordinal (present in the metadata),
the column ordinal (the column index in the _original_ schema of a parquet file)
and a page ordinal (the page index). More details can be found in the
[Parquet Encryption
Spec](https://parquet.apache.org/docs/file-format/data-pages/encryption/).

For testing I've added the uniformly encrypted file from the
[parquet-testing](https://github.com/apache/parquet-testing/tree/master/data)
repo and used the
[gist](https://gist.github.com/rok/8a68066c51801458a3746772a2c736c5) of
@rok. On top of this, I've added an Arrow file containing multiple row
groups based on the previous gist
([code](https://gist.github.com/ccfelius/3def798ade5de006e6d0073c1c457990)).

As also mentioned in apache/arrow#47254,
PyArrow uses a KMS which creates randomly generated keys. This means
that DuckDB is not able to read files written with PyArrow, since keys
are used directly in DuckDB.

The fix ensures backwards compatibility with our own encrypted parquet
files. In a follow-up PR I'll address the parquet writer.
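
For reference, a minimal sketch of reading such a file with a directly
registered key, using DuckDB's existing Parquet encryption interface (the
file name and key below are placeholders):

```sql
-- Register a 128-bit key directly (DuckDB uses keys directly, not a KMS)
PRAGMA add_parquet_key('key128', '0123456789112345');

-- Read a uniformly encrypted file written via the Arrow C++ API
SELECT *
FROM read_parquet('uniform_encryption.parquet.encrypted',
                  encryption_config = {footer_key: 'key128'});
```
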
@Dtenwolde changed the base branch from main to v1.4-andium on December 9, 2025 12:47