Description
We are generating queries to read very wide Parquet files as part of a Census data extraction service. Through experimentation I've determined that the limit on named columns in a SELECT clause is 16256. When this limit is exceeded we get:
cli_extractor: /home/ccd/nhgis-extract-engine/rust/target/release/build/libduckdb-sys-14f8e1593a01c721/out/duckdb/src/common/types/row/row_data_collection.cpp:82: duckdb::vector<duckdb::BufferHandle> duckdb::RowDataCollection::Build(duckdb::idx_t, duckdb::data_t**, duckdb::idx_t*, const duckdb::SelectionVector*): Assertion `new_block.count > 0' failed.
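For context, the query generation amounts to listing every column by name. Here is a minimal sketch of that, not our actual service code; the `wide_select` helper, the `c{i}` column names, and the `wide.parquet` file name are all placeholders:

```rust
// Build a SELECT that names `n` columns explicitly, the shape of query
// that trips the assertion once n exceeds 16256.
fn wide_select(n: usize, table: &str) -> String {
    let cols: Vec<String> = (0..n).map(|i| format!("c{i}")).collect();
    format!("SELECT {} FROM {}", cols.join(", "), table)
}

fn main() {
    // 16256 named columns runs fine; one more triggers the failure above.
    let query = wide_select(16257, "read_parquet('wide.parquet')");
    println!("generated {} bytes of SQL", query.len());
}
```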
We realize this is a very large number of columns, and 99.5% of our workload is well under 16256 and runs very well. Still, it would be nice to get a clear error message, or to have a setting for increasing the maximum width of a result.
It looks like there's a problem in the memory allocation code: when I monitor the process, the physical memory of our machines is nowhere near exhausted. With 14000 columns, for example, memory use is quite low.
I can reproduce the issue on version 1.1.1 (the one bundled with the Rust package). With the CLI I hit a similar problem on both v1.1.1 and v1.1.3: it tries to spill to a temp file and then just hangs. The temp file in .tmp is only 256 KB.
When I test on the nightly build downloaded today, I get a different error:
ccd@gp2000:~/nhgis-extract-engine/rust$ ~/duckdb/duckdb < test_query.sql
Floating point exception (core dumped)
ccd@gp2000:~/nhgis-extract-engine/rust$
Due to the nature of the problem, the query and data file needed to reproduce it are both extremely large, but I'm willing to share them.