
duckdb-wasm freezes on runQuery when writing parquet file #2010

Open
@gsemet

What happens?

Hello. In my corporate environment, duckdb-wasm freezes on the first call to query/runQuery that writes a parquet file.
I have opened a ticket, evidence-dev/evidence#3155, to describe my problem, but I think I have narrowed it down to a specific call.

In short, my process blocks when duckdb-wasm runs this query:

COPY (SELECT * FROM read_parquet(['/path/to/my/project/.evidence/meta/buildinfo_csv/code_coverage/tmp/code_coverage.0.parquet'])) TO '/path/to/my/project/.evidence/template/static/data/buildinfo_csv/code_coverage/code_coverage.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD', USE_TMP_FILE false);

On my personal Mac, it does NOT freeze and everything works (even with the proxy). But it freezes on a Mac in our corporate environment (behind a proxy), on Windows WSL2 running Ubuntu 22, on a VM running Ubuntu 22, and in a Docker image based on the official Node 22 image (Alpine, I guess).
I tried many versions of duckdb-wasm and Node (20, 22, 23), always with the same result.

I cannot find the reason why a call to query would hang. I tried to debug with strace, and the process seems to be blocked on a mutex (deadlock?).

Is it possible that the slightly different proxy setup has an impact? On my Mac (the environment that does NOT freeze), the proxy has the format http://localhost:3128 (there is a local reverse proxy agent). In all the other environments, which do freeze, the format is http://theusername:[email protected]:3128
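To make the difference between the two proxy formats concrete, here is a minimal Node sketch that parses a proxy URL with the built-in `URL` class and reports whether it carries credentials. The helper name `describeProxy`, the host `proxy.example.com` and the password are placeholders for illustration only, not values from my setup, and nothing here shows that credentials are actually the cause of the freeze.

```javascript
// Hypothetical helper: summarise a proxy URL without printing its secrets.
function describeProxy(proxyUrl) {
  const u = new URL(proxyUrl); // WHATWG URL, available as a global in Node
  return {
    protocol: u.protocol,
    host: u.hostname,
    port: u.port,
    // True when the URL embeds a user name and/or a password.
    hasCredentials: u.username !== '' || u.password !== '',
  };
}

// Local agent format (the environment that works).
console.log(describeProxy('http://localhost:3128'));
// Credentialed format (placeholder host and password).
console.log(describeProxy('http://theusername:thepassword@proxy.example.com:3128'));
```

Running it against the values of HTTP_PROXY and HTTPS_PROXY in each environment would confirm that the only structural difference is the embedded user name and password.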

To Reproduce

On an Ubuntu 22 VM or Docker image running inside our network infrastructure (so behind a corporate proxy):

# our internal certificates
NODE_EXTRA_CA_CERTS: /etc/ssl/certs/ca-certificates.crt
HTTPS_PROXY/HTTP_PROXY/NO_PROXY/https_proxy/http_proxy/no_proxy set to our internal proxy http url

node --version
npm --version

npx degit evidence-dev/template my-project
cd my-project
npm run sources

This last command never finishes.
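Because the command hangs rather than failing, a time limit makes the reproduction easier to script. The following is a rough sketch using Node's `child_process.spawnSync` with its `timeout` option; the helper name `runWithTimeout` and the five-minute limit are arbitrary choices for illustration, not part of Evidence or duckdb-wasm.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run a command and report whether it exceeded the time limit.
function runWithTimeout(command, args, timeoutMs) {
  // spawnSync sends the kill signal (SIGTERM by default) to the child once timeoutMs elapses.
  const result = spawnSync(command, args, { timeout: timeoutMs, stdio: 'inherit' });
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status, signal: result.signal };
}

// Example (run from the project directory):
//   const r = runWithTimeout('npm', ['run', 'sources'], 5 * 60 * 1000);
//   if (r.timedOut) console.error('npm run sources did not finish within 5 minutes');
```

Only the direct child receives the signal, so processes started by npm may need to be cleaned up separately.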

Here is the backtrace (using chrome inspect) to where it locks:

[Image: screenshot of the backtrace]

The query (r variable) is:

COPY (SELECT * FROM read_parquet(['/home/jupyter-gsemet/Projects/test-evidence/my-project/.evidence/meta/needful_things/orders/tmp/orders.0.parquet'])) TO '/home/jupyter-gsemet/Projects/test-evidence/my-project/.evidence/template/static/data/needful_things/orders/orders.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD', USE_TMP_FILE false);

Evidence seems to first convert the CSV (or the result of its own query against a DuckDB example database) into a parquet file, and then to call duckdb-wasm to build a single parquet file from it.

The exact line in Evidence source code where it freezes is https://github.com/evidence-dev/evidence/blob/f461d6d63a09d9ee3bb149c2d9e1721c70bbc9ac/packages/lib/universal-sql/src/build-parquet.js#L191

Browser/Environment:

Not relevant (node process)

Device:

linux or mac

DuckDB-Wasm Version:

1.29.0

DuckDB-Wasm Deployment:

See evidence source code

Full Name:

Gaetan Semet

Affiliation:

Ampere Technologies
