@tingjun-cs In your example, the reason clickhouse+pandas is much faster is that you are using the native clickhouse driver.

    from clickhouse_driver import Client
    client = Client(**CLICKHOUSE_CONFIG)
    df = client.query_dataframe(CLICKHOUSE_QUERY)

This is optimized for getting data from ClickHouse into pandas specifically. Both ClickHouse and pandas are column-major, so there is much less overhead when converting. The official clickhouse_connect driver is Arrow compatible, and I believe it'll use that when converting to dataframes. I'm not sure what happens internally with the unofficial driver you used in your code.
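To make the column-major point concrete, here is a rough stdlib-only sketch of the extra work a row-oriented path has to do. It is an illustration only, not how daft, clickhouse_driver, or clickhouse_connect work internally. The helper `rows_to_columns` and the sample data are hypothetical.

```python
from typing import Any, Sequence


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Sequence[Any]]
) -> dict[str, list[Any]]:
    """Transpose row-oriented results into one list per column.

    Hypothetical helper: a generic row-based reader has to do work like
    this, touching every value, before it can build a columnar frame.
    """
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Row-oriented result, shaped like what a generic cursor returns (made-up data).
names = ["id", "price"]
rows = [(1, 9.5), (2, 3.0), (3, 7.25)]

columns = rows_to_columns(names, rows)
print(columns)
```

When the source already hands over whole columns, this per-value transpose can be skipped, which is where I'd expect most of the difference to come from.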

The daft.read_sql is general purpose and is …

Answer selected by jaychia