Creating an HNSW index causes severe data inflation #32

@QuanZi123

Description

DuckDB version: 1.1.1

When I create the HNSW index first and then insert the data, DuckDB's database file is about 100 times larger than the original file.

But if I insert the data first and then create the HNSW index, DuckDB's database file swells to about 15 times the original size.
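For readers without the linked test code at hand, the two orderings can be sketched as statement sequences. This is only an illustration: the table name `items`, the column `vec`, the index name `items_hnsw`, and the vector dimension are hypothetical and not taken from the test repository. It assumes the `vss` extension, which as far as I know needs the experimental persistence setting for an HNSW index in an on-disk database.

```python
# Hypothetical names throughout: table `items`, column `vec`, index `items_hnsw`.
SETUP = [
    "INSTALL vss",
    "LOAD vss",
    # Assumed requirement for HNSW indexes in a persistent (on-disk) database.
    "SET hnsw_enable_experimental_persistence = true",
]

CREATE_INDEX = "CREATE INDEX items_hnsw ON items USING HNSW (vec)"
INSERT_ROWS = "INSERT INTO items VALUES (?, ?)"  # run once per row or via executemany


def create_table(dim):
    """DDL for a table with a fixed-size float array column of length `dim`."""
    return f"CREATE TABLE items (id INTEGER, vec FLOAT[{dim}])"


def index_first(dim):
    """Ordering 1: create the HNSW index on the empty table, then insert."""
    return SETUP + [create_table(dim), CREATE_INDEX, INSERT_ROWS]


def data_first(dim):
    """Ordering 2: insert all rows, then build the HNSW index once."""
    return SETUP + [create_table(dim), INSERT_ROWS, CREATE_INDEX]
```

Each statement could be passed in order to `execute` on a connection opened with `duckdb.connect("some_file.db")`, after which the size of the resulting file can be compared for the two orderings.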

Test data and code: duckdb_index

Here's my test result:

Original file train.jsonl: 78371 KB

File speed_3.db (create index first): 7055628 KB

File speed_t.db (insert data first): 1328369 KB
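As a quick check on the "100 times" and "15 times" figures, the inflation factors can be computed directly from the sizes reported above. The variable and function names are only illustrative.

```python
# File sizes in KB, as reported in this issue.
original_kb = 78371        # train.jsonl
index_first_kb = 7055628   # speed_3.db (create index first)
data_first_kb = 1328369    # speed_t.db (insert data first)


def inflation(db_kb, source_kb):
    """Return how many times larger the database file is than the source file."""
    return db_kb / source_kb


index_first_factor = inflation(index_first_kb, original_kb)
data_first_factor = inflation(data_first_kb, original_kb)

print(f"index first: {index_first_factor:.1f}x")
print(f"data first:  {data_first_factor:.1f}x")
print(f"index first vs data first: {index_first_kb / data_first_kb:.1f}x")
```

By these numbers the index-first file is roughly 90 times the original, the data-first file roughly 17 times, and the index-first file is a little over 5 times larger than the data-first one.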

