@milahu commented Jul 31, 2025

learn about SQLite file format then generate the pages and write to disk directly.

now that kaitai struct supports serialization,
we can use it to write sqlite pages

this is an early draft:
so far it only writes the database header, not any pages
further progress requires serializing pages
and writing them to the correct byte offsets in the database file
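
the offset part can be sketched as follows. this is a minimal illustration, not the code in this PR: `page_offset`, `write_page` and the fixed `PAGE_SIZE` of 4096 are hypothetical names and values, and the page bytes are placeholders standing in for whatever the serializer produces. it relies only on the sqlite file layout, where pages are numbered from 1 and page N starts at byte `(N - 1) * page_size` (page 1 begins with the 100-byte database header).

```python
import io

# assumption: fixed page size; the real value is stored in the database header
PAGE_SIZE = 4096


def page_offset(page_number: int, page_size: int = PAGE_SIZE) -> int:
    """Return the byte offset of a 1-based sqlite page number."""
    if page_number < 1:
        raise ValueError("sqlite page numbers start at 1")
    return (page_number - 1) * page_size


def write_page(f, page_number: int, page: bytes, page_size: int = PAGE_SIZE) -> None:
    """Write one already-serialized page at its offset in the file."""
    if len(page) != page_size:
        raise ValueError(f"expected {page_size} bytes, got {len(page)}")
    f.seek(page_offset(page_number, page_size))
    f.write(page)


# usage: pages can be written in any order, each lands at its own offset
# (placeholder bytes, not valid sqlite pages)
db = io.BytesIO()
write_page(db, 2, b"\x02" * PAGE_SIZE)
write_page(db, 1, b"\x01" * PAGE_SIZE)
```

with a real file, `f` would be opened in binary mode (for example `open(path, "r+b")`), and the placeholder bytes would be replaced by serialized pages.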

use case: convert a 5 GB jsonl file to sqlite
script: torrents_byteoffsets_parse_jsonl_zst.py
with this script i reach only 5 MiB/s
the bottleneck is the sqlite insert queries
reading and decompressing alone runs at 20 MiB/s on my system (under load)

@milahu force-pushed the add-serialize_py branch from e750072 to 94e2fdc on August 1, 2025 05:05
@milahu force-pushed the add-serialize_py branch from 6b6e97c to 289e8c3 on August 3, 2025 21:18
@milahu force-pushed the add-serialize_py branch from 98caf29 to b6046a9 on August 4, 2025 09:19