@Pavan-Nambi Pavan-Nambi commented Oct 24, 2025

Turso incorrectly creates the first table of an auto-vacuumed database on page 2.

(Note: this is in collaboration with @LeMikaelF)

SQLite does not allow enabling or disabling auto-vacuum after the first table has been created (https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the page sequence of the database differs when auto-vacuum is enabled: the first b-tree page must be page 3 instead of page 2, to make room for the first Pointer Map page. Turso doesn't currently account for this, which can lead to data loss.
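
To make the page layout concrete, here is a minimal sketch of where pointer map pages fall in an auto-vacuumed database. It follows the arithmetic described in the SQLite file format (5-byte entries, first pointer map page at page 2) and is not taken from Turso's code. The function names are illustrative, and the sketch ignores the special case where a pointer map page would collide with the locking page in very large files.

```python
PTRMAP_ENTRY_SIZE = 5  # 1 type byte + 4-byte big-endian parent page number


def ptrmap_page_for(pgno: int, usable_size: int = 4096) -> int:
    """Return the pointer map page covering `pgno` (hypothetical helper).

    If `pgno` is itself a pointer map page, it is returned unchanged.
    Simplification: the locking-page collision case is not handled.
    """
    if pgno < 2:
        raise ValueError("page 1 has no pointer map entry")
    # One pointer map page plus the pages whose entries fit inside it.
    pages_per_group = usable_size // PTRMAP_ENTRY_SIZE + 1
    group = (pgno - 2) // pages_per_group
    return group * pages_per_group + 2


def is_ptrmap_page(pgno: int, usable_size: int = 4096) -> bool:
    """True if `pgno` is reserved for a pointer map in auto-vacuum mode."""
    return pgno >= 2 and ptrmap_page_for(pgno, usable_size) == pgno
```

With a 4096-byte page, page 2 is a pointer map page and page 3 is the first page available for a b-tree, which is why a table root placed on page 2 ends up sharing space with pointer map entries.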

The simplest way to reproduce this is to create an auto-vacuumed database with pragma auto_vacuum=full, so that autovacuum runs on each commit, and then create a table with some data. Turso will incorrectly create the new table on page 2. After this, every time a new page is created, either through a page split or because a new table is created, Turso will write a 5-byte pointer map entry into page 2, starting from the top of the page, thereby overwriting existing data.

For example, let's start with a clean database and look at the first bytes of page 2. The page starts with 0d, the discriminator for a table leaf page (source). The next interesting number is the number of cells contained in this page, a 2-byte big-endian value at offset 3 (00 01).

$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');

$ dbtotxt /tmp/a.db
| size 8192 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
|      0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00   ................
|   4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65   .........myvalue
| end a.db
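
As a cross-check of the dump above, the following sketch decodes the first 8 bytes of page 2 as a b-tree leaf page header, using the field layout from the SQLite file format. The function name is illustrative and the snippet is independent of Turso's own parser.

```python
import struct


def parse_leaf_header(page: bytes) -> dict:
    """Decode the 8-byte b-tree leaf header at the start of a page.

    Applies to pages other than page 1, where the header starts at byte 100.
    """
    page_type, first_freeblock, cell_count, content_start, fragmented = (
        struct.unpack_from(">BHHHB", page, 0)
    )
    return {
        "page_type": page_type,              # 0x0d = table leaf page
        "first_freeblock": first_freeblock,  # 0 = no freeblocks
        "cell_count": cell_count,            # 2 bytes at offset 3
        "content_start": content_start,      # offset of the cell content area
        "fragmented_bytes": fragmented,
    }


# First 8 bytes of page 2 from the dump above.
header = parse_leaf_header(bytes.fromhex("0d000000010ff500"))
```

Here `cell_count` is 1 and `content_start` is 4085 (0x0ff5), which points at the `09 01 02 1b` cell holding 'myvalue' near the end of the page.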

Pointer map pages are located every N pages, starting from page 2, and contain a list of 5-byte entries, each recording the type and parent page of one page. So whenever Turso or SQLite adds a page, it writes 5 bytes into a pointer map page, which at this database size is page 2. This means that for data loss to occur, it is sufficient to add a single page to the database, for example by creating a table. The first 5 bytes of page 2, including the cell count, are then overwritten:

$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');
turso> pragma auto_vacuum=full;
turso> create table tt(a);

$ dbtotxt /tmp/a.db
| size 12288 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
|      0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00   ................
|   4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65   .........myvalue
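
The overwritten bytes can be read as a pointer map entry. The sketch below uses the entry layout and type codes from the SQLite file format (1 type byte followed by a 4-byte big-endian parent page number). The helper names are illustrative, and the reading that this entry belongs to page 3, presumably the root page of tt, is an assumption based on the page arithmetic rather than something the dump states.

```python
import struct

# Pointer map entry types as defined by the SQLite file format.
PTRMAP_TYPES = {
    1: "root page",
    2: "free page",
    3: "first overflow page",
    4: "later overflow page",
    5: "non-root b-tree page",
}


def ptrmap_entry_offset(ptrmap_pgno: int, pgno: int) -> int:
    """Byte offset, inside the pointer map page, of the entry for `pgno`."""
    return 5 * (pgno - ptrmap_pgno - 1)


def decode_ptrmap_entry(page: bytes, offset: int) -> tuple:
    """Return (entry_type, parent_page) for the 5-byte entry at `offset`."""
    return struct.unpack_from(">BI", page, offset)


# First 16 bytes of page 2 after `create table tt(a)`, from the dump above.
page2 = bytes.fromhex("01000000000ff5000ff5000000000000")
offset = ptrmap_entry_offset(2, 3)  # entry for page 3 in pointer map page 2
entry_type, parent = decode_ptrmap_entry(page2, offset)
```

The entry lands at offset 0 and decodes to type 1 (root page) with parent 0, which is exactly the `01 00 00 00 00` that replaced the leaf page type and cell count. The entry for page 4 would land at offset 5, continuing into the rest of the old header.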

Creating more tables, or adding more B-tree pages, will keep overwriting the rest of the page, until the cells themselves are also overwritten.

Reproducing the issue in the simulator

We have been unable to reproduce this exact corruption mode in the simulator, but patching the simulator exposes many failure modes, none of which occur with the unpatched simulator. The following seeds show the issue when the patched simulator is run against main:

  • 11522841279124073062, with "Assertion 'table inquisitive_graham_159 should contain all of its expected values' failed: table inquisitive_graham_159 does not contain the expected values, the simulator model has more rows than the database"
  • 7057400018220918989, 16028085350691325843, 7721542713659053944, and 203017821863546118, with "Failed to read ptrmap key=XXX"
  • 12533694709304969540, 18357088553315413457, 3108945730906932377, with "Integrity Check Failed: Cell N in page 2 is out of range."
  • 4757352625344646473, with "dirty pages should be empty for read txn"
  • 7083498604824302257, with "header_size: 6272, header_len_bytes: 2, payload.len(): 13"
  • 17881876827470741581, with "ParseError("no such table: focused_historians_416")"
  • 2092231500503735693, with "range end index 4789 out of range for slice of length 4096"
  • 7555257419378470845, with "malformed database schema (imaginative_ontivero\u{1})"
  • 12905270229511147245, with "index out of bounds: the len is 4096 but the index is 4096"

Fixing the issue

  • When the DB is opened, we read the auto_vacuum state instead of assuming auto_vacuum=none.
  • Don't allow auto_vacuum to be changed on non-empty databases: if this were allowed, the pointer map could overwrite existing data.
  • Modify the integrity check to avoid reporting page 2 as orphaned in auto-vacuumed databases.
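
For the first point, the auto_vacuum state can be recovered from the 100-byte database header. The sketch below follows the SQLite file format: the 4-byte field at offset 52 is non-zero in auto-vacuum or incremental-vacuum mode, and the field at offset 64 is non-zero for incremental-vacuum. This illustrates the on-disk encoding only. It is not the code in this PR, and the function name is hypothetical.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"


def auto_vacuum_mode(header: bytes) -> str:
    """Return 'none', 'full', or 'incremental' from a 100-byte db header."""
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        raise ValueError("not a SQLite database header")
    # Offset 52: largest root b-tree page number, zero when auto-vacuum is off.
    (largest_root,) = struct.unpack_from(">I", header, 52)
    # Offset 64: non-zero for incremental-vacuum mode.
    (incremental,) = struct.unpack_from(">I", header, 64)
    if largest_root == 0:
        return "none"
    return "incremental" if incremental else "full"
```

A caller would pass the first 100 bytes of the database file, for example the result of reading 100 bytes from /tmp/a.db opened in binary mode.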

Fixes #3752

Co-authored-by: Mikaël Francoeur <[email protected]>

cleanup

Co-authored-by: Mikaël Francoeur <[email protected]>
@Pavan-Nambi Pavan-Nambi marked this pull request as ready for review October 24, 2025 14:06
Development

Successfully merging this pull request may close these issues.

We should make autovacuum experimental mode as it's not tested and not fully implemented yet