Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writes don't reuse freed pages from the previous transaction #829

Open
mconst opened this issue Jul 24, 2024 · 4 comments
Open

Writes don't reuse freed pages from the previous transaction #829

mconst opened this issue Jul 24, 2024 · 4 comments

Comments

@mconst
Copy link
Contributor

mconst commented Jul 24, 2024

Thanks for writing redb!

A sequence of large write transactions seems to consume twice as much disk space as I was expecting. If each transaction modifies 1 GB of existing pages (and there are no live reads), I was expecting 1 GB of overhead: the old data needs to be kept around until the transaction has committed.

But in fact this has 2 GB of overhead. The first transaction allocates 1 GB of new pages, and adds 1 GB of old pages to the freed tree, as expected. The second transaction correctly frees those old pages -- but they're freed at the end of the transaction, rather than the beginning. So even though the pages are no longer referenced by anything and could be reused, the second transaction doesn't use them; instead, it allocates a fresh 1 GB of pages, doubling the disk usage. The original pages don't get reused until the third transaction[*].

It's possible to work around this, at the cost of some extra latency, by adding an extra empty write transaction after each real write. But it's kind of surprising that that's necessary, and it would be nice if it weren't needed!

This also interacts with another issue, where live reads are blocking page freeing for one transaction longer than they should. I'll submit that one separately.

[*] A few pages do actually get reused in the second transaction, to store the updated freed tree. But they don't get reused for data.

@cberner
Copy link
Owner

cberner commented Jul 26, 2024

Ah yes, I need to think through whether it's possible to do this free'ing at the end of the commit, after the last fsync()

I think I implemented it the way it is for a couple reasons:

  1. I wanted begin_write() to be constant time
  2. I wanted to avoid doing extra book keeping to support rolling back these frees during a call to abort()

However, this is a pretty significant downside

@mconst
Copy link
Contributor Author

mconst commented Jul 26, 2024

Ah, yeah, doing the frees after the fsync() seems like a nice approach! You'd also have to redo them when opening the database, because they wouldn't get persisted, right? But that seems fine. And it's definitely nice not to have to worry about rolling back the frees in abort().

I think there might be a bit of complexity with savepoints, because the allocator state at the time the snapshot is taken will no longer match the freed tree that's saved. But that's not too hard to deal with, and in fact I believe the current code there isn't quite right anyway (I'll submit another issue for that in a few minutes).

@cberner
Copy link
Owner

cberner commented Jul 26, 2024

You'd also have to redo them when opening the database, because they wouldn't get persisted, right?

There's a Drop implementation on TransactionalMemory which persists outstanding non-durable commits. I think it can persist the frees, so that they didn't need to be replayed when the database is opened

in fact I believe the current code there isn't quite right anyway

That wouldn't surprise me at all! Savepoints were the last big feature I added, and they created a lot of bugs. Sometimes I regret adding them, haha, but they're a rather convenient feature

@mconst
Copy link
Contributor Author

mconst commented Jul 26, 2024

There's a Drop implementation on TransactionalMemory which persists outstanding non-durable commits. I think it can persist the frees, so that they didn't need to be replayed when the database is opened

Oh nice! Yeah, that's even better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants