-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writes don't reuse freed pages from the previous transaction #829
Comments
Ah yes, I need to think through whether it's possible to do this free'ing at the end of the commit, after the last fsync() I think I implemented it the way it is for a couple reasons:
However, this is a pretty significant downside |
Ah, yeah, doing the frees after the fsync() seems like a nice approach! You'd also have to redo them when opening the database, because they wouldn't get persisted, right? But that seems fine. And it's definitely nice not to have to worry about rolling back the frees in abort(). I think there might be a bit of complexity with savepoints, because the allocator state at the time the snapshot is taken will no longer match the freed tree that's saved. But that's not too hard to deal with, and in fact I believe the current code there isn't quite right anyway (I'll submit another issue for that in a few minutes). |
There's a
That wouldn't surprise me at all! Savepoints were the last big feature I added, and they created a lot of bugs. Sometimes I regret adding them, haha, but they're a rather convenient feature |
Oh nice! Yeah, that's even better. |
Thanks for writing redb!
A sequence of large write transactions seems to consume twice as much disk space as I was expecting. If each transaction modifies 1 GB of existing pages (and there are no live reads), I was expecting 1 GB of overhead: the old data needs to be kept around until the transaction has committed.
But in fact this has 2 GB of overhead. The first transaction allocates 1 GB of new pages, and adds 1 GB of old pages to the freed tree, as expected. The second transaction correctly frees those old pages -- but they're freed at the end of the transaction, rather than the beginning. So even though the pages are no longer referenced by anything and could be reused, the second transaction doesn't use them; instead, it allocates a fresh 1 GB of pages, doubling the disk usage. The original pages don't get reused until the third transaction[*].
It's possible to work around this, at the cost of some extra latency, by adding an extra empty write transaction after each real write. But it's kind of surprising that that's necessary, and it would be nice if it weren't needed!
This also interacts with another issue, where live reads are blocking page freeing for one transaction longer than they should. I'll submit that one separately.
[*] A few pages do actually get reused in the second transaction, to store the updated freed tree. But they don't get reused for data.
The text was updated successfully, but these errors were encountered: