Introduce peer checkpoints so CouchDB can safely remove deleted documents entirely #5558
Overview
Apache CouchDB retains some information (at minimum, doc id, doc revision tree and a deleted flag) for all deleted documents forever, in order that replication is guaranteed to converge. This is excessively pessimistic and we would like to improve matters.
This PR introduces a number of changes to achieve its goal:
- Database shards (`.couch` files under the `shards/` directory) gained an additional header property called `drop_seq`. Once set to a positive integer, any deleted document with a lower update sequence is skipped entirely at the next compaction.
- The notion of a peer checkpoint document. These are all local docs and their ids must have the prefix `_local/peer-checkpoint-`.
- All indexers (mrview, search, nouveau) and the replicator have been taught to create and update peer checkpoints with the update sequence they have seen at appropriate times (i.e., after they have made every effort to commit the changes they've seen to durable storage).
- A new endpoint, `POST /$dbname/_update_drop_seq`, which gathers information about the shards of the database, update sequences from all peer checkpoint documents, and the internal shard sync documents, computes the `drop_seq` for each shard, and then sends RPC requests to those databases to update the `drop_seq`.
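To make the interaction between peer checkpoints, `drop_seq`, and compaction concrete, here is a minimal Python sketch of a simplified model. It is not the code in this PR: the `DocInfo` type and the `compute_drop_seq` and `compact` helpers are hypothetical, update sequences are modelled as plain integers for a single shard, and it assumes the `drop_seq` is the lowest update sequence reported by any peer. What happens when no peer checkpoints exist is also an assumption here (the value is left unset).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Prefix required for peer checkpoint doc ids (from the PR description).
PEER_CHECKPOINT_PREFIX = "_local/peer-checkpoint-"


@dataclass(frozen=True)
class DocInfo:
    """Hypothetical, simplified view of a document in one shard."""
    doc_id: str
    update_seq: int
    deleted: bool


def is_peer_checkpoint(doc_id: str) -> bool:
    """True if the id carries the peer checkpoint prefix."""
    return doc_id.startswith(PEER_CHECKPOINT_PREFIX)


def compute_drop_seq(peer_seqs: Iterable[int]) -> Optional[int]:
    """Assumption: the safe drop_seq is the lowest sequence any peer has seen.

    Returns None when there are no peers; treating that case as
    "leave drop_seq unset" is an assumption of this sketch.
    """
    seqs = list(peer_seqs)
    return min(seqs) if seqs else None


def compact(docs: Iterable[DocInfo], drop_seq: Optional[int]) -> List[DocInfo]:
    """Keep every doc except deleted ones with update_seq below drop_seq."""
    if drop_seq is None or drop_seq <= 0:
        return list(docs)
    return [d for d in docs if not (d.deleted and d.update_seq < drop_seq)]


docs = [
    DocInfo("a", 1, deleted=True),
    DocInfo("b", 2, deleted=False),
    DocInfo("c", 3, deleted=True),
    DocInfo("d", 5, deleted=True),
]
# Three hypothetical peers (e.g. an index, a replication, a shard sync).
drop_seq = compute_drop_seq([4, 7, 5])
kept = [d.doc_id for d in compact(docs, drop_seq)]
print(drop_seq, kept)
```

In this model the slowest peer bounds what may be dropped: a deleted document is only removed once every peer has recorded a sequence beyond it, so no peer can miss the deletion.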
Testing recommendations
There are some simple tests in the eunit and elixir suites which will be run via the normal Makefile targets.
Additionally, there is a stateful property-based test that exercises the code more comprehensively, which can be started with `make elixir-cluster`. This will start a 3 node cluster with nouveau server running and perform random permutations of all relevant operations that could alter which deleted documents are dropped (making docs, deleting docs, creating indexes, creating and updating peer checkpoints, splitting shards).
Related Issues or Pull Requests
N/A
Checklist
- `rel/overlay/etc/default.ini`
- `src/docs` folder