Introduce peer checkpoints so CouchDB can safely remove deleted documents entirely #5558

rnewson · 2025-06-04T15:13:49Z

Overview

Apache CouchDB retains some information (at minimum, the doc id, the revision tree and a deleted flag) for every deleted document forever, so that replication is guaranteed to converge. This is excessively pessimistic, and we would like to improve matters.

This PR introduces the following changes to achieve this goal:

Database shards (.couch files under the shards/ directory) gain an additional header property called drop_seq. Once it is set to a positive integer, any deleted document with an update sequence equal to or lower than drop_seq is skipped entirely at the next compaction.
The notion of a peer checkpoint document. These are all local docs, and their ids must have the prefix `_local/peer-checkpoint-`.
All indexers (mrview, search, nouveau) and the replicator have been taught to create and update peer checkpoints with the update sequence they have seen at appropriate times (i.e., after they have made every effort to commit the changes they've seen to durable storage).
A new endpoint, POST /$dbname/_update_drop_seq, which gathers information about the database's shards, the update sequences from all peer checkpoint documents and the internal shard sync documents, computes the drop_seq for each shard, and then sends RPC requests to those shards to update their drop_seq.
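As a minimal sketch of the drop_seq rule, the Python below models a compaction pass as a filter over per-document metadata. `DocInfo` and `compact` are hypothetical names, not CouchDB internals (which are Erlang), and the sketch assumes the boundary is inclusive, matching the "equal or lower" wording of the drop_seq commit message.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class DocInfo:
    """Hypothetical stand-in for the per-document metadata a shard keeps."""
    doc_id: str
    update_seq: int
    deleted: bool


def compact(docs: Iterable[DocInfo], drop_seq: Optional[int]) -> Iterator[DocInfo]:
    """Yield the documents a compaction pass would copy into the new file.

    Deleted documents at or below drop_seq are skipped entirely; live
    documents and newer tombstones are always kept. None means drop_seq
    has never been set, so nothing is dropped.
    """
    for doc in docs:
        if drop_seq is not None and doc.deleted and doc.update_seq <= drop_seq:
            continue  # tombstone that all peers have already processed
        yield doc


docs = [DocInfo("a", 1, True), DocInfo("b", 2, False), DocInfo("c", 5, True)]
print([d.doc_id for d in compact(docs, drop_seq=3)])  # ['b', 'c']
```

Note that only tombstones are ever candidates: document "b" survives regardless of drop_seq because it is live, and "c" survives because its deletion is newer than drop_seq.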
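The description does not spell out how _update_drop_seq combines the sequences it gathers, but a natural reading is that a shard's drop_seq must not exceed any sequence a peer checkpoint or internal sync checkpoint has recorded for that shard. The sketch below assumes clustered sequences have already been decoded into plain per-shard integers and takes the minimum. `compute_drop_seqs` is a hypothetical helper, and returning None for a shard that some checkpoint has not yet covered is a conservative choice of this sketch, not documented behaviour.

```python
from typing import Dict, List, Mapping, Optional


def compute_drop_seqs(
    shards: List[str], checkpoints: List[Mapping[str, int]]
) -> Dict[str, Optional[int]]:
    """Propose a drop_seq for each shard (hypothetical helper).

    Each checkpoint maps a shard to the highest update seq one peer (an
    index, a replication, or internal shard sync) has durably processed.
    The drop_seq is the minimum across all checkpoints; if there are none,
    or any checkpoint lacks the shard, None is returned (drop nothing).
    """
    result: Dict[str, Optional[int]] = {}
    for shard in shards:
        seqs = [cp[shard] for cp in checkpoints if shard in cp]
        if not checkpoints or len(seqs) < len(checkpoints):
            result[shard] = None
        else:
            result[shard] = min(seqs)
    return result


checkpoints = [
    {"00000000-7fffffff": 120, "80000000-ffffffff": 95},  # e.g. a view index
    {"00000000-7fffffff": 110, "80000000-ffffffff": 99},  # e.g. a replication
    {"00000000-7fffffff": 130},  # has not covered the second range yet
]
print(compute_drop_seqs(["00000000-7fffffff", "80000000-ffffffff"], checkpoints))
# {'00000000-7fffffff': 110, '80000000-ffffffff': None}
```

Taking the minimum means the slowest peer bounds what can be dropped: a single stale or abandoned peer checkpoint holds back tombstone removal for every shard it covers.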
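From a client's point of view, the peer checkpoint and endpoint items above amount to two HTTP calls. The sketch below only builds the requests with the standard library; the `{"update_seq": ...}` body is an assumed schema, "42-abc" is a placeholder sequence, the POST is assumed to need no body, and `peer_checkpoint_request` / `update_drop_seq_request` are hypothetical helpers. Only the `_local/peer-checkpoint-` prefix and the `_update_drop_seq` path come from the description.

```python
import json
import urllib.parse
import urllib.request

PEER_CHECKPOINT_PREFIX = "_local/peer-checkpoint-"


def peer_checkpoint_request(base_url: str, db: str, name: str, update_seq: str) -> urllib.request.Request:
    """Build a PUT recording how far a hypothetical external consumer has got.

    The {"update_seq": ...} body is an assumed schema for illustration only.
    """
    doc_path = PEER_CHECKPOINT_PREFIX + urllib.parse.quote(name, safe="")
    url = f"{base_url}/{urllib.parse.quote(db, safe='')}/{doc_path}"
    body = json.dumps({"update_seq": update_seq}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )


def update_drop_seq_request(base_url: str, db: str) -> urllib.request.Request:
    """Build the POST asking the cluster to recompute and apply each shard's drop_seq."""
    url = f"{base_url}/{urllib.parse.quote(db, safe='')}/_update_drop_seq"
    return urllib.request.Request(url, method="POST")  # no request body assumed


req = peer_checkpoint_request("http://127.0.0.1:5984", "mydb", "my-indexer", "42-abc")
print(req.get_method(), req.full_url)
# PUT http://127.0.0.1:5984/mydb/_local/peer-checkpoint-my-indexer
```

To use these against a running cluster, pass each request to `urllib.request.urlopen`, adding whatever credentials the cluster requires. Updating an existing local doc may also need its current `_rev`, depending on how peer checkpoint documents are handled.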

Testing recommendations

There are some simple tests in the eunit and elixir suites which will be run via the normal Makefile targets.

Additionally, there is a stateful property-based test that exercises the code more comprehensively, which can be started with `make elixir-cluster`. This starts a 3-node cluster with a nouveau server running and performs random permutations of all relevant operations that could alter which deleted documents are dropped (making docs, deleting docs, creating indexes, creating and updating peer checkpoints, splitting shards).
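The stateful test described above runs against a live cluster. As a much smaller, hypothetical analogue, the Python below randomly interleaves document writes, deletions, peer checkpoint advances and drop/compact passes on a one-shard model, asserting after every step that live documents survive and that any tombstone newer than the slowest peer's checkpoint is still present. It is a randomized model check rather than true property-based testing (no shrinking), and it omits indexes and shard splits; `run_model` and its state layout are illustrative only.

```python
import random
from typing import Dict, List, Set, Tuple


def run_model(seed: int, steps: int = 500, peers: int = 3) -> int:
    """Run one random operation sequence against a one-shard model.

    Asserts the safety invariants after every step and returns how many
    tombstones compaction purged, so a caller can confirm the drop path ran.
    """
    rng = random.Random(seed)
    seq = 0
    docs: Dict[str, Tuple[int, bool]] = {}  # doc_id -> (update_seq, deleted)
    created: Set[str] = set()
    deleted_at: Dict[str, int] = {}         # doc_id -> seq of its deletion
    checkpoints: List[int] = [0] * peers    # last seq each peer has committed
    purged = 0

    for _ in range(steps):
        op = rng.choice(["create", "delete", "checkpoint", "compact"])
        if op == "create":
            seq += 1
            doc_id = f"doc{seq}"
            docs[doc_id] = (seq, False)
            created.add(doc_id)
        elif op == "delete":
            live = sorted(d for d, (_, deleted) in docs.items() if not deleted)
            if live:
                seq += 1
                victim = rng.choice(live)
                docs[victim] = (seq, True)
                deleted_at[victim] = seq
        elif op == "checkpoint":
            p = rng.randrange(peers)
            # Peers only move forward, and never past the current update seq.
            checkpoints[p] = rng.randint(checkpoints[p], seq)
        else:
            # Analogue of _update_drop_seq followed by a compaction pass.
            drop_seq = min(checkpoints)
            before = len(docs)
            docs = {d: v for d, v in docs.items() if not (v[1] and v[0] <= drop_seq)}
            purged += before - len(docs)

        # Invariant 1: live documents are never dropped.
        for doc_id in created - set(deleted_at):
            assert doc_id in docs and not docs[doc_id][1], (seed, doc_id)
        # Invariant 2: a tombstone newer than the slowest peer is still present.
        slowest = min(checkpoints)
        for doc_id, s in deleted_at.items():
            if s > slowest:
                assert docs.get(doc_id) == (s, True), (seed, doc_id)
    return purged


for seed in range(10):
    run_model(seed)  # raises AssertionError if an invariant is violated
```

Invariant 2 holds because peer checkpoints only ever advance, so the current slowest checkpoint is always at least the drop_seq used by any earlier compaction.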

Related Issues or Pull Requests

N/A

Checklist

Code is written and works correctly
Changes are covered by tests
Any new configurable parameters are documented in rel/overlay/etc/default.ini
[TODO] Documentation changes were made in the src/docs folder
Documentation changes were backported (separated PR) to affected branches


rnewson added 2 commits April 23, 2025 10:06

Introduce drop_seq

fb4f21f

Any deleted document with an update seq equal to or lower than drop seq is completely removed during compaction.

Implement _update_drop_seq

ade8dd1

rnewson force-pushed the auto-delete-3 branch from 8364455 to ade8dd1 June 5, 2025 14:30
