What would you like to be added:
Background
In Yorkie's CRDT system, most operations can leverage commutativity to converge without explicit reordering. These include `Object.Set`, `Object.Remove`, `Array.Insert`, `Array.Remove`, `Text.Edit`, and `Tree.Edit`.
However, certain position-sensitive operations are inherently non-commutative: their final state depends on the order of application. These include `Tree.Split`, `Tree.Merge`, `Array.Move`, and `Object.Move`.
These operations often require causal reordering or a full replay to ensure convergence. Moreover, schema constraints (e.g., "this array can have only two items") can prevent concurrent operations from being applied directly, requiring validation and reconciliation.
To address these challenges, I propose a Reorder/Replay engine that guarantees convergence for all operation types through VersionVector-aware reordering.
🎯 Goals
- Introduce a Reorder/Replay engine that can reorder non-commutative operations based on causal history
- Ensure 100% convergence across all CRDT operation types
- Maintain acceptable performance and memory overhead
- Integrate with existing PushPull sync model
🧩 Operation History Structure
We propose maintaining three components for operation replay and compaction:
- `MinRoot`: The document state at the minVersionVector received from the server.
- `Linear Operation Array`: A causally ordered array of operations (sorted by logical clock). This forms the basis for replay.
- `MaxRoot`: Used for client-side caching and rendering; updated periodically but not required for correctness.
AS-IS:
```go
type InternalDocument struct {
	// ...
	root         *crdt.Root       // document state at the max versionVector (latest on this machine)
	localChanges []*change.Change
}
```
TO-BE:
```go
type InternalDocument struct {
	// ...
	minRoot      *crdt.Root       // document state at the minVersionVector
	maxRoot      *crdt.Root       // document state at the maxVersionVector (latest on this machine)
	changes      []*change.Change // linear log covering minVersionVector ~ maxVersionVector
	localChanges []*change.Change // partial set of `changes`
}
```
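Under the TO-BE layout, deriving the max root reduces to cloning `minRoot` and replaying the linear log on top of it. A minimal sketch of that flow, assuming a toy map-based root and function-valued operations instead of `crdt.Root`:

```go
package main

import "fmt"

// Op is a toy operation: a name plus a mutation over a map-based root.
// Yorkie's real operations mutate crdt.Root; this only sketches the flow.
type Op struct {
	Name  string
	Apply func(root map[string]string)
}

// replay clones the min root and applies the causally ordered linear log
// on top of it, yielding the max root. minRoot itself is never mutated,
// so it remains a stable baseline for compaction.
func replay(minRoot map[string]string, log []Op) map[string]string {
	maxRoot := make(map[string]string, len(minRoot))
	for k, v := range minRoot {
		maxRoot[k] = v // snapshot copy
	}
	for _, op := range log {
		op.Apply(maxRoot)
	}
	return maxRoot
}

func main() {
	minRoot := map[string]string{"title": "draft"}
	log := []Op{
		{Name: "Object.Set", Apply: func(r map[string]string) { r["title"] = "final" }},
		{Name: "Object.Set", Apply: func(r map[string]string) { r["owner"] = "alice" }},
	}
	maxRoot := replay(minRoot, log)
	fmt.Println(maxRoot["title"], minRoot["title"]) // final draft
}
```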
Benefits
- Simplifies replay logic with a linear log
- Easy integration with PushPull API
- Avoids the complexity of DAGs or tree-based histories
- Supports schema validation and rollback during replay
📐 Design & Implementation Plan
A. Algorithm Design & Validation
- Define operation classification (commutative vs non-commutative)
- Design replay strategy based on VersionVector ordering
- Explore hybrid approaches (commutativity + replay fallback)
- Define schema validation & conflict resolution policy
- Build prototype and benchmark replay performance
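The classification step could start as a simple table driving a replay-or-fast-path decision. The table below mirrors the operation lists in this issue; the `needsReplay` helper is hypothetical:

```go
package main

import "fmt"

// commutative marks which operation types converge without reordering.
// Position-sensitive operations are marked false and trigger replay.
var commutative = map[string]bool{
	"Object.Set": true, "Object.Remove": true,
	"Array.Insert": true, "Array.Remove": true,
	"Text.Edit": true, "Tree.Edit": true,
	"Tree.Split": false, "Tree.Merge": false,
	"Array.Move": false, "Object.Move": false,
}

// needsReplay reports whether a batch of remote operations requires the
// replay fallback, i.e. contains at least one non-commutative operation.
func needsReplay(opTypes []string) bool {
	for _, t := range opTypes {
		if !commutative[t] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsReplay([]string{"Object.Set", "Text.Edit"}))  // false
	fmt.Println(needsReplay([]string{"Object.Set", "Array.Move"})) // true
}
```

Keeping the fast path for purely commutative batches is what the hybrid approach above would buy: replay cost is only paid when the batch actually needs it.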
B. Engine Implementation
- Implement VersionVector-based OperationHistory
- Integrate with PushPull response (start snapshot, minVersionVector, operations)
- Build snapshot-based replay logic
- Optimize in-memory operation storage
- Handle schema validation during replay
C. Optimization & Integration
- Analyze time/space complexity and optimize
- Integrate with devtools for debugging reordering
- Support undo via replay-aware operation tracking
- Plan gradual migration from current model
📈 Success Metrics
- 100% convergence in complex concurrent editing scenarios
- < 20% performance degradation vs current approach
- < 30% memory overhead
- Consistent behavior across all SDKs
Why is this needed: