The full name of CRDT
in English is Conflict-free Replicated Data Type
, which was initially developed for collaborative text editing and mobile computing, and is now also used in online chat systems, audio distribution platforms, and more. Currently, CRDT
algorithms are still typical in the field of collaborative rich text editing and are commonly used as underlying algorithms for implementing document collaboration, supporting multiple users to edit documents simultaneously without causing conflicts due to concurrent modifications, which can lead to inconsistent or even lost data.
Conflict-free Replicated Data Type
, as its name suggests, focuses on conflict-free replication and data types. Simply put, CRDT
is a type of data structure used to ensure eventual consistency. In distributed systems, data replication between different nodes presents a consistency issue, known as strong consistency problem. CRDT
serves as a theory to guide the design of the original data structure into a new data structure that leads to eventual consistency during the data replication process. When multiple replicas or operations are involved, if the hosting computers are not coordinated, uncoordinated merges may lead to inconsistency between replicas, which is often unsolvable. When conflicts exist during updates, achieving consistency and data integrity may require partial or even complete updates or deletions. Thus, CRDT
guides us in merging or updating data from multiple replicas or operations, ensuring that our data can be automatically merged, conflicting issues resolved, and eventually achieving strong consistency. Returning to collaborative rich text editing, it can be understood as a type of distributed application. Through its data structure design, CRDT
ensures the eventual consistency of concurrent data operations. In simple terms, CRDT
is a data structure that can be replicated on multiple terminals in a network, where replicas can be independently and concurrently updated without coordination, ensuring that no conflicts occur and that the content of each replica eventually becomes consistent.
Before discussing specific collaborative algorithms, let's explore why collaborative algorithms are necessary, the problems that may arise without them, and the specific scenarios in which problems may arise. For example, if we have an online document application and we are a team, it's possible for us to edit the same document. Since simultaneous editing is likely, conflicts may arise. For instance, if the current content of the document is A
, and two users, U1
and U2
, start editing simultaneously, resulting in the document states B
and C
respectively. If U1
and U2
save their edits successively, the final document state becomes C
, resulting in the changes made by U1
from state A
to state B
being lost. To tackle this problem, there are several common solutions.
Optimistic locking is a method that assumes the best-case scenario. It does not involve any locks during the operation process; strictly speaking, optimistic locking cannot be considered a lock. Optimistic locking always assumes the best situation, where each time data is retrieved, it is assumed that no one else will modify it. There may be a check at the time of an update to see if anyone has updated this data during this period, or no prompts may be given.
Applied to document editing, we can optimistically assume that two people will never edit the same document at the same time. In reality, this situation may occur, for example, if each person in the team is only responsible for a few documents and other people do not have permission to edit documents outside their responsibility. Based on this requirement, we can optimistically assume that conflicts will never occur, thus no restrictions on document management are needed, only the ability to provide complete editing.
In contrast, pessimistic locking is a preventive approach to preventing any data conflicts in a pessimistic manner. It locks the data before modifying it and then allows reading and writing of the data. No one else can operate on the data until the lock is released, and this mechanism guarantees the exclusivity and correctness of data.
Applied to document editing, we can lock the editing operation permissions of the same document, ensuring that only one person can edit the document at a time while others have to wait. Only when the previous person has finished editing and released the lock can the next person edit the document. It can also allow forced preemption and store the scene of the preempted operation, converting concurrent operations into linear operations and guaranteeing the correctness of the document content through exclusive means, avoiding conflicts and losses in the document content.
Automatic merging, where document content is automatically merged and conflicts are handled, is a feasible solution. Similar to the version management concept in Git
, it can perform operations such as diff
differential comparison and merge
merging of submitted content. It can also prompt users to handle unresolved conflicts. For example, GitBook
adopts this approach to resolve conflict issues.
Collaborative editing allows multiple users to edit documents simultaneously without causing conflicts, leading to inconsistent or lost data. The key to collaborative editing lies in collaborative algorithms, mainly including Operational Transformation (OT)
and Conflict-free Replicated Data Type (CRDT)
. Collaborative algorithms do not need to be correct; they only need to be consistent and strive to maintain the user's intent. This means that the main purpose of collaborative algorithms is to provide eventual consistency while preserving the user's intent as much as possible, rather than maintaining the user's intent. Currently, applications such as Mubu, Tencent Docs, Feishu Docs, and Google Docs
are based on the OT
collaborative algorithm, while the Atom
editor uses the CRDT
collaborative algorithm.
The core idea of the Conflict-free Replicated Data Type (CRDT)
collaborative algorithm is not to resolve conflicts, but to construct a data structure to avoid conflicts. By avoiding conflicts, the algorithm can directly merge and ultimately obtain the document content. The purpose of the CRDT
collaborative algorithm is to maintain the final consistency of the document while keeping the user's intention as much as possible. For example, when both A
and B
insert different characters at position L
in the document simultaneously, the collaborative algorithm does not care about the ordering of the inserted characters. It only needs to determine, as much as possible, whose character comes first based on a certain strategy such as timestamps. The final calculated result of which character comes first does not affect the collaborative algorithm, instead, it focuses on presenting the document content in front of each user consistently after scheduling the Op
generated by the users. This is the essence of maintaining the final consistency of the document. From a functional perspective, the collaborative algorithm ensures that in the scenario of multiple people editing online simultaneously, since each person submits different content, it is necessary to use collaborative algorithm scheduling to ensure that each user ultimately sees the same content. In practice, an online document itself has a strong requirement for data consistency, so whether using the CRDT
algorithm or the OT
algorithm to achieve collaboration, ensuring final consistency is a fundamental consideration.
Since CRDT
is designed to merge and update various replicas without generating conflicts, algorithms implemented by CRDT
can be directly transmitted and synchronized between clients to achieve final consistency. This means that users can directly perform peer-to-peer data merging without the need for a central server to schedule, making CRDT
well-suited for decentralized applications, even if there is no centralized server, each end can still complete synchronization.
When discussing final consistency and distributed systems, it is necessary to mention the CAP
theorem. The CAP
theorem states that in a distributed system, at most two of the following three guarantees can be simultaneously fulfilled: Consistency
, Availability
, and Partition Tolerance
.
- Consistency: For each client read operation, it either returns the most recent data or fails to read. In other words, consistency is a commitment to clients accessing the system from the perspective of a distributed system: either I will return an error to you, or I will return absolutely consistent and latest data. It emphasizes the correctness of data.
- Availability: Any client's request can receive a response without error. In other words, availability is a commitment to clients accessing the system from the perspective of a distributed system: I will definitely return data to you without errors, but I do not guarantee the latest data. It emphasizes the absence of response errors.
- Partition Tolerance: Due to communication through the network in distributed systems, which is unreliable. When any number of messages are lost or delayed, the system continues to provide services without crashing. In other words, partition tolerance is a commitment to clients accessing the system from the perspective of a distributed system: I will keep running regardless of any data synchronization issues within my system. It emphasizes not crashing.
For a distributed system, P
is a prerequisite that must be guaranteed because as long as there is network interaction, there will definitely be delays and data loss. This situation must be accepted, and the system must ensure it does not crash. Therefore, only C
and A
can be chosen - either ensure data consistency, or ensure availability. First, it is important to understand why P
is a prerequisite. In the context of a distributed system, the network is unreliable, and there will always be issues such as network delays and data loss. If we do not allow the existence of network partitions, which means the network always runs normally, then obviously data synchronization can be performed every time a write is made. Therefore, both consistency and availability are naturally guaranteed, but this would not be a distributed system.
Let's take an example. When there is a network problem, the need to choose between C
and A
becomes apparent. Consider a scenario where there are 100
nodes in a distributed system, but due to a fault, the network is partitioned, and half of the nodes cannot communicate with the other half, dividing the system into regions A
and B
. In this scenario, if a client tries to write data to a node in region A
, but the network partition prevents the data from synchronizing to region B
, a choice must be made.
- If allowing client data to be written, the availability of the current node is ensured, but due to the network partition, consistency cannot be guaranteed. In other words, when accessing two nodes simultaneously in the distributed system, different data may be returned.
- If not allowing client data to be written, the consistency of the current node is ensured, but because the data cannot be written, the system is clearly not available and needs to be blocked until the network connection is restored.
In reality, the description of the CAP
theorem's three choices of two guarantees is somewhat misleading. As seen in the example, it is not always necessary to choose only two guarantees at all times. When a network failure or partition occurs, the choice is made between C
and A
, but no one can guarantee that the network will always be unimpeded, so strategies must be devised for such situations in order to achieve a balance between C
and A
. In 2012, author Eric Brewer
also published a paper explaining that in practice, the CAP
theorem only prohibits the perfect availability and consistency of the design space in the presence of partitions. In reality, the balance between C
and A
is very flexible, and CRDT
is a good example of this. The CRDT
algorithm, under the premise of satisfying Partition Tolerance
, strives to ensure Consistency
and Availability
as much as possible, and the OT
collaborative algorithm also achieves this balance by ensuring final consistency.
In a collaborative editing scenario, it seems that we are able to simultaneously satisfy the three scenario limits of CAP
. Assuming that in the case of poor network, two clients submit simultaneously, although temporary consistency cannot be maintained and local clients will see different content, after the network recovers, the collaborative algorithm ensures data consistency. This not only satisfies consistency but also ensures availability. This idea is correct, but the consistency we guarantee here is not the same as the consistency in the CAP
theory. The CAP
theory assumes strong consistency in the absence of network delays, meaning the data is always consistent. In the collaborative editing scenario, the consistency is eventual consistency. As mentioned earlier, the trade-off design between C
and A
is very flexible. Since strong consistency Strong consistency
cannot be achieved, applications can adopt appropriate ways based on their own business characteristics to achieve eventual consistency Eventual consistency
. This brings us to the need to talk about the BASE
theory.
BA
:Basically Avaliable
, allows for partial availability loss, sacrificing response time, degrading system function, etc.S
:Soft State
, allows for the existence of intermediate data states, does not require strong consistency, and does not affect overall availability, allowing data synchronization delays between replicas.E
:Eventually consistency
, where all replicas can eventually reach a consistent state, but real-time strong consistency is not required.
In the BASE
theory, we can see that in the scenario of collaborative document editing, although strong consistency of CAP
cannot be met, by flexibly designing C
and A
, sacrificing partial availability, allowing temporary client inconsistencies, and resolving data inconsistencies through collaborative editing conflict algorithms, eventual data consistency is achieved. In fact, there is a lot of research in distributed theory, with strong consistency, weak consistency, sequential consistency, eventual consistency, causal consistency, and so on. Eventual consistency also has divisions such as read-write consistency, write-read consistency, monotonic read consistency, and monotonic write consistency, which will not be further elaborated here.
Next, let's briefly give an example of how to implement an eventually consistent distributed system. Let's assume we have an account T
with an initial balance of 100
dollars. Users can access account T
through several nodes in the system, such as A
, B
, and C
, and can make operations on T
at any time. At this point, we do not have a central server to store the 100
dollars, but each account independently stores this number. In this distributed system, we do not consider security aspects, just the need to ensure that everyone's data is eventually consistent.
Let's say at time t1
, A
deposited 10
dollars into T
, B
withdrew 10
dollars from T
, and C
then queried how much money T
had. These operations all happened simultaneously. At this moment:
- In the
A
system,T
seems to have110
dollars. - In the
B
system,T
seems to have90
dollars. - In the
C
system,T
seems to have100
dollars.
At time t1
, all these perspectives are correct, as we expect in an eventually consistent system that temporary inconsistencies can exist. After a period of time, let's say at t2
, we need to make sure that A
, B
, and C
all see T
as having 100
dollars, achieving eventual consistency. This will require some operations in between, such as exchanging necessary information and data between the systems. Here comes the challenge: how to perform data exchange? If each node only stores the balance of T
as an integer variable, it will not work because when the data from A
and B
is transferred to C
, C
won't be able to discern which result from A
or B is correct. Thus, the key is how to handle this problem. The simplest solution is to add additional data to ensure the correctness of merging, which is the focus of
CRDT`.
In this example, we can design each system to store not a final value, but a series of records containing the time and balance. Let's say our system starts at time t0
, in our example, the data stored at time t1
will be as follows:
- In the
A
system, it stores(t0, 100)
,(t1, 110)
. - In the
B
system, it stores(t0, 100)
,(t1, 90)
. - In the
C
system, it stores(t0, 100)
,(t1, 100)
.
At time t2
, we perform data transmission. C
received the operation from A
to T
from t0
to t1
with +10
, and received the operation from B
to T
from t0
to t1
with -10
, so at t2
, C
still sees 100
. For A
, it received the operation from B
to T
from t0
to t1
with -10
, and as C
did not have an update operation, it does not need to synchronize. For B
, it received the operation from A
to T
from t0
to t1
with +10
, and as C
did not have an update operation, it does not need to synchronize. Ultimately, A
, B
, and C
all have a consistent total sum of 100
, achieving eventual consistency. Of course, this example is simple and the solution is rough. In practical applications, we must consider more data types and application scenarios. Therefore, designing a data structure that can maintain eventual consistency becomes a challenging task. This is why focusing on the theory of CRDT
is essential in designing distributed systems, reducing the cognitive burden of distributed collaborative design.
Returning to collaboration, before understanding the CRDT
collaborative algorithm, we can also understand the main differences between the CRDT
collaborative algorithm and the OT
collaborative algorithm. Both CRDT
and OT
provide eventual consistency, which is the ultimate goal of collaborative editing, but they achieve this goal in different ways:
-
CRDT
(Conflict-free Replicated Data Type) achieves this through data structures. It has two implementation approaches: Convergent Replicated Data Types (CvRDT) based on state, and Commutative Replicated Data Types (CmRDT) based on operations. CvRDT merges the various copies, and the number of merges and the order in which they occur are not important; all copies will converge. CmRDT, on the other hand, has commutative operations, so these operations can be correctly applied without the need for transformation. -
CRDT
is more suitable for distributed systems and may not require a central server. -
CRDT
ensures conflict-free editing through data structures, but it increases space complexity. -
Operational Transformation (OT) achieves this through the transformation of operations (O) to transformations (T). The operations conducted on the terminal (O) are transmitted over the network, and other terminals receiving operation (O) need to undergo transformation (T) before they can be applied to the document. The most basic OT involves transforming index positions to ensure convergence.
-
OT typically requires a central server for coordination.
-
OT addresses the issue of editing conflicts through algorithms, but it increases time complexity.
As we discussed the CAP theorem earlier, we need to make trade-offs between consistency and availability. Temporarily sacrificing some availability to achieve eventual consistency is completely acceptable. However, designing such a distributed system in practical scenarios is still very complex. CRDT is a theoretical approach towards achieving eventual consistency which can guide us in designing a highly available distributed system. CRDT is not limited to the collaborative domain; various distributed systems and applications are now beginning to explore CRDT. Redis Labs and Riak have already implemented multiple data structures, and Microsoft's CosmosDB uses CRDT as a multi-region consistency solution in Azure.
So, what does CRDT actually represent? How does it merge without conflicts? In distributed systems, there are two ways to implement replication. One is to replicate the full state between the primary node and the secondary node, and the other is to replicate the operations themselves. Put simply, full state replication is akin to copying the entire document being edited and synchronizing it with other clients, while replicating operations is similar to only copying the "Op" caused by the current edit to other clients. When it comes to replicating the state, which is the full state replication, there needs to be some convergence rules. Thus, we can create Convergent Replicated Data Types, or CvRDT, which is a state-based convergent replicated data type, also known as a State-based CRDT. If replicating operations, the operations need to be designed as Commutative, and operation transformation is not required. As a result, we can create Commutative Replicated Data Types, or CmRDT, which is an operation-based commutative replicated data type, also known as Op-based CRDT. In both cases, the goal is to ensure that operations do not conflict with each other and can occur independently to avoid coordination, so they can be collectively referred to as Conflict-free Replicated Data Types, or CRDT.
State-based CRDT may sound intimidating at first, but it's not as difficult to understand when broken down. In state-based CRDT, nodes transmit states, and there is a function Merge: (SA, SB) => SC
. Upon receiving the transmitted state, it merges with its stored state. This function must satisfy commutative, associative, and idempotent laws. So, from the perspective of abstract algebra, it makes the entire state system form a lattice. As long as new states from all other nodes can be received, the system will always converge, thus avoiding conflicts. This also eliminates the need for underlying communication mechanisms in CmRDT. However, because the transmitted data is the state itself, it may be a bit larger eventually, although there are optimization strategies available.
The previous paragraph may be a bit challenging to understand. Let's break it down slowly. First, let's look at the function Merge: (SA, SB) => SC
. This means that the content of the synchronized state needs to be merged, which is the main purpose of state-based CRDT, i.e., merging with the received state with its stored state. Next, we need to consider the conditions that this function must satisfy, namely the commutative, associative, and idempotent laws:
- Commutative:
A ∪ B = B ∪ A
. - Associative:
(A ∪ B) ∪ C = A U (B U C)
. - Idempotent:
A U A = A
.
When we look at these three formulas, we can easily understand why this operator needs to satisfy these three laws. The first point we need to focus on is that the network is unreliable, and distributed systems must rely on the network for data transmission. So, let's see why it is necessary to ensure these three formulas:
-
Commutative law. Suppose we have two users,
A
andB
, transmitting data to each other. WhenA
synchronizes data toB
, it isA ∪ B
, and whenB
synchronizes data toA
, it isB ∪ A
. Therefore, to ensure eventual consistency, we must ensure the commutative law to correctly merge the data, making the order of application irrelevant. -
Associative law. If we have five users
A
,B
,C
,D
, andE
transmitting data to each other, andA
,B
, andC
make changes to the data,D
andE
need to synchronize data from the other three clients. Due to network delays, the order in which clients arrive atD
andE
may differ, so we need to satisfy the associative law to ensure the correct data merge, making the grouping irrelevant. -
Idempotent law. If two users,
A
andB
, are transmitting data to each other, and afterA
makes changes, the replica is transmitted over the network with noACK
confirmation for some time, there may be a timeout mechanism that leads to retransmission ofA
. If the originalA
packet was just delayed and the newA
packet arrives due to a different route, both identicalA
packets may reachB
at the same time. In this scenario, to ensure correct data merging, we need to guarantee idempotence, making duplicates irrelevant.
It can be seen that satisfying these three formulas is due to the unreliability of the network, and we need to design data structures to ensure eventual consistency. Additionally, it is important to note that if we can ensure the semantics of "Exactly once," then the idempotent law condition can be relaxed. For example, addition satisfies the commutative and associative laws but is not idempotent. Yet, if we can guarantee that the addition operation is only replicated once, the result is still eventually consistent.
In practice, by ensuring the above three theorems, we can synchronize data in distributed systems without considering the order of merging and the problem of multiple mergers, while ensuring eventual consistency.
Now, let's take a look at the concept of a semilattice. Before understanding the semilattice, we need to discuss the concept of a partially ordered set. After understanding the semilattice, we can roughly understand how CRDT
can merge without conflicts. Suppose P
is a set, and the binary relation ≤
on P
satisfies the following three conditions; then, ≤
is called a partial order relation on P
:
- Reflexivity:
∀a∈P
,a≤a
. - Antisymmetry:
∀a, b∈P
, ifa≤b
andb≤a
, thena=b
. - Transitivity:
∀a, b, c∈P
, ifa≤b
andb≤c
, thena≤c
.
It is important to note that here ≤
does not necessarily refer to the standard mathematical meaning of less than or equal to. We usually consider it as "x precedes y," which means that this partial order is not necessarily about comparing sizes and does not require the elements of the set to be numbers. The key is how to define this binary relation, as long as it satisfies the above three properties, it is a partial order relation. It may sound a bit abstract, so let's take an example. The set of natural numbers has its natural order, that is, the less than or equal to relation, which makes it a total order. We can also define a relation ≤
as a subset, meaning we can compare sets through subsets, thus constructing a partial order relation. In the following example, we can see the partial order relation using subsets:
[The diagram of the subsets]
Next, let's look at the concept related to the lattice. A lattice is a partially ordered set with different tops (least upper bound) and bottoms (greatest lower bound). A semilattice is also a partially ordered set, and every non-empty set has a supremum or infimum, where the supremum is called "Least upper bound" or LUB
. This means that compared to a lattice, a semilattice has only one different top or bottom, hence the name "semilattice." A join semilattice is a semilattice with different tops (least upper bound), while a meet semilattice is a semilattice with different bottoms (greatest lower bound). Any data structure represented as a semilattice can be implemented to ensure convergent replication. For example, computing the Max()
of a set of values will always return the same result, regardless of the order in which the values are received, as long as all values are received in the end, because the Max()
operation satisfies the commutative, associative, and idempotent laws. Referring back to the example of the set, if we change the definition, there are two types of cells: one defining the Union(items)
lattice for the set and the other defining the Max(value)
lattice for the data. For two semilattices, we ensure a maximum upper bound. In the example, the eventually consistent data for the three sets is {x, y, z} 3
.
[The diagram of the semilattice]
After going through semilattices and partially ordered sets, it seems like we've circled back to commutativity, associativity, and idempotence. In fact, as mentioned in the semilattice section, as long as we can define a meaningful least upper bound (LUB
) function, we can create a CRDT
. If the Max(x, y)
mentioned above doesn't seem intuitive, we'll provide more examples of data structures in this section to explain the issue. Perhaps in the data structure of rich text, there isn't such a straightforward way to satisfy the semilattice condition with Max(x, y)
, but don't forget that we can attach additional information to the data. As mentioned earlier, CRDT
ensures conflict-free editing through data structure, which increases space complexity. An important point in data structure design is to carry additional information, such as timestamps, logical timestamps, unique user IDs, etc. By attaching metadata, we can ensure that updates remain monotonic, thereby forming a semilattice with a least upper bound.
The operation-based CRDT
is actually similar to the state-based CRDT
mentioned above, but instead of working on states, it focuses on replicating and synchronizing operations (Op
). Hence, we also need to design operations to adhere to commutativity, associativity, and idempotence, as the objects being transmitted from states to operations need to pass through the network, inevitably encountering the issues that distributed systems must solve. Therefore, the design of transmitted operations is crucial, and using op-based CRDT
can significantly reduce the data that needs to be transmitted during synchronization.
If the update operations of the local client itself satisfy the three laws mentioned above, then synchronizing replicated operations to other clients only requires replaying the operations that have been transmitted. The simplest example is taking the union of sets. If the local client's operations do not satisfy the mentioned conditions, then consider attaching metadata to the operations (Op
) in order to meet the three laws. Similarly, if we can ensure the Exactly once
semantic, the idempotence condition can be relaxed. If it still cannot be satisfied, we can consider synchronizing replica data using State-based CRDT
, while attaching additional metadata to ensure that local updates and the merging of operations from other clients satisfy the three laws. Interestingly, using an operation-based approach always allows us to simulate a state-based approach.
As the objects of operations are Op
, we need to restate how we can satisfy commutativity, associativity, and idempotence based on snapshots. Similar to OT
, we need to define some concepts. snapshot
is the content of the current document, and apply(snapshot, op)
is the application of Op
based on snapshot
to update the document. The representation of the three laws is as follows:
- Commutativity:
apply(apply(snapshot, A), B) = apply(apply(snapshot, B), A)
. - Associativity:
apply(apply(apply(snapshot, A), B), C)= apply(apply(apply(snapshot, B), C), A)
. - Idempotence:
apply(apply(snapshot, A), A) = apply(snapshot, A)
.
In the design of Op-based CRDT
, we also need to consider causal consistency. Prior to applying Op
, we need to check the snapshot to ensure that the causal ordering required by Op
is satisfied. For example, the operation of deleting node x
depends on the creation operation of node x
, otherwise, the operation cannot be applied, and in the case of unsatisfied conditions, it needs to be blocked and delayed. This also means that we are responsible for ensuring that the preceding states of each operation can be satisfied. In other words, we need to guarantee that the preceding conditions of each operation can be satisfied by applying them in causal order, so that all update functions will terminate. Ultimately, the replicas of each client pass each other the missing Op
to achieve eventual consistency through the apply
operation.
Similarly, even when synchronizing Op
, we can ensure the implementation of the three laws by guaranteeing a partial order. We can determine the partial order based on the execution sequence of each Op
on the replicas, meaning the additional metadata attached is timestamp or logical timestamp.
- If an
OpA
is executed beforeOpB
on a replica, thenA < B
. - If neither
A < B
norB < A
, then we can say thatA
andB
are parallel operations.
Therefore, in the design of Op-based CRDT
, we need to ensure that Op
satisfies commutativity, associativity, and idempotence, and ensure that the preceding conditions of each operation can be satisfied by applying them in causal order so that all update functions will terminate, thus obtaining a data structure that satisfies Op-based CRDT
. In fact, formally, Op-based CRDT
and State-based CRDT
can be converted into each other, so most of the contents that need to be understood in the Op-based CRDT
section are also reflected in the State-based CRDT
section. In the latter section, we mentioned attaching metadata to rich text to meet the conditions of CRDT
. Similarly, we can also attach metadata to Op
to meet the conditions. In simpler terms, if we assign a unique id
to each word in rich text, combined with logical timestamps, we can precisely know how to apply Op
to the document's location without the need to transform indices. Of course, this implementation is very rudimentary. For a deeper understanding, one can look into the YATA
algorithm and Yjs
. In Yjs
, most operations are designed and implemented based on Op-based CRDT
, while the handling of delete operations is similar to the implementation of State-based CRDT
.
Below are some typical examples of CRDT
data structures. These examples can help us understand the design principles of CRDT
. It's important to note that the examples here do not involve central server scheduling or data storage, as the data is synchronized between various clients.
The Op-based Counter
represents an integer counter. Let's say we have A
and B
as two clients maintaining a counter together. Both clients store an integer, and if the current value is 10
, and then A
performs an increment
operation, while B
performs a decrement
operation, we may temporarily have the value of A
as 11
and B
as 9
. Later, data synchronization is necessary to ensure that A
and B
achieve the ultimately consistent value of 10
. We can achieve this by synchronizing the increment
operation of A
and the decrement
operation of B
in the network, which ensures that both A
and B
will have the value of 10
.
It seems like this operation is easy to implement because addition naturally satisfies the commutative and associative properties, and subtraction can also be considered as addition, but with a negative value. However, it's important to note that in this example, addition is not idempotent, so it is crucial to ensure no loss or duplication during the synchronization process.
The State-based Counter
is not so obvious in its organizational form, let's start by looking at the Grow-only Counter
, which means a counter that only increments. The key concept here is that we are implementing a CRDT
that synchronizes data globally. Since the synchronization is done globally, if each replica accumulates independently, it becomes impossible to determine the exact cumulative value for each replica during the merge process. It is not sufficient to simply take the max
as the final result during synchronization. For example, if we have A
and B
as two clients maintaining a counter, both initially starting with 0
, and then A
increases by 1
and B
increases by 2
, taking max(1, 2)
won't work because what we actually want is the data to be 3
. Even though the data structure designed through max
satisfies the commutative, associative, and idempotent properties, it contradicts our objective of getting a counter, rather than obtaining the maximum value of the clients.
A feasible solution is to use an array on each replica to store the values of the other replicas. Local updates should only operate on the corresponding item in the array, and the merge operation should only modify the other items in the array, compute the max
of each item for merging. When querying, the local values from each replica should be summed to obtain the Counter
we desire. For example, if A
and B
are two clients maintaining a counter, both initially starting with 0
, then A
's data becomes [0, 0]
and B
's becomes [0, 0]
. If A
increases by 1
and B
increases by 2
, A
becomes [1, 0]
and B
becomes [0, 2]
. After data synchronization, A
becomes [1, 2]
and B
becomes [1, 2]
, thus making the data consistent between both clients, with the final count being 3
.
In the above approach, obtaining the max
of each item in the array during merging is how we maintain the commutative, associative, and idempotent properties. Due to the unreliable nature of networks, we cannot guarantee that data packets will always be synchronized correctly. If, for instance, a historical [1, 1]
package is still in transit in the network and arrives at a client when the client's value has already reached [6, 10]
, then this package must be discarded. Otherwise, we won't uphold the three properties we need to ensure. Thus, we maintain a partial order relationship, although natural numbers are actually a total order relationship. Of course, we can also attach a timestamp or logical timestamp to the data, as it also accomplishes the same objective by adding additional information.
In the final segment of the Grow-only Counter
, we presented an example where a historical [1, 1]
package was still in transit in the network and arrived at a client when the client's value had already reached [6, 10]
. This example shows that the system maintains a structure that is solely for incrementing data. If we add a negative number, or simply call it a subtraction operation, then the system described above becomes ineffective.
Therefore, a State-based CRDT
with a Decrement
is not as straightforward as the G-Counter
. After introducing subtraction, it no longer satisfies the monotonically increasing partial order relationship during updates. Hence, a feasible approach here is to construct two G-Counters
, one for storing the accumulated values of Increment
, and the other for storing the accumulated values of Decrement
. This achieves our goal of maintaining the three properties through two sets of CRDT
with a partial order relationship.
The Grow-only Set
represents a set that only allows addition. The Add
operation of the Set
is essentially a union and naturally satisfies the commutative, associative, and idempotent properties. Therefore, we can easily use an Op-based CRDT
, similar to the Op-based Counter
. In this case, there is no need to ensure no loss or duplication as when dealing with the Op-based Counter
, because taking the union of sets is idempotent. Of course, we can also use a State-based CRDT
for a full-data union, in which case each client's replica only needs to store a single copy of the set.
In the Two-Phase Set
, when we consider deletion operations, the implementation approach is similar to the PN-Counter
. We use two G-Sets
, where Set A
is responsible for addition, and an element removed from Set A
is not actually deleted but copied to Set R
. During queries, if an element is present in Set A
and absent in Set R
, then it indicates that the element is still there. In practice, the 2P-Set
is not very practical, as once an element is removed, it cannot be added back, and the complexity of maintaining the deleted elements in the original Set
is also significant and wasteful of space.
In the Last-Write-Wins-Element Set
, to solve the problem of not being able to re-add deleted elements, you can consider adding an update timestamp to each element in sets A
and R
of the 2P-Set
, while keeping other operations unchanged. Therefore, when querying, it is necessary to verify the existence of the element. If an element is in the adding set A
and not in the removing set R
, or if it is in the removing set R
but with a timestamp earlier than the latest timestamp in the adding set A
, then the element is considered to exist. Another more optimized implementation is to eliminate the need for the R
set, and instead, have each element in the A
set maintain a deletion flag in addition to an update timestamp. This demonstrates that there are various implementations through attaching metadata as long as they satisfy the commutative, associative, and idempotent laws.
The Observed-Remove Set
is similar to the implementation of the LWW-Element-Set
, but it uses a unique label tag/uuid
instead of a timestamp. We still need to maintain an add set Set A
and a remove set Set R
. When adding an element, a new unique tag tag/uuid
is generated, and when deleting, the element and tag are copied to the remove set Set R
. During a query, if an element is in Set A
and not in Set R
, then the element exists in the collection. Since we generate globally unique tag
, there is no need to compare timestamps, ensuring that deleted elements can be added again. This can be viewed as an implementation of the State-based CRDT
.
Another design that satisfies the Op-based CRDT
involves adding a unique tag/uuid
to each element every time Add(e)
is performed, while Remove(e)
will delete all occurrences of e
and their corresponding tag/uuid
on the current node. Therefore, even if there are concurrent Add(e)
operations while Remove(e)
is also taking place, e
can eventually be successfully added. This semantic is known as Add wins
. Next, let's see how this approach satisfies the commutative, associative, and idempotent laws. The descriptions here are all based on ensuring casual consistency, first, regarding commutative law, since tag/uuid
is globally unique, it is obvious that both Add
and Remove
operations can be exchanged, provided that casual consistency is met, otherwise, blocking and waiting are necessary. Similarly, the associative law is satisfied, allowing for arbitrary combination operations on sets under casual consistency, mainly because tag/uuid
is globally unique and inconsistency issues will not arise. Lastly, the idempotent law needs to be ensured. If the tag/uuid
of a resent package due to an unreliable network matches the data before the retry, and since the element already exists in reality, it will not be added repeatedly, making this operation idempotent. In fact, similar to the initial mention of the Op-based Counter
, Op
is commutable and ensures idempotence, making this a reasonable CRDT
design.
As CRDT
is a general solution for handling distributed system data synchronization issues, this article does not limit itself to the design of rich text data structures, but rather seeks to understand CRDT
from the perspective of distributed data synchronization, with intermittent applications of CRDT
in the rich text domain, thereby allowing for a better understanding of this data model. Similarly, the content introduced in this article is only the tip of the iceberg. Data synchronization in distributed systems has always been a complex problem. Returning to the rich text domain, issues such as ensuring collaborative editor performance, making trade-off strategies under the CAP
theory, ensuring data stability, recoverability, and traceability, handling cursor synchronization, and managing Undo/Redo
, all require in-depth research and design.
In the rich text domain, the current mainstream solution for online documents is still OT
. In an article "CRDTs are the future" by the author of ShareDB
in September 2020, the trade-offs between CRDT
and OT
were discussed. The biggest issue with OT
is that it must rely on a central server. Therefore, seamless data sharing between all devices depends on the server's data scheduling. On the other hand, using CRDT
allows for data synchronization without the need for a central server, making it a powerful means of solving consistency issues in the future. Similar to OT
, there are many open-source implementations in the field of CRDT
. By applying the foundational CRDT
library, applications can naturally acquire support for concurrent updates in multi-user collaboration scenarios with only a simple data Model
layer and an API
almost identical to Backbone
. Here, two recommended CRDT
implementations are automerge
(https://github.com/automerge/automerge/) and yjs
(https://github.com/yjs/yjs/).
https://github.com/WindrunnerMax/EveryDay
https://zhuanlan.zhihu.com/p/50990721
https://zhuanlan.zhihu.com/p/424467723
https://zhuanlan.zhihu.com/p/452980520
https://zhuanlan.zhihu.com/p/510797688
https://zhuanlan.zhihu.com/p/425265438
http://www.alloyteam.com/2020/01/14221/
https://www.jdon.com/artichect/crdt.html
https://www.zhihu.com/question/507425610
https://juejin.cn/post/6844903672032264199
https://juejin.cn/post/7049939780477386759
https://josephg.com/blog/crdts-are-the-future/
https://www.zxch3n.com/crdt-intro/design-crdt/
https://my.oschina.net/fileoptions/blog/1819610
https://hal.inria.fr/file/index/docid/555588/filename/techreport.pdf