Skip to content

Latest commit

 

History

History
210 lines (131 loc) · 45.3 KB

初探富文本之CRDT协同算法.md

File metadata and controls

210 lines (131 loc) · 45.3 KB

Exploring CRDT Collaborative Algorithm for Rich Text

The full name of CRDT in English is Conflict-free Replicated Data Type, which was initially developed for collaborative text editing and mobile computing, and is now also used in online chat systems, audio distribution platforms, and more. Currently, CRDT algorithms are still typical in the field of collaborative rich text editing and are commonly used as underlying algorithms for implementing document collaboration, supporting multiple users to edit documents simultaneously without causing conflicts due to concurrent modifications, which can lead to inconsistent or even lost data.

Description

Conflict-free Replicated Data Type, as its name suggests, focuses on conflict-free replication and data types. Simply put, CRDT is a type of data structure used to ensure eventual consistency. In distributed systems, data replication between different nodes presents a consistency issue, known as strong consistency problem. CRDT serves as a theory to guide the design of the original data structure into a new data structure that leads to eventual consistency during the data replication process. When multiple replicas or operations are involved, if the hosting computers are not coordinated, uncoordinated merges may lead to inconsistency between replicas, which is often unsolvable. When conflicts exist during updates, achieving consistency and data integrity may require partial or even complete updates or deletions. Thus, CRDT guides us in merging or updating data from multiple replicas or operations, ensuring that our data can be automatically merged, conflicting issues resolved, and eventually achieving strong consistency. Returning to collaborative rich text editing, it can be understood as a type of distributed application. Through its data structure design, CRDT ensures the eventual consistency of concurrent data operations. In simple terms, CRDT is a data structure that can be replicated on multiple terminals in a network, where replicas can be independently and concurrently updated without coordination, ensuring that no conflicts occur and that the content of each replica eventually becomes consistent.

Before discussing specific collaborative algorithms, let's explore why collaborative algorithms are necessary, the problems that may arise without them, and the specific scenarios in which problems may arise. For example, if we have an online document application and we are a team, it's possible for us to edit the same document. Since simultaneous editing is likely, conflicts may arise. For instance, if the current content of the document is A, and two users, U1 and U2, start editing simultaneously, resulting in the document states B and C respectively. If U1 and U2 save their edits successively, the final document state becomes C, resulting in the changes made by U1 from state A to state B being lost. To tackle this problem, there are several common solutions.

Optimistic Locking

Optimistic locking is a method that assumes the best-case scenario. It does not involve any locks during the operation process; strictly speaking, optimistic locking cannot be considered a lock. Optimistic locking always assumes the best situation, where each time data is retrieved, it is assumed that no one else will modify it. There may be a check at the time of an update to see if anyone has updated this data during this period, or no prompts may be given.

Applied to document editing, we can optimistically assume that two people will never edit the same document at the same time. In reality, this situation may occur, for example, if each person in the team is only responsible for a few documents and other people do not have permission to edit documents outside their responsibility. Based on this requirement, we can optimistically assume that conflicts will never occur, thus no restrictions on document management are needed, only the ability to provide complete editing.

Pessimistic Locking

In contrast, pessimistic locking is a preventive approach to preventing any data conflicts in a pessimistic manner. It locks the data before modifying it and then allows reading and writing of the data. No one else can operate on the data until the lock is released, and this mechanism guarantees the exclusivity and correctness of data.

Applied to document editing, we can lock the editing operation permissions of the same document, ensuring that only one person can edit the document at a time while others have to wait. Only when the previous person has finished editing and released the lock can the next person edit the document. It can also allow forced preemption and store the scene of the preempted operation, converting concurrent operations into linear operations and guaranteeing the correctness of the document content through exclusive means, avoiding conflicts and losses in the document content.

Automatic Merging

Automatic merging, where document content is automatically merged and conflicts are handled, is a feasible solution. Similar to the version management concept in Git, it can perform operations such as diff differential comparison and merge merging of submitted content. It can also prompt users to handle unresolved conflicts. For example, GitBook adopts this approach to resolve conflict issues.

Collaborative Editing

Collaborative editing allows multiple users to edit documents simultaneously without causing conflicts, leading to inconsistent or lost data. The key to collaborative editing lies in collaborative algorithms, mainly including Operational Transformation (OT) and Conflict-free Replicated Data Type (CRDT). Collaborative algorithms do not need to be correct; they only need to be consistent and strive to maintain the user's intent. This means that the main purpose of collaborative algorithms is to provide eventual consistency while preserving the user's intent as much as possible, rather than maintaining the user's intent. Currently, applications such as Mubu, Tencent Docs, Feishu Docs, and Google Docs are based on the OT collaborative algorithm, while the Atom editor uses the CRDT collaborative algorithm.

CRDT Collaborative Algorithm

The core idea of the Conflict-free Replicated Data Type (CRDT) collaborative algorithm is not to resolve conflicts, but to construct a data structure to avoid conflicts. By avoiding conflicts, the algorithm can directly merge and ultimately obtain the document content. The purpose of the CRDT collaborative algorithm is to maintain the final consistency of the document while keeping the user's intention as much as possible. For example, when both A and B insert different characters at position L in the document simultaneously, the collaborative algorithm does not care about the ordering of the inserted characters. It only needs to determine, as much as possible, whose character comes first based on a certain strategy such as timestamps. The final calculated result of which character comes first does not affect the collaborative algorithm, instead, it focuses on presenting the document content in front of each user consistently after scheduling the Op generated by the users. This is the essence of maintaining the final consistency of the document. From a functional perspective, the collaborative algorithm ensures that in the scenario of multiple people editing online simultaneously, since each person submits different content, it is necessary to use collaborative algorithm scheduling to ensure that each user ultimately sees the same content. In practice, an online document itself has a strong requirement for data consistency, so whether using the CRDT algorithm or the OT algorithm to achieve collaboration, ensuring final consistency is a fundamental consideration.

Since CRDT is designed to merge and update various replicas without generating conflicts, algorithms implemented by CRDT can be directly transmitted and synchronized between clients to achieve final consistency. This means that users can directly perform peer-to-peer data merging without the need for a central server to schedule, making CRDT well-suited for decentralized applications, even if there is no centralized server, each end can still complete synchronization.

When discussing final consistency and distributed systems, it is necessary to mention the CAP theorem. The CAP theorem states that in a distributed system, at most two of the following three guarantees can be simultaneously fulfilled: Consistency, Availability, and Partition Tolerance.

  • Consistency: For each client read operation, it either returns the most recent data or fails to read. In other words, consistency is a commitment to clients accessing the system from the perspective of a distributed system: either I will return an error to you, or I will return absolutely consistent and latest data. It emphasizes the correctness of data.
  • Availability: Any client's request can receive a response without error. In other words, availability is a commitment to clients accessing the system from the perspective of a distributed system: I will definitely return data to you without errors, but I do not guarantee the latest data. It emphasizes the absence of response errors.
  • Partition Tolerance: Due to communication through the network in distributed systems, which is unreliable. When any number of messages are lost or delayed, the system continues to provide services without crashing. In other words, partition tolerance is a commitment to clients accessing the system from the perspective of a distributed system: I will keep running regardless of any data synchronization issues within my system. It emphasizes not crashing.

For a distributed system, P is a prerequisite that must be guaranteed because as long as there is network interaction, there will definitely be delays and data loss. This situation must be accepted, and the system must ensure it does not crash. Therefore, only C and A can be chosen - either ensure data consistency, or ensure availability. First, it is important to understand why P is a prerequisite. In the context of a distributed system, the network is unreliable, and there will always be issues such as network delays and data loss. If we do not allow the existence of network partitions, which means the network always runs normally, then obviously data synchronization can be performed every time a write is made. Therefore, both consistency and availability are naturally guaranteed, but this would not be a distributed system.

Let's take an example. When there is a network problem, the need to choose between C and A becomes apparent. Consider a scenario where there are 100 nodes in a distributed system, but due to a fault, the network is partitioned, and half of the nodes cannot communicate with the other half, dividing the system into regions A and B. In this scenario, if a client tries to write data to a node in region A, but the network partition prevents the data from synchronizing to region B, a choice must be made.

  • If allowing client data to be written, the availability of the current node is ensured, but due to the network partition, consistency cannot be guaranteed. In other words, when accessing two nodes simultaneously in the distributed system, different data may be returned.
  • If not allowing client data to be written, the consistency of the current node is ensured, but because the data cannot be written, the system is clearly not available and needs to be blocked until the network connection is restored.

In reality, the description of the CAP theorem's three choices of two guarantees is somewhat misleading. As seen in the example, it is not always necessary to choose only two guarantees at all times. When a network failure or partition occurs, the choice is made between C and A, but no one can guarantee that the network will always be unimpeded, so strategies must be devised for such situations in order to achieve a balance between C and A. In 2012, author Eric Brewer also published a paper explaining that in practice, the CAP theorem only prohibits the perfect availability and consistency of the design space in the presence of partitions. In reality, the balance between C and A is very flexible, and CRDT is a good example of this. The CRDT algorithm, under the premise of satisfying Partition Tolerance, strives to ensure Consistency and Availability as much as possible, and the OT collaborative algorithm also achieves this balance by ensuring final consistency.

In a collaborative editing scenario, it seems that we are able to simultaneously satisfy the three scenario limits of CAP. Assuming that in the case of poor network, two clients submit simultaneously, although temporary consistency cannot be maintained and local clients will see different content, after the network recovers, the collaborative algorithm ensures data consistency. This not only satisfies consistency but also ensures availability. This idea is correct, but the consistency we guarantee here is not the same as the consistency in the CAP theory. The CAP theory assumes strong consistency in the absence of network delays, meaning the data is always consistent. In the collaborative editing scenario, the consistency is eventual consistency. As mentioned earlier, the trade-off design between C and A is very flexible. Since strong consistency Strong consistency cannot be achieved, applications can adopt appropriate ways based on their own business characteristics to achieve eventual consistency Eventual consistency. This brings us to the need to talk about the BASE theory.

  • BA: Basically Avaliable, allows for partial availability loss, sacrificing response time, degrading system function, etc.
  • S: Soft State, allows for the existence of intermediate data states, does not require strong consistency, and does not affect overall availability, allowing data synchronization delays between replicas.
  • E: Eventually consistency, where all replicas can eventually reach a consistent state, but real-time strong consistency is not required.

In the BASE theory, we can see that in the scenario of collaborative document editing, although strong consistency of CAP cannot be met, by flexibly designing C and A, sacrificing partial availability, allowing temporary client inconsistencies, and resolving data inconsistencies through collaborative editing conflict algorithms, eventual data consistency is achieved. In fact, there is a lot of research in distributed theory, with strong consistency, weak consistency, sequential consistency, eventual consistency, causal consistency, and so on. Eventual consistency also has divisions such as read-write consistency, write-read consistency, monotonic read consistency, and monotonic write consistency, which will not be further elaborated here.

Next, let's briefly give an example of how to implement an eventually consistent distributed system. Let's assume we have an account T with an initial balance of 100 dollars. Users can access account T through several nodes in the system, such as A, B, and C, and can make operations on T at any time. At this point, we do not have a central server to store the 100 dollars, but each account independently stores this number. In this distributed system, we do not consider security aspects, just the need to ensure that everyone's data is eventually consistent.

Let's say at time t1, A deposited 10 dollars into T, B withdrew 10 dollars from T, and C then queried how much money T had. These operations all happened simultaneously. At this moment:

  • In the A system, T seems to have 110 dollars.
  • In the B system, T seems to have 90 dollars.
  • In the C system, T seems to have 100 dollars.

At time t1, all these perspectives are correct, as we expect in an eventually consistent system that temporary inconsistencies can exist. After a period of time, let's say at t2, we need to make sure that A, B, and C all see T as having 100 dollars, achieving eventual consistency. This will require some operations in between, such as exchanging necessary information and data between the systems. Here comes the challenge: how to perform data exchange? If each node only stores the balance of T as an integer variable, it will not work because when the data from A and B is transferred to C, C won't be able to discern which result from A or B is correct. Thus, the key is how to handle this problem. The simplest solution is to add additional data to ensure the correctness of merging, which is the focus of CRDT`.

In this example, we can design each system to store not a final value, but a series of records containing the time and balance. Let's say our system starts at time t0, in our example, the data stored at time t1 will be as follows:

  • In the A system, it stores (t0, 100), (t1, 110).
  • In the B system, it stores (t0, 100), (t1, 90).
  • In the C system, it stores (t0, 100), (t1, 100).

At time t2, we perform data transmission. C received the operation from A to T from t0 to t1 with +10, and received the operation from B to T from t0 to t1 with -10, so at t2, C still sees 100. For A, it received the operation from B to T from t0 to t1 with -10, and as C did not have an update operation, it does not need to synchronize. For B, it received the operation from A to T from t0 to t1 with +10, and as C did not have an update operation, it does not need to synchronize. Ultimately, A, B, and C all have a consistent total sum of 100, achieving eventual consistency. Of course, this example is simple and the solution is rough. In practical applications, we must consider more data types and application scenarios. Therefore, designing a data structure that can maintain eventual consistency becomes a challenging task. This is why focusing on the theory of CRDT is essential in designing distributed systems, reducing the cognitive burden of distributed collaborative design.

Returning to collaboration, before understanding the CRDT collaborative algorithm, we can also understand the main differences between the CRDT collaborative algorithm and the OT collaborative algorithm. Both CRDT and OT provide eventual consistency, which is the ultimate goal of collaborative editing, but they achieve this goal in different ways:

  • CRDT (Conflict-free Replicated Data Type) achieves this through data structures. It has two implementation approaches: Convergent Replicated Data Types (CvRDT) based on state, and Commutative Replicated Data Types (CmRDT) based on operations. CvRDT merges the various copies, and the number of merges and the order in which they occur are not important; all copies will converge. CmRDT, on the other hand, has commutative operations, so these operations can be correctly applied without the need for transformation.

  • CRDT is more suitable for distributed systems and may not require a central server.

  • CRDT ensures conflict-free editing through data structures, but it increases space complexity.

  • Operational Transformation (OT) achieves this through the transformation of operations (O) to transformations (T). The operations conducted on the terminal (O) are transmitted over the network, and other terminals receiving operation (O) need to undergo transformation (T) before they can be applied to the document. The most basic OT involves transforming index positions to ensure convergence.

  • OT typically requires a central server for coordination.

  • OT addresses the issue of editing conflicts through algorithms, but it increases time complexity.

Basic Principles

As we discussed the CAP theorem earlier, we need to make trade-offs between consistency and availability. Temporarily sacrificing some availability to achieve eventual consistency is completely acceptable. However, designing such a distributed system in practical scenarios is still very complex. CRDT is a theoretical approach towards achieving eventual consistency which can guide us in designing a highly available distributed system. CRDT is not limited to the collaborative domain; various distributed systems and applications are now beginning to explore CRDT. Redis Labs and Riak have already implemented multiple data structures, and Microsoft's CosmosDB uses CRDT as a multi-region consistency solution in Azure.

So, what does CRDT actually represent? How does it merge without conflicts? In distributed systems, there are two ways to implement replication. One is to replicate the full state between the primary node and the secondary node, and the other is to replicate the operations themselves. Put simply, full state replication is akin to copying the entire document being edited and synchronizing it with other clients, while replicating operations is similar to only copying the "Op" caused by the current edit to other clients. When it comes to replicating the state, which is the full state replication, there needs to be some convergence rules. Thus, we can create Convergent Replicated Data Types, or CvRDT, which is a state-based convergent replicated data type, also known as a State-based CRDT. If replicating operations, the operations need to be designed as Commutative, and operation transformation is not required. As a result, we can create Commutative Replicated Data Types, or CmRDT, which is an operation-based commutative replicated data type, also known as Op-based CRDT. In both cases, the goal is to ensure that operations do not conflict with each other and can occur independently to avoid coordination, so they can be collectively referred to as Conflict-free Replicated Data Types, or CRDT.

State-based CRDT

State-based CRDT may sound intimidating at first, but it's not as difficult to understand when broken down. In state-based CRDT, nodes transmit states, and there is a function Merge: (SA, SB) => SC. Upon receiving the transmitted state, it merges with its stored state. This function must satisfy commutative, associative, and idempotent laws. So, from the perspective of abstract algebra, it makes the entire state system form a lattice. As long as new states from all other nodes can be received, the system will always converge, thus avoiding conflicts. This also eliminates the need for underlying communication mechanisms in CmRDT. However, because the transmitted data is the state itself, it may be a bit larger eventually, although there are optimization strategies available.

The previous paragraph may be a bit challenging to understand. Let's break it down slowly. First, let's look at the function Merge: (SA, SB) => SC. This means that the content of the synchronized state needs to be merged, which is the main purpose of state-based CRDT, i.e., merging with the received state with its stored state. Next, we need to consider the conditions that this function must satisfy, namely the commutative, associative, and idempotent laws:

  • Commutative: A ∪ B = B ∪ A.
  • Associative: (A ∪ B) ∪ C = A U (B U C).
  • Idempotent: A U A = A.

When we look at these three formulas, we can easily understand why this operator needs to satisfy these three laws. The first point we need to focus on is that the network is unreliable, and distributed systems must rely on the network for data transmission. So, let's see why it is necessary to ensure these three formulas:

  • Commutative law. Suppose we have two users, A and B, transmitting data to each other. When A synchronizes data to B, it is A ∪ B, and when B synchronizes data to A, it is B ∪ A. Therefore, to ensure eventual consistency, we must ensure the commutative law to correctly merge the data, making the order of application irrelevant.

  • Associative law. If we have five users A, B, C, D, and E transmitting data to each other, and A, B, and C make changes to the data, D and E need to synchronize data from the other three clients. Due to network delays, the order in which clients arrive at D and E may differ, so we need to satisfy the associative law to ensure the correct data merge, making the grouping irrelevant.

  • Idempotent law. If two users, A and B, are transmitting data to each other, and after A makes changes, the replica is transmitted over the network with no ACK confirmation for some time, there may be a timeout mechanism that leads to retransmission of A. If the original A packet was just delayed and the new A packet arrives due to a different route, both identical A packets may reach B at the same time. In this scenario, to ensure correct data merging, we need to guarantee idempotence, making duplicates irrelevant.

It can be seen that satisfying these three formulas is due to the unreliability of the network, and we need to design data structures to ensure eventual consistency. Additionally, it is important to note that if we can ensure the semantics of "Exactly once," then the idempotent law condition can be relaxed. For example, addition satisfies the commutative and associative laws but is not idempotent. Yet, if we can guarantee that the addition operation is only replicated once, the result is still eventually consistent.

In practice, by ensuring the above three theorems, we can synchronize data in distributed systems without considering the order of merging and the problem of multiple mergers, while ensuring eventual consistency.

Now, let's take a look at the concept of a semilattice. Before understanding the semilattice, we need to discuss the concept of a partially ordered set. After understanding the semilattice, we can roughly understand how CRDT can merge without conflicts. Suppose P is a set, and the binary relation on P satisfies the following three conditions; then, is called a partial order relation on P:

  • Reflexivity: ∀a∈P, a≤a.
  • Antisymmetry: ∀a, b∈P, if a≤b and b≤a, then a=b.
  • Transitivity: ∀a, b, c∈P, if a≤b and b≤c, then a≤c.

It is important to note that here does not necessarily refer to the standard mathematical meaning of less than or equal to. We usually consider it as "x precedes y," which means that this partial order is not necessarily about comparing sizes and does not require the elements of the set to be numbers. The key is how to define this binary relation, as long as it satisfies the above three properties, it is a partial order relation. It may sound a bit abstract, so let's take an example. The set of natural numbers has its natural order, that is, the less than or equal to relation, which makes it a total order. We can also define a relation as a subset, meaning we can compare sets through subsets, thus constructing a partial order relation. In the following example, we can see the partial order relation using subsets:

[The diagram of the subsets]

Next, let's look at the concept related to the lattice. A lattice is a partially ordered set with different tops (least upper bound) and bottoms (greatest lower bound). A semilattice is also a partially ordered set, and every non-empty set has a supremum or infimum, where the supremum is called "Least upper bound" or LUB. This means that compared to a lattice, a semilattice has only one different top or bottom, hence the name "semilattice." A join semilattice is a semilattice with different tops (least upper bound), while a meet semilattice is a semilattice with different bottoms (greatest lower bound). Any data structure represented as a semilattice can be implemented to ensure convergent replication. For example, computing the Max() of a set of values will always return the same result, regardless of the order in which the values are received, as long as all values are received in the end, because the Max() operation satisfies the commutative, associative, and idempotent laws. Referring back to the example of the set, if we change the definition, there are two types of cells: one defining the Union(items) lattice for the set and the other defining the Max(value) lattice for the data. For two semilattices, we ensure a maximum upper bound. In the example, the eventually consistent data for the three sets is {x, y, z} 3.

[The diagram of the semilattice]

After going through semilattices and partially ordered sets, it seems like we've circled back to commutativity, associativity, and idempotence. In fact, as mentioned in the semilattice section, as long as we can define a meaningful least upper bound (LUB) function, we can create a CRDT. If the Max(x, y) mentioned above doesn't seem intuitive, we'll provide more examples of data structures in this section to explain the issue. Perhaps in the data structure of rich text, there isn't such a straightforward way to satisfy the semilattice condition with Max(x, y), but don't forget that we can attach additional information to the data. As mentioned earlier, CRDT ensures conflict-free editing through data structure, which increases space complexity. An important point in data structure design is to carry additional information, such as timestamps, logical timestamps, unique user IDs, etc. By attaching metadata, we can ensure that updates remain monotonic, thereby forming a semilattice with a least upper bound.

Op-based CRDT

The operation-based CRDT is actually similar to the state-based CRDT mentioned above, but instead of working on states, it focuses on replicating and synchronizing operations (Op). Hence, we also need to design operations to adhere to commutativity, associativity, and idempotence, as the objects being transmitted from states to operations need to pass through the network, inevitably encountering the issues that distributed systems must solve. Therefore, the design of transmitted operations is crucial, and using op-based CRDT can significantly reduce the data that needs to be transmitted during synchronization.

If the update operations of the local client itself satisfy the three laws mentioned above, then synchronizing replicated operations to other clients only requires replaying the operations that have been transmitted. The simplest example is taking the union of sets. If the local client's operations do not satisfy the mentioned conditions, then consider attaching metadata to the operations (Op) in order to meet the three laws. Similarly, if we can ensure the Exactly once semantic, the idempotence condition can be relaxed. If it still cannot be satisfied, we can consider synchronizing replica data using State-based CRDT, while attaching additional metadata to ensure that local updates and the merging of operations from other clients satisfy the three laws. Interestingly, using an operation-based approach always allows us to simulate a state-based approach.

As the objects of operations are Op, we need to restate how we can satisfy commutativity, associativity, and idempotence based on snapshots. Similar to OT, we need to define some concepts. snapshot is the content of the current document, and apply(snapshot, op) is the application of Op based on snapshot to update the document. The representation of the three laws is as follows:

  • Commutativity: apply(apply(snapshot, A), B) = apply(apply(snapshot, B), A).
  • Associativity: apply(apply(apply(snapshot, A), B), C)= apply(apply(apply(snapshot, B), C), A).
  • Idempotence: apply(apply(snapshot, A), A) = apply(snapshot, A).

In the design of Op-based CRDT, we also need to consider causal consistency. Prior to applying Op, we need to check the snapshot to ensure that the causal ordering required by Op is satisfied. For example, the operation of deleting node x depends on the creation operation of node x, otherwise, the operation cannot be applied, and in the case of unsatisfied conditions, it needs to be blocked and delayed. This also means that we are responsible for ensuring that the preceding states of each operation can be satisfied. In other words, we need to guarantee that the preceding conditions of each operation can be satisfied by applying them in causal order, so that all update functions will terminate. Ultimately, the replicas of each client pass each other the missing Op to achieve eventual consistency through the apply operation.

Similarly, even when synchronizing Op, we can ensure the implementation of the three laws by guaranteeing a partial order. We can determine the partial order based on the execution sequence of each Op on the replicas, meaning the additional metadata attached is timestamp or logical timestamp.

  • If an OpA is executed before OpB on a replica, then A < B.
  • If neither A < B nor B < A, then we can say that A and B are parallel operations.

Therefore, in the design of Op-based CRDT, we need to ensure that Op satisfies commutativity, associativity, and idempotence, and ensure that the preceding conditions of each operation can be satisfied by applying them in causal order so that all update functions will terminate, thus obtaining a data structure that satisfies Op-based CRDT. In fact, formally, Op-based CRDT and State-based CRDT can be converted into each other, so most of the contents that need to be understood in the Op-based CRDT section are also reflected in the State-based CRDT section. In the latter section, we mentioned attaching metadata to rich text to meet the conditions of CRDT. Similarly, we can also attach metadata to Op to meet the conditions. In simpler terms, if we assign a unique id to each word in rich text, combined with logical timestamps, we can precisely know how to apply Op to the document's location without the need to transform indices. Of course, this implementation is very rudimentary. For a deeper understanding, one can look into the YATA algorithm and Yjs. In Yjs, most operations are designed and implemented based on Op-based CRDT, while the handling of delete operations is similar to the implementation of State-based CRDT.

Data Structure

Below are some typical examples of CRDT data structures. These examples can help us understand the design principles of CRDT. It's important to note that the examples here do not involve central server scheduling or data storage, as the data is synchronized between various clients.

Op-based Counter

The Op-based Counter represents an integer counter. Let's say we have A and B as two clients maintaining a counter together. Both clients store an integer, and if the current value is 10, and then A performs an increment operation, while B performs a decrement operation, we may temporarily have the value of A as 11 and B as 9. Later, data synchronization is necessary to ensure that A and B achieve the ultimately consistent value of 10. We can achieve this by synchronizing the increment operation of A and the decrement operation of B in the network, which ensures that both A and B will have the value of 10.

It seems like this operation is easy to implement because addition naturally satisfies the commutative and associative properties, and subtraction can also be considered as addition, but with a negative value. However, it's important to note that in this example, addition is not idempotent, so it is crucial to ensure no loss or duplication during the synchronization process.

Grow-only Counter

The State-based Counter is not so obvious in its organizational form, let's start by looking at the Grow-only Counter, which means a counter that only increments. The key concept here is that we are implementing a CRDT that synchronizes data globally. Since the synchronization is done globally, if each replica accumulates independently, it becomes impossible to determine the exact cumulative value for each replica during the merge process. It is not sufficient to simply take the max as the final result during synchronization. For example, if we have A and B as two clients maintaining a counter, both initially starting with 0, and then A increases by 1 and B increases by 2, taking max(1, 2) won't work because what we actually want is the data to be 3. Even though the data structure designed through max satisfies the commutative, associative, and idempotent properties, it contradicts our objective of getting a counter, rather than obtaining the maximum value of the clients.

A feasible solution is to use an array on each replica to store the values of the other replicas. Local updates should only operate on the corresponding item in the array, and the merge operation should only modify the other items in the array, compute the max of each item for merging. When querying, the local values from each replica should be summed to obtain the Counter we desire. For example, if A and B are two clients maintaining a counter, both initially starting with 0, then A's data becomes [0, 0] and B's becomes [0, 0]. If A increases by 1 and B increases by 2, A becomes [1, 0] and B becomes [0, 2]. After data synchronization, A becomes [1, 2] and B becomes [1, 2], thus making the data consistent between both clients, with the final count being 3.

In the above approach, obtaining the max of each item in the array during merging is how we maintain the commutative, associative, and idempotent properties. Due to the unreliable nature of networks, we cannot guarantee that data packets will always be synchronized correctly. If, for instance, a historical [1, 1] package is still in transit in the network and arrives at a client when the client's value has already reached [6, 10], then this package must be discarded. Otherwise, we won't uphold the three properties we need to ensure. Thus, we maintain a partial order relationship, although natural numbers are actually a total order relationship. Of course, we can also attach a timestamp or logical timestamp to the data, as it also accomplishes the same objective by adding additional information.

Positive-Negative Counter

In the final segment of the Grow-only Counter, we presented an example where a historical [1, 1] package was still in transit in the network and arrived at a client when the client's value had already reached [6, 10]. This example shows that the system maintains a structure that is solely for incrementing data. If we add a negative number, or simply call it a subtraction operation, then the system described above becomes ineffective.

Therefore, a State-based CRDT with a Decrement is not as straightforward as the G-Counter. After introducing subtraction, it no longer satisfies the monotonically increasing partial order relationship during updates. Hence, a feasible approach here is to construct two G-Counters, one for storing the accumulated values of Increment, and the other for storing the accumulated values of Decrement. This achieves our goal of maintaining the three properties through two sets of CRDT with a partial order relationship.

Grow-only Set

The Grow-only Set represents a set that only allows addition. The Add operation of the Set is essentially a union and naturally satisfies the commutative, associative, and idempotent properties. Therefore, we can easily use an Op-based CRDT, similar to the Op-based Counter. In this case, there is no need to ensure no loss or duplication as when dealing with the Op-based Counter, because taking the union of sets is idempotent. Of course, we can also use a State-based CRDT for a full-data union, in which case each client's replica only needs to store a single copy of the set.

Two-Phase Set

In the Two-Phase Set, when we consider deletion operations, the implementation approach is similar to the PN-Counter. We use two G-Sets, where Set A is responsible for addition, and an element removed from Set A is not actually deleted but copied to Set R. During queries, if an element is present in Set A and absent in Set R, then it indicates that the element is still there. In practice, the 2P-Set is not very practical, as once an element is removed, it cannot be added back, and the complexity of maintaining the deleted elements in the original Set is also significant and wasteful of space.

Last-Write-Wins-Element Set

In the Last-Write-Wins-Element Set, to solve the problem of not being able to re-add deleted elements, you can consider adding an update timestamp to each element in sets A and R of the 2P-Set, while keeping other operations unchanged. Therefore, when querying, it is necessary to verify the existence of the element. If an element is in the adding set A and not in the removing set R, or if it is in the removing set R but with a timestamp earlier than the latest timestamp in the adding set A, then the element is considered to exist. Another more optimized implementation is to eliminate the need for the R set, and instead, have each element in the A set maintain a deletion flag in addition to an update timestamp. This demonstrates that there are various implementations through attaching metadata as long as they satisfy the commutative, associative, and idempotent laws.

Observed-Remove Set

The Observed-Remove Set is similar to the implementation of the LWW-Element-Set, but it uses a unique label tag/uuid instead of a timestamp. We still need to maintain an add set Set A and a remove set Set R. When adding an element, a new unique tag tag/uuid is generated, and when deleting, the element and tag are copied to the remove set Set R. During a query, if an element is in Set A and not in Set R, then the element exists in the collection. Since we generate globally unique tag, there is no need to compare timestamps, ensuring that deleted elements can be added again. This can be viewed as an implementation of the State-based CRDT.

Another design that satisfies the Op-based CRDT involves adding a unique tag/uuid to each element every time Add(e) is performed, while Remove(e) will delete all occurrences of e and their corresponding tag/uuid on the current node. Therefore, even if there are concurrent Add(e) operations while Remove(e) is also taking place, e can eventually be successfully added. This semantic is known as Add wins. Next, let's see how this approach satisfies the commutative, associative, and idempotent laws. The descriptions here are all based on ensuring casual consistency, first, regarding commutative law, since tag/uuid is globally unique, it is obvious that both Add and Remove operations can be exchanged, provided that casual consistency is met, otherwise, blocking and waiting are necessary. Similarly, the associative law is satisfied, allowing for arbitrary combination operations on sets under casual consistency, mainly because tag/uuid is globally unique and inconsistency issues will not arise. Lastly, the idempotent law needs to be ensured. If the tag/uuid of a resent package due to an unreliable network matches the data before the retry, and since the element already exists in reality, it will not be added repeatedly, making this operation idempotent. In fact, similar to the initial mention of the Op-based Counter, Op is commutable and ensures idempotence, making this a reasonable CRDT design.

Finally

As CRDT is a general solution for handling distributed system data synchronization issues, this article does not limit itself to the design of rich text data structures, but rather seeks to understand CRDT from the perspective of distributed data synchronization, with intermittent applications of CRDT in the rich text domain, thereby allowing for a better understanding of this data model. Similarly, the content introduced in this article is only the tip of the iceberg. Data synchronization in distributed systems has always been a complex problem. Returning to the rich text domain, issues such as ensuring collaborative editor performance, making trade-off strategies under the CAP theory, ensuring data stability, recoverability, and traceability, handling cursor synchronization, and managing Undo/Redo, all require in-depth research and design.

In the rich text domain, the current mainstream solution for online documents is still OT. In an article "CRDTs are the future" by the author of ShareDB in September 2020, the trade-offs between CRDT and OT were discussed. The biggest issue with OT is that it must rely on a central server. Therefore, seamless data sharing between all devices depends on the server's data scheduling. On the other hand, using CRDT allows for data synchronization without the need for a central server, making it a powerful means of solving consistency issues in the future. Similar to OT, there are many open-source implementations in the field of CRDT. By applying the foundational CRDT library, applications can naturally acquire support for concurrent updates in multi-user collaboration scenarios with only a simple data Model layer and an API almost identical to Backbone. Here, two recommended CRDT implementations are automerge (https://github.com/automerge/automerge/) and yjs (https://github.com/yjs/yjs/).

Daily Question

https://github.com/WindrunnerMax/EveryDay

References

https://zhuanlan.zhihu.com/p/50990721
https://zhuanlan.zhihu.com/p/424467723
https://zhuanlan.zhihu.com/p/452980520
https://zhuanlan.zhihu.com/p/510797688
https://zhuanlan.zhihu.com/p/425265438
http://www.alloyteam.com/2020/01/14221/
https://www.jdon.com/artichect/crdt.html
https://www.zhihu.com/question/507425610
https://juejin.cn/post/6844903672032264199
https://juejin.cn/post/7049939780477386759
https://josephg.com/blog/crdts-are-the-future/
https://www.zxch3n.com/crdt-intro/design-crdt/
https://my.oschina.net/fileoptions/blog/1819610
https://hal.inria.fr/file/index/docid/555588/filename/techreport.pdf