Description
What would you like to be added:
Currently, the benchmark dataset for evaluating collaborative text editing algorithms is limited to a single dataset: editing-trace.json. To address this limitation, I propose enhancing the benchmark suite by including additional CRDT-related datasets, such as those used in the eg-walker paper (https://github.com/josephg/editing-traces) and from the json-crdt-traces project. These datasets cover a variety of editing scenarios that would be valuable for testing and validating our algorithm’s performance.
Specifically, the eg-walker paper’s datasets include:
- Sequential Traces: No concurrency.
- Concurrent Traces: Two users collaborating on writing tasks, with artificial latency to simulate real-time concurrency (C1, C2).
- Asynchronous Traces: Editing traces reconstructed from Git repositories (A1, A2), mirroring the branching and merging behavior typical in version control.
Why is this needed:
Currently, we rely on just one dataset (editing-trace.json, with about 250,000 editing operations), which makes it difficult to tell whether our algorithm generalizes well or is overfitting to that single trace. Incorporating additional editing traces will help us assess the algorithm's robustness across a wider range of real-world editing scenarios, including concurrent and asynchronous (branch-and-merge) workloads that the current sequential trace does not exercise.
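For context, a minimal sketch of how a sequential trace like editing-trace.json could be replayed in the benchmark harness. This assumes the automerge-perf style format (each edit is `[position, num_deleted, inserted_chars...]`, with an optional `finalText` field recording the expected result); the function names here are hypothetical, not part of any existing harness:

```python
import json

def replay_edits(edits):
    # Assumed format: each edit is [position, num_deleted, inserted_chars...],
    # as used by the automerge-perf style editing-trace.json.
    doc = []
    for pos, num_deleted, *inserted in edits:
        # Splice: delete num_deleted chars at pos, then insert the new chars.
        doc[pos:pos + num_deleted] = inserted
    return "".join(doc)

def replay_trace_file(path):
    # Load a trace file and, if it records the final document,
    # check the replayed text against it.
    with open(path) as f:
        trace = json.load(f)
    text = replay_edits(trace["edits"])
    if "finalText" in trace:
        assert text == trace["finalText"], "replayed text diverged from trace"
    return text
```

The concurrent (C1, C2) and asynchronous (A1, A2) traces from the eg-walker repository carry causal/branching metadata in addition to plain splices, so loading them would need a richer reader than this sketch.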