Enhance Benchmark Dataset by Adding Diverse Editing Traces #1289

@m4ushold

Description

What would you like to be added:
Currently, the benchmark suite for evaluating collaborative text editing algorithms relies on a single dataset: editing-trace.json. To address this limitation, I propose enhancing the suite with additional CRDT-related datasets, such as those used in the eg-walker paper (https://github.com/josephg/editing-traces) and those from the json-crdt-traces project. These datasets cover a variety of editing scenarios that would be valuable for testing and validating our algorithm's performance.
Specifically, the eg-walker paper’s datasets include:

  • Sequential Traces: No concurrency.
  • Concurrent Traces: Two users collaborating on writing tasks, with artificial latency to simulate real-time concurrency (C1, C2).
  • Asynchronous Traces: Editing traces reconstructed from Git repositories (A1, A2), mirroring the branching and merging behavior typical in version control.
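
To illustrate how such traces could be consumed in a benchmark, here is a minimal sketch of replaying a sequential trace. It assumes the patch layout documented in the josephg/editing-traces repository, where each transaction carries patches of the form [position, num_deleted, inserted_text]; the toy trace below is hand-made for illustration and is not an excerpt of any real dataset.

```python
def replay_sequential(trace):
    """Replay a sequential editing trace (no concurrency) against a plain string.

    Assumes the schema documented in josephg/editing-traces: a "txns" list,
    where each transaction has a "patches" list of
    [position, num_deleted, inserted_text] entries applied in order.
    """
    doc = []  # list of characters; slice assignment handles insert + delete at once
    for txn in trace["txns"]:
        for pos, num_deleted, inserted in txn["patches"]:
            doc[pos:pos + num_deleted] = list(inserted)
    return "".join(doc)

# Tiny hand-made trace for illustration (not a real dataset file):
toy_trace = {
    "txns": [
        {"patches": [[0, 0, "Hello"]]},   # insert "Hello" at 0
        {"patches": [[5, 0, " world"]]},  # append " world"
        {"patches": [[0, 5, "Howdy"]]},   # replace the first 5 chars
    ]
}
print(replay_sequential(toy_trace))  # -> "Howdy world"
```

Replaying the final document like this also gives a cheap correctness check: the repository's dataset files include the expected end content, so a benchmark can assert that the replayed result matches it before timing anything.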

Why is this needed:
Currently, we rely on just one dataset (editing-trace.json, with about 250,000 editing operations), which makes it difficult to tell whether our algorithm generalizes well or is overfitting to that single trace. Incorporating additional editing traces will let us assess the algorithm's robustness across a wider range of real-world editing scenarios.
