
record: experiment with moving WAL chunk CRC computation to the flush goroutine #4431

Open
petermattis opened this issue Mar 27, 2025 · 3 comments

@petermattis (Collaborator) commented Mar 27, 2025

Currently, the CRC for each WAL chunk (a.k.a. fragment) is computed as the chunk is emitted to a WAL block. This CRC computation is done while holding commitPipeline.mu, which periodically shows up in mutex profiles. Prior profiling indicates that the CRC computation accounts for ~1/3 of the CPU time in commitPipeline.prepare. We could move the CRC computation out from under commitPipeline.mu by not performing it in LogWriter.emitFragment* and instead performing it somewhere within LogWriter.flushLoop. Before writing a WAL block, or partial WAL block, to the WAL file, we'd iterate over the fragments being written and populate each CRC. This iteration is straightforward because the fragments are tightly packed in the WAL block and are self-describing:

+----------+-----------+-----------+----------------+--- ... ---+----------+-----+
| CRC (4B) | Size (2B) | Type (1B) | Log number (4B)| Payload   | CRC (4B) | ... |
+----------+-----------+-----------+----------------+--- ... ---+----------+-----+
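A minimal sketch of that scan, assuming the 11-byte header layout in the diagram and that the checksum covers the type byte, log number, and payload (the real implementation also masks the CRC; the function name and padding handling here are hypothetical):

```go
package wal

import (
	"encoding/binary"
	"hash/crc32"
)

var crcTable = crc32.MakeTable(crc32.Castagnoli)

// fillBlockCRCs walks the tightly packed chunks in a (possibly partial) WAL
// block and populates the CRC field of each chunk header. Sketch only: the
// header layout follows the diagram above, the checksum is assumed to cover
// the type byte, log number, and payload, and CRC masking is omitted.
func fillBlockCRCs(block []byte) {
	const headerSize = 11 // 4B CRC + 2B size + 1B type + 4B log number
	for off := 0; off+headerSize <= len(block); {
		if block[off+6] == 0 {
			break // assume a zero chunk type means trailing padding
		}
		size := int(binary.LittleEndian.Uint16(block[off+4 : off+6]))
		end := off + headerSize + size
		if end > len(block) {
			break // incomplete chunk; nothing more to checksum
		}
		crc := crc32.Checksum(block[off+6:end], crcTable)
		binary.LittleEndian.PutUint32(block[off:off+4], crc)
		off = end
	}
}
```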

It isn't clear that this refactoring would be a win, as performing the CRC computation on the flush goroutine would compete with the time that goroutine spends performing I/O. A quick experiment to gauge whether it is worthwhile would be to benchmark with the CRC computation in LogWriter.emitFragment* disabled.
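As a rough point of reference for the cost being moved, here is a standalone microbenchmark of the Castagnoli checksum over a few illustrative payload sizes; this is only a proxy for the work that would leave commitPipeline.prepare, not the roachtest-level experiment above, and the sizes are made up:

```go
package wal

import (
	"fmt"
	"hash/crc32"
	"testing"
)

// BenchmarkChunkCRC measures the raw cost of the Castagnoli checksum over a
// few illustrative fragment payload sizes. It says nothing about how this
// cost interacts with commitPipeline.mu or the flush goroutine's I/O.
func BenchmarkChunkCRC(b *testing.B) {
	table := crc32.MakeTable(crc32.Castagnoli)
	for _, size := range []int{64, 1 << 10, 32 << 10} {
		payload := make([]byte, size)
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			b.SetBytes(int64(size))
			for i := 0; i < b.N; i++ {
				_ = crc32.Checksum(payload, table)
			}
		})
	}
}
```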

Jira issue: PEBBLE-368

@petermattis (Collaborator, Author) commented

Quite a bit more radical than the above idea: we could move the memcpy of the batch representation out from under commitPipeline.mu. Once we know the offset within the WAL where a batch will be written, the number of fragments the batch will be broken into is deterministic based on the batch size, and the location of the next batch in the WAL can also be calculated deterministically. The sketch of what we could do is to have db.commitWrite return the offset at which the batch will be written to the WAL, release commitPipeline.mu, and then call back into LogWriter to actually stage the batch into fragments. If I squint, this also seems possible, though quite complicated. We'd need to have the flush loop wait until the prefix of the WAL block it is trying to flush has been fully staged. The synchronization to make this work could be a non-starter.
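A very rough sketch of that shape, with hypothetical names that don't correspond to Pebble's actual LogWriter: reserve an offset under the mutex, copy the batch in outside the mutex, and have the flush loop consume only the contiguously staged prefix. For simplicity this version publishes the staged watermark in reservation order; handling out-of-order completion, chunk headers, and block boundaries is where the real complexity (and the possible non-starter) lives.

```go
package wal

import (
	"runtime"
	"sync"
	"sync/atomic"
)

// walBuffer is a toy stand-in for the WAL block buffers: a single flat byte
// slice instead of a ring of blocks, and no chunk headers or fragmentation.
type walBuffer struct {
	mu struct {
		sync.Mutex
		next int64 // next unreserved offset
	}
	staged atomic.Int64 // every byte below this offset has been fully copied
	buf    []byte
}

// reserve claims space for a batch while holding the mutex. In this sketch,
// this is the only work that needs to happen under commitPipeline.mu; the
// fragmentation of the batch is deterministic given the offset and size.
func (b *walBuffer) reserve(size int64) (offset int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	offset = b.mu.next
	b.mu.next += size // a real version would also account for chunk headers
	return offset
}

// stage copies the batch into the buffer outside the mutex, then advances the
// staged watermark once every preceding reservation has been staged.
func (b *walBuffer) stage(offset int64, payload []byte) {
	copy(b.buf[offset:], payload)
	for !b.staged.CompareAndSwap(offset, offset+int64(len(payload))) {
		runtime.Gosched() // wait for the preceding reservation to publish
	}
}

// The flush loop would write only the bytes below b.staged.Load(), waiting
// until the prefix of the block it wants to flush has been fully staged.
```

The spin-and-yield wait is only to keep the sketch short; a real version would need the staging writers and the flush loop to coordinate without burning CPU, which is exactly the synchronization cost flagged above.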

@petermattis (Collaborator, Author) commented

I attempted the quick experiment of disabling the CRC computation and couldn't measure a perf difference on the kv0/enc=false/nodes=1/cpu=32 roachtest across 10 runs with and without the change. I did verify that the disabling was done correctly by looking at a CPU profile. I chose this roachtest on the guess that it would be the most sensitive to a change in this area. I also tried completely disabling the memcpy of the batch into the WAL (so effectively we're writing zeroes to the WAL). Again, no measurable perf difference.

It's certainly possible I did something wrong in this testing, but for now the TL;DR is "nothing to see here".

@petermattis (Collaborator, Author) commented

Scalability of write-ahead logging on multicore/multisocket hardware. Section 5 (Scalable log buffer design for multicore) is interesting, as the “baseline” design is more or less what Pebble/RocksDB are doing: a single mutex protects the entire addition of a log record to the WAL. The solution is something I was wondering about above: reserving space in the WAL buffer, allowing writers to memcpy into the buffer in parallel, and then releasing the buffer reservations in the same order they were acquired.
