
Improve checksum calculations #13989

Open · wants to merge 3 commits into main
Conversation

@jfboeuf (Contributor) commented Nov 12, 2024

Take advantage of the existing buffer in BufferedChecksum to speed up reads for Longs, Ints, Shorts, and Long arrays by avoiding byte-by-byte reads.
Use the faster readLongs() method to decode live docs and bloom filters.
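The idea can be sketched as follows; this is a simplified, hypothetical stand-in (the class name BufferingChecksum is illustrative, not Lucene's actual BufferedChecksum): stage encoded values in a byte[] and hand whole chunks to Checksum.update(byte[], int, int), rather than calling update(int) once per byte.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Simplified, hypothetical stand-in for the idea behind Lucene's BufferedChecksum:
// stage encoded values in a byte[] and feed whole chunks to update(byte[], int, int),
// which lets CRC32 take its fast bulk path instead of the slow byte-at-a-time path.
class BufferingChecksum {
    private final Checksum in = new CRC32();
    private final byte[] buf = new byte[256];
    private int upto; // number of staged bytes

    void updateLong(long v) {
        if (upto + Long.BYTES > buf.length) {
            flush();
        }
        // little-endian encoding of the long into the staging buffer
        for (int i = 0; i < Long.BYTES; i++) {
            buf[upto++] = (byte) (v >>> (8 * i));
        }
    }

    long getValue() {
        flush();
        return in.getValue();
    }

    private void flush() {
        if (upto > 0) {
            in.update(buf, 0, upto); // one bulk update over the staged bytes
            upto = 0;
        }
    }
}
```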

@rmuir (Member) commented Nov 12, 2024

This is actually slower: we only want to call updateBytes(byte[]), or the checksum calculation is very slow (not vectorized).

@jfboeuf (Contributor, Author) commented Nov 12, 2024

Thank you for your feedback. Perhaps I misunderstood your point, but the implementation I propose only calls Checksum.update(byte[]). The change is in how the buffer is fed, to avoid byte-by-byte reading from the underlying IndexInput and byte-by-byte copying into the buffer. I created a JMH benchmark (pushed to a separate branch) comparing the current implementation to the one I propose. The results show a noticeable improvement:

Benchmark                                                  (size)   Mode  Cnt     Score     Error   Units
BufferedChecksumIndexInputBenchmark.decodeLongArray            10  thrpt   15  7521.843 ± 167.756  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray          1000  thrpt   15  1004.213 ±   3.587  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray        100000  thrpt   15    10.993 ±   0.102  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig        10  thrpt   15  3865.018 ±  38.850  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig      1000  thrpt   15    46.381 ±   0.293  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig    100000  thrpt   15     0.475 ±   0.022  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs          10  thrpt   15  8355.638 ±  52.745  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs        1000  thrpt   15   212.133 ±   4.296  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs      100000  thrpt   15     2.744 ±   0.021  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig      10  thrpt   15  3938.857 ±  42.751  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig    1000  thrpt   15    47.246 ±   0.444  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig  100000  thrpt   15     0.460 ±   0.020  ops/ms

For large arrays, there can be an improvement of up to 23x when reading long arrays and 6x when reading single long values. Transitioning from reading single long values to long arrays for live documents and Bloom filters (bitsets are commonly large in both scenarios) yields even greater gains.

The benchmark shows the single-long approach performs better on small arrays. This is likely due to the cost of wrapping the byte[] into a ByteBuffer and LongBuffer, a cost that is not recouped when copying only a few bytes. It could be improved by making updateLongs(long[], int, int) fall back to a loop over updateLong(long) when the length to checksum fits in the buffer. What do you think?
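For illustration, the wrapping cost mentioned above comes from a path like the following sketch (checksumLongs is a hypothetical free-standing helper written for this example, not the patch itself): the long[] is copied into a byte[] through a ByteBuffer/LongBuffer view, then checksummed in a single bulk update() call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

// Hypothetical sketch of the bulk path discussed above: copy a long[] into a
// byte[] through a ByteBuffer/LongBuffer view, then checksum the bytes in one
// update() call. The view setup has a fixed cost, which is why a per-long loop
// can win for tiny arrays.
class LongChecksums {
    static long checksumLongs(long[] vals, int offset, int len) {
        byte[] bytes = new byte[len * Long.BYTES];
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        bb.asLongBuffer().put(vals, offset, len); // bulk copy of the longs
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);       // single vectorizable update
        return crc.getValue();
    }
}
```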

@jpountz (Contributor) commented Nov 12, 2024

The change makes sense to me and looks like it could speed up loading live docs (and thus speed up opening readers).

The benchmark shows the single-long approach performs better on small arrays. [...] It can be improved by making updateLongs(long[], int, int) switch to a loop over updateLong(long) when the length to checksum fits in the buffer.

Something like this would make sense to me; let's make it look as similar as possible to update(byte[], int, int)?

@rmuir (Member) commented Nov 12, 2024

OK, I see @jfboeuf, thank you for the explanation. My only concern with the optimization is testing. If there is a bug here, the user will get a CorruptIndexException.

Could we add some low-level tests somehow?

Especially readLongs() seems like potential trouble, as I am not sure anything tests that it works correctly with a non-zero offset.
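One way to exercise this at a low level might look like the following sketch (a hedged illustration, not Lucene's actual test harness; ReadLongsCheck and its method names are hypothetical): decode longs into a destination array at a non-zero offset and verify each value against a byte-at-a-time decode of the same little-endian input.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hedged sketch of the kind of low-level check requested here: bulk-decode longs
// into a destination array at a non-zero offset and compare each decoded value
// against a slow byte-at-a-time decode of the same little-endian bytes.
class ReadLongsCheck {
    static void readLongs(byte[] src, long[] dst, int dstOffset, int count) {
        ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN)
                  .asLongBuffer().get(dst, dstOffset, count);
    }

    static boolean offsetsAgree(byte[] src, int count) {
        long[] bulk = new long[count + 3];
        readLongs(src, bulk, 3, count); // non-zero destination offset
        for (int i = 0; i < count; i++) {
            long slow = 0;
            for (int b = 0; b < Long.BYTES; b++) {
                slow |= (src[i * Long.BYTES + b] & 0xFFL) << (8 * b);
            }
            if (bulk[3 + i] != slow) {
                return false;
            }
        }
        return true;
    }
}
```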

@jfboeuf (Contributor, Author) commented Nov 12, 2024

@jpountz
I modified the benchmark to make it more realistic by adding a header to the IndexOutput, so that:

  • updateLongs(long[], int, int) triggers a real flush before processing (the benchmark was biased because the buffer was empty, which is unlikely in real operations, where at least the header and the table size precede the data).
  • write accesses to the buffer by updateLong(long) are not necessarily aligned (they always are with the new updateLongs(long[], int, int)).

After testing with different sizes of long[], it seems the updateLong(long) loop is only worthwhile for very small arrays, and this specific case probably doesn't justify the additional complexity/branch in updateLongs(long[], int, int).

Benchmark                                              (size)   Mode  Cnt     Score     Error   Units
BufferedChecksumIndexInputBenchmark.decodeLongArray         8  thrpt   15  5992.000 ± 123.429  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray        16  thrpt   15  5729.068 ± 123.423  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray        32  thrpt   15  5307.703 ± 149.885  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray        64  thrpt   15  4960.178 ± 124.810  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs       8  thrpt   15  6182.986 ± 253.181  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs      16  thrpt   15  5472.158 ± 190.560  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs      32  thrpt   15  4739.634 ± 404.963  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs      64  thrpt   15  3858.685 ±  81.880  ops/ms

@rmuir I'll add unit tests to thoroughly check the method behaves properly.

The following inline review comments refer to this hunk of the diff:

    }

    void updateLongs(long[] vals, int offset, int len) {
      flush();
A reviewer (Contributor) commented:
All other updateXXX() methods only flush if there is no room left or if the data to update is bigger than the buffer size. I'd like to retain this property here (even though your benchmarks suggest that it doesn't hurt performance much).

@jfboeuf (Author) replied:

I pushed the modification to limit the flush. Not only is the buffer no longer systematically flushed on entering updateLongs(long[], int, int), but the last chunk is also not flushed, which is beneficial in the likely case that remaining data (the codec footer) follows the long[] and can fit in the buffer.
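The lazy-flush policy described above can be sketched as follows (a hedged illustration with hypothetical names, not the actual patch): flush the staging buffer only when the next value cannot fit, and leave the final partial chunk buffered so that trailing data can share the next flush.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of the lazy-flush policy described above (class and field names are
// illustrative): flush the staging buffer only when the next value cannot fit,
// and leave the final partial chunk buffered so that trailing data (e.g. the
// codec footer) can share the next flush instead of forcing a small update().
class LazyFlushChecksum {
    private final Checksum in = new CRC32();
    private final byte[] buf = new byte[64]; // room for 8 longs
    private int upto;

    void updateLongs(long[] vals, int offset, int len) {
        for (int i = 0; i < len; i++) {
            if (upto + Long.BYTES > buf.length) {
                flush(); // only when the next long does not fit
            }
            long v = vals[offset + i];
            for (int b = 0; b < Long.BYTES; b++) {
                buf[upto++] = (byte) (v >>> (8 * b));
            }
        }
        // no trailing flush: the last chunk stays buffered for later data
    }

    long getValue() {
        flush();
        return in.getValue();
    }

    private void flush() {
        if (upto > 0) {
            in.update(buf, 0, upto);
            upto = 0;
        }
    }
}
```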
