Add unsafe little endian loaders #1036
base: master
Conversation
Benchmarks pending
📝 Walkthrough

The pull request introduces a comprehensive set of changes across multiple packages in the compression library, focusing on enhancing byte manipulation and testing strategies. The modifications primarily involve replacing `encoding/binary` usage with the new `internal/le` package for little-endian loads and stores.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Decoder
    participant BitReader
    participant InternalLE
    Client->>Decoder: Initiate Decoding
    Decoder->>BitReader: Initialize with Input
    BitReader->>InternalLE: Load Bytes
    InternalLE-->>BitReader: Return Loaded Data
    BitReader->>Decoder: Provide Decoded Data
    Decoder-->>Client: Return Decoded Result
```
The sequence diagram illustrates the new byte loading mechanism introduced in the changes, showing how the bit reader delegates byte loading to the `internal/le` package during decoding.
Actionable comments posted: 4
🧹 Nitpick comments (5)
internal/le/le.go (1)

3-5: Flexible Indexer interface. This union type is a neat approach for generics. Just confirm you need all these integer variants, since broad coverage might allow usage in unintended ways.
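For reference, such a union constraint typically looks like the sketch below; the exact set of variants in internal/le/le.go may differ:

```go
// Indexer-style constraint: any integer type usable as a slice index.
// Sketch only; the shipped constraint may include fewer variants.
type Indexer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64
}
```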
internal/le/unsafe_disabled.go (1)

17-19: Load64 fallback. Same pattern: length checks avoid potential slicing mistakes.
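A plausible shape for this fallback, assuming it simply defers to `encoding/binary` (slicing `b[i:]` lets the runtime bounds check catch short buffers):

```go
// Sketch of a safe Load64 fallback; the actual internal/le code may differ.
func Load64[I Indexer](b []byte, i I) uint64 {
	return binary.LittleEndian.Uint64(b[i:]) // panics if fewer than 8 bytes remain
}
```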
zstd/bitreader.go (2)

22-22: New `off` field introduced. Renaming `off` to `byteOffset` or `cursor` might improve clarity. Here's a potential diff:

```diff
 type bitReader struct {
 	in       []byte
 	value    uint64
-	off      int
+	byteOffset int
 	bitsRead uint8
 }
```
106-106: finished() check uses offset and bitsRead. Now that you rely on `off == 0`, confirm that the read buffer has truly been consumed. If `bitsRead < 64` but we haven't set `off` to zero, partial data might remain.
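For illustration, the invariant in question might look like the sketch below, assuming the struct from the diff above; the real method may differ:

```go
// finished is true only when the offset reached zero AND all buffered
// bits were consumed; off == 0 alone can leave up to 64 bits in value.
func (b *bitReader) finished() bool {
	return b.off == 0 && b.bitsRead >= 64
}
```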
.github/workflows/go.yml (1)

131-146: Consider parameterizing common test configurations. The S2 fuzzing tests have repeated patterns with similar configurations. Consider using GitHub Actions matrix strategy to reduce duplication.

Example refactor:

```yaml
strategy:
  matrix:
    fuzz_test: [FuzzDictBlocks, FuzzEncodingBlocks, FuzzLZ4Block]
    tags: [nounsafe, 'noasm,nounsafe']
steps:
  - name: S2/${{ matrix.fuzz_test }}${{ matrix.tags == 'noasm,nounsafe' && '/noasm' || '' }}
    run: go test -tags=${{ matrix.tags }} -run=none -fuzz=${{ matrix.fuzz_test }} -fuzztime=${{ matrix.fuzz_test == 'FuzzDictBlocks' && '100000x' || '500000x' }} -test.fuzzminimizetime=10ms ./s2/.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/go.yml (4 hunks)
- flate/fast_encoder.go (2 hunks)
- flate/fuzz_test.go (0 hunks)
- internal/le/le.go (1 hunks)
- internal/le/unsafe_disabled.go (1 hunks)
- internal/le/unsafe_enabled.go (1 hunks)
- s2/encode_all.go (1 hunks)
- zstd/_generate/gen.go (2 hunks)
- zstd/bitreader.go (6 hunks)
- zstd/decoder.go (1 hunks)
- zstd/matchlen_generic.go (1 hunks)
- zstd/seqdec.go (1 hunks)
- zstd/seqdec_amd64.s (16 hunks)
- zstd/seqdec_generic.go (1 hunks)
- zstd/zstd.go (2 hunks)
💤 Files with no reviewable changes (1)
- flate/fuzz_test.go
🔇 Additional comments (60)
internal/le/unsafe_enabled.go (3)

1-4: Validate build constraints and selective unsafe usage. You're targeting 64-bit LE architectures while explicitly excluding `nounsafe`, `purego`, and `appengine`. This approach is valid for performance-critical scenarios, but ensure these constraints match your broader build strategy (e.g., are you expecting code coverage or usage in non-64-bit or safe builds?). Would you like a script to find all references to these build tags across your repository to confirm consistency?

5-9: Import statement is minimal and clear. Keeping imports confined to just `unsafe` here is appropriate; it ensures you don't accidentally rely on `binary.LittleEndian` when using the unsafe approach.

17-21: Replicate the same safety check in Load32. Similar to Load16, consider a safeguard if `int(i) + 4 > len(b)`. Would you like me to generate a script to scan for all calls to `Load32` across the codebase to confirm they provide correct indexing?
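One way the suggested safeguard could look (hypothetical; the unsafe loader may deliberately omit the check for speed, and requires importing `unsafe`):

```go
// Sketch: explicit range check before the unsafe, unaligned load.
func Load32[I Indexer](b []byte, i I) uint32 {
	if int(i)+4 > len(b) {
		panic("le: Load32 out of range")
	}
	return *(*uint32)(unsafe.Pointer(&b[i])) // unaligned loads are fine on amd64/arm64
}
```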
internal/le/le.go (1)

1-2: Package definition is concise. A straightforward approach for an internal package that provides LE loading/storing functionality.
internal/le/unsafe_disabled.go (7)

1-2: Build constraints ensure fallback on non-64-bit or no-unsafe builds. This is essential to maintain portability across architectures. Good usage of negative build constraints.

3-4: Package declaration. Good to keep everything under `le` for consistent naming.

5-7: Fallback to encoding/binary is valid and stable. Relying on the standard library ensures correctness for `unsafe_disabled` scenarios.

9-11: Load16 fallback. Similar to `unsafe_enabled.go`, consider verifying `i < len(b)` with enough space for 2 bytes. Possibly out-of-scope if all callers guarantee length.

13-15: Load32 fallback. Check array length if you want robust out-of-bounds handling.

21-23: Store16 fallback. Ensure `len(b) >= 2` in the event of potential misuse.

25-27: Store32 fallback. Likewise, ensure `len(b) >= 4`.
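Assuming the stores mirror `encoding/binary`, the fallbacks likely reduce to the sketch below, where `PutUint16`/`PutUint32` already panic on short buffers:

```go
// Sketch of the safe store fallbacks; the shipped code may differ.
func Store16(b []byte, v uint16) {
	binary.LittleEndian.PutUint16(b, v) // panics if len(b) < 2
}

func Store32(b []byte, v uint32) {
	binary.LittleEndian.PutUint32(b, v) // panics if len(b) < 4
}
```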
zstd/matchlen_generic.go (2)

11-12: Switched from binary.LittleEndian to le package. Great for consistency with the new `le` methods.

19-19: Use of Load64 for bitwise comparison. This is valid for matching prefix lengths. Just ensure `a` and `b` each have at least 8 bytes in these loops.
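For context, the Load64-based comparison the comment describes usually follows this pattern (a sketch using `math/bits` and the `le` package, not the exact matchlen_generic.go code; the guarded tail loop is an assumption):

```go
// matchLen counts matching leading bytes, 8 at a time via XOR.
func matchLen(a, b []byte) (n int) {
	for ; len(a) >= 8 && len(b) >= 8; a, b = a[8:], b[8:] {
		diff := le.Load64(a, 0) ^ le.Load64(b, 0)
		if diff != 0 {
			// Index of first differing byte = trailing zero bits / 8.
			return n + bits.TrailingZeros64(diff)>>3
		}
		n += 8
	}
	for i := range a {
		if i >= len(b) || a[i] != b[i] {
			break
		}
		n++
	}
	return n
}
```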
zstd/bitreader.go (5)

12-13: Transition to internal/le for reading integers. This aligns with the new approach and unifies the code.

37-37: Initialize offset with len(in). This ensures reading proceeds from the end of the slice. Good usage to align with the bitstream logic.

73-74: Offset-based read logic in fill methods. You've updated each fill function to decrement `off` by 4 or 8 before loading. Ensure no negative offsets are possible. If `off < 4` or `off < 8` unexpectedly, we risk an out-of-range read. If you wish, I can provide a script to search the codebase for any direct calls that might lead to insufficient data prior to calling these methods.

Also applies to: 80-81, 90-100
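As a sketch of the decrement-then-load pattern (assuming the struct from the earlier diff and the `le.Load32` helper; the shipped fill functions may differ):

```go
// fillFast refills 32 bits from the tail of the input. Callers must
// guarantee b.off >= 4, which is why the comment above asks about
// negative offsets.
func (b *bitReader) fillFast() {
	if b.bitsRead < 32 {
		return
	}
	b.off -= 4 // move the cursor back before loading
	b.value = (b.value << 32) | uint64(le.Load32(b.in, b.off))
	b.bitsRead -= 32
}
```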
116-116: remain() reflects offset-based bits. Implementation is correct given the new `off` usage, but if partial bytes are left unfilled or if reads were misaligned, we might want a sturdier approach.

123-123: Reset `off` upon close. This is helpful to avoid accidental reuse.
zstd/zstd.go (3)

11-12: Import usage looks good. Switching to the `github.com/klauspost/compress/internal/le` package is consistent with the broader refactoring effort and aligns with the goal of using specialized utilities for little-endian operations.

114-114: Validate performance gains and potential endianness assumptions. While `le.Load32` may offer faster reads, confirm consistent usage across big-endian environments or confirm that the library explicitly requires little-endian compatibility.

118-118: Similar verification for Load64 usage. Likewise, ensure that all usage of `le.Load64` is tested on platforms or configurations that may not be purely little-endian in hardware.
flate/fast_encoder.go (3)

10-11: Refactor to remove redundant imports. Removing `encoding/binary` in favor of the `le` package helps maintain code consistency and possibly improves performance, but confirm that all references to `binary.LittleEndian` have been updated correctly elsewhere.

62-62: Check for potential pointer alignment issues. Using `le.Load32` is likely safe, but if there's any pointer arithmetic or alignment concerns, ensure they are properly handled in all build configurations (including with `nounsafe`).

66-66: Perform thorough testing of 64-bit reads. Ensure automated tests cover boundary conditions (e.g., reading near slice ends).
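A hypothetical boundary-condition test along these lines would cover the end-of-slice case (names and placement are assumptions, not part of the PR; uses `testing`, `encoding/binary`, and the `le` package):

```go
// TestLoad64Boundary checks the last valid 8-byte read against the
// encoding/binary reference implementation.
func TestLoad64Boundary(t *testing.T) {
	b := make([]byte, 16)
	for i := range b {
		b[i] = byte(i)
	}
	i := len(b) - 8 // last in-bounds start for an 8-byte load
	if got, want := le.Load64(b, i), binary.LittleEndian.Uint64(b[i:]); got != want {
		t.Fatalf("Load64(b, %d) = %#x, want %#x", i, got, want)
	}
}
```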
zstd/seqdec_generic.go (1)

32-32: Confirm logic correctness of the `if br.off > 4+((maxOffsetBits+16+16)>>3)` condition. Changing from a length-based check to an offset-based check may affect certain edge cases in decoding, so validate via robust tests, including boundary conditions.
zstd/seqdec.go (1)

248-248: Ensure consistency in offset-based fast path usage. Similar to `seqdec_generic.go`, confirm that comparing `br.off` instead of buffer length is correct for all decoding scenarios.
zstd/decoder.go (1)

326-326: Ensure concurrency safety when resetting offset. This assignment resets the bit reader's offset to zero. Confirm that no other goroutines rely on the previous offset.
s2/encode_all.go (3)

13-14: Import statement is consistent with project structure. The move to the `le` package aligns with the overall refactoring strategy.

18-18: Validate that `le.Load32` handles boundary conditions. Ensure that `le.Load32` is safe for edge cases (e.g., near end-of-slice).

22-22: Validate that `le.Load64` handles boundary conditions. Ensure that `le.Load64` is safe for edge cases (e.g., near end-of-slice).
zstd/_generate/gen.go (2)

160-160: Capture newly introduced offset field. Loading `br.Field("off")` into `brOffset` is an important change. Confirm that all references to the old pointer usage are removed.

441-441: Properly storing the offset. Storing `brOffset` into `br.Field("off")` ensures the bit reader's state is consistent upon return.
zstd/seqdec_amd64.s (24)

10-10: Offset-loading ensures updated struct layout. “MOVBQZX 40(CX), BX” suggests a shift in field alignment. Double-check that structures match offsets.
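The 24/32/40 displacements line up with the Go struct from the earlier diff, assuming a 64-bit platform where the slice header occupies bytes 0-23 (this mapping is an inference, not taken from the PR):

```go
// Inferred field offsets behind the assembly displacements:
type bitReader struct {
	in       []byte // bytes 0-23: data pointer, length, capacity
	value    uint64 // byte offset 24 → 24(AX)/24(CX)
	off      int    // byte offset 32 → 32(AX)/32(CX)
	bitsRead uint8  // byte offset 40 → 40(AX)/40(CX)
}
```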
12-12: Initialize SI from new field. Line 12 loads from offset 32, confirming the new layout. Validate the old references' removal.

302-303: Persist updated bit reader fields. “MOVB BL, 40(AX)” and “MOVQ SI, 32(AX)” store the new bit count and pointer offset back.

338-338: Load bit count from 40(CX). Reinforces the updated field usage for the bit count.

340-340: Set offset from 32(CX). Check that the pointer arithmetic remains correct after the struct changes.

601-602: Store updated bit count & offset. Lines 601-602 finalize the bit reader's state. Verify no concurrency hazards exist.

637-637: Adopt new offset reading. “MOVBQZX 40(BX), DX” reads the offset or bits count from the new field offset. Confirm it matches the structure.

639-639: Initialize CX from 32(BX). Similar check for the pointer offset field.

887-888: Write back updated fields. Storing AX to 24(CX), DL to 40(CX), and BX to 32(CX) shows consistent usage of the changed offsets.

923-923: Load offset from 40(BX). Double-check that the bit count references are correct for 64-bit registers.

925-925: Extract pointer from 32(BX). Ensure alignment constraints are still valid.

1144-1145: Amend registers with new offsets. Saves AX to 24(CX), DL to 40(CX), BX to 32(CX). This continues the pattern of storing updated offsets.

1790-1790: Load bit count from 40(CX). Ensure a correct mask or shift is applied elsewhere to interpret it properly.

1792-1792: Load offset from 32(CX). Validate concurrency usage if multiple decoders share the same structure.

2284-2285: Write updated offset fields. Check that the values put into 24(AX), 40(AX), and 32(AX) do not conflict with any leftover references to old structure member indices.

2352-2352: Switch to 40(BX) offset. Bit count read from a new place.

2354-2354: Load pointer from 32(BX). Double-check pointer addition logic downstream.

2804-2805: Use updated fields. Storing AX→24(CX), DL→40(CX), BX→32(CX) ensures consistent state for the bit reader fields with the new layout.

2872-2872: Adopt new offset at 40(CX). Keep an eye on potential side effects if external code expects the old offset.

2874-2874: Update pointer from 32(CX). Confirm that all call sites have been updated to match these new offsets.

3468-3469: Apply final offset writes. “MOVB BL, 40(AX)” and “MOVQ SI, 32(AX)” store the new bit count and the pointer offset.

3536-3536: Read bits from 40(BX). A continuing pattern for offset bit count retrieval. No immediate issues.

3538-3538: Pointer offset from 32(BX). Double-check that pointer manipulation is correct given the new offset scheme.

4090-4091: Complete final offset store. Writes final offset and bit count. Confirm the state is consistent throughout the entire decode.
.github/workflows/go.yml (4)

38-43: LGTM! Good test coverage for nounsafe builds. The addition of dedicated test steps for `nounsafe` and `nounsafe,noasm` combinations ensures comprehensive testing of the codebase without unsafe operations. This is particularly important given the PR's focus on unsafe little endian loaders.

47-47: LGTM! Race condition testing with nounsafe tag. The race condition tests now include the `nounsafe` tag, which is essential for detecting potential race conditions in the safe implementation path.

Also applies to: 52-52

202-214: LGTM! Comprehensive fuzzing coverage across packages. The addition of the `nounsafe` tag to flate, zip, and fse package fuzzing tests ensures a consistent testing strategy across all packages. This is particularly important as the unsafe little endian loaders might be used across these packages.

162-183: Verify fuzz test coverage for unsafe operations. The ZSTD fuzzing tests now include the `nounsafe` tag, which is good. However, since this PR introduces unsafe little endian loaders, we should ensure that the fuzzing tests adequately cover the new unsafe operations.

✅ Verification successful
Let me gather more information about the unsafe operations and their test coverage.

Let me try a different approach to find the unsafe operations and little endian loaders.

Let me check for the specific little endian loader implementations.

Based on the gathered information, I can now provide a final response:

Fuzz tests adequately cover unsafe operations and little endian loaders

The codebase has comprehensive fuzz testing coverage that specifically addresses the concerns:

- The fuzz tests run with the `nounsafe` tag, ensuring the code is tested without unsafe operations: `go test -tags=nounsafe -run=none -fuzz=FuzzDecodeAll`
- The tests include multiple variants to cover different scenarios: `FuzzDecodeAll` tests both low and high memory decoders; `FuzzDecoder` tests concurrent and non-concurrent decoding; `FuzzEncoding` tests encoding with various compression levels.
- Additional variants run with BMI2 disabled (`FuzzDecAllNoBMI2`, `FuzzNoBMI2Dec`).
- The little endian operations are primarily handled through the standard `binary.LittleEndian` package, which is well-tested and safe.

🏁 Scripts executed
The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Check if the fuzz tests cover the new unsafe operations
rg -A 5 'FuzzEncoding|FuzzDecodeAll|FuzzDecoder' ./zstd/
```

Length of output: 1572

Script:

```bash
#!/bin/bash
# Check for unsafe operations in the codebase
rg -A 5 'unsafe\.' ./zstd/
# Look for little endian loader implementations
ast-grep --pattern 'func $_(b []byte) uint$_ { $$$ }'
```

Length of output: 86

Script:

```bash
#!/bin/bash
# Check for unsafe operations in the codebase with a broader pattern
rg -A 5 'unsafe' ./zstd/
# Look for little endian related code
rg -A 5 'littleEndian|LittleEndian' ./zstd/
# Check the test file content for better context
cat ./zstd/fuzz_test.go
```

Length of output: 14894

Script:

```bash
#!/bin/bash
# Look for little endian related code in decoder files
rg -A 10 'func.*\[\]byte.*uint' ./zstd/
# Check for unsafe operations in decoder files
rg -A 10 'unsafe' ./zstd/decoder*.go
```

Length of output: 5348
Actionable comments posted: 0
🧹 Nitpick comments (1)
huff0/decompress_test.go (1)

94-94: LGTM! Error message formatting fix. Good cleanup of the incorrect format string usage. The error message is now properly formatted and consistent across all test functions. Consider making the error message more descriptive by including the actual delta or at least its size, since the previous lines already log the content:

```diff
-t.Error(test.name + ": decompressed, got delta")
+t.Error(test.name + ": decompressed data mismatch")
```

Also applies to: 224-224, 318-318, 409-409
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- huff0/bitreader.go (6 hunks)
- huff0/decompress_test.go (4 hunks)
- s2/decode_other.go (4 hunks)
- zstd/seqenc.go (0 hunks)
💤 Files with no reviewable changes (1)
- zstd/seqenc.go
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: fuzz-zstd ("noasm,nounsafe")
- GitHub Check: build (1.22.x, windows-latest)
- GitHub Check: build (1.22.x, macos-latest)
- GitHub Check: fuzz-s2 ("noasm,nounsafe")
- GitHub Check: fuzz-zstd (nounsafe)
- GitHub Check: build (1.22.x, ubuntu-latest)
- GitHub Check: fuzz-s2 (nounsafe)
- GitHub Check: build (1.21.x, windows-latest)
- GitHub Check: fuzz-other ("noasm,nounsafe")
- GitHub Check: build (1.21.x, macos-latest)
- GitHub Check: fuzz-other (nounsafe)
- GitHub Check: build (1.21.x, ubuntu-latest)
- GitHub Check: build-special
- GitHub Check: Analyze (go)
🔇 Additional comments (8)
huff0/decompress_test.go (1)

Line range hint 1-600: Verify test coverage for unsafe little endian operations. While these tests cover the high-level compression/decompression functionality, we should ensure the new unsafe little endian operations are thoroughly tested, especially around edge cases and alignment requirements. Let's check if there are dedicated tests for the new `internal/le` package:

✅ Verification successful

Sufficient test coverage through integration tests

The unsafe little endian operations are well-tested through the compression packages' extensive test suites. The safe fallback implementation using `encoding/binary` serves as a reference for correctness.

- Multiple test files exercise binary data operations: s2, zip, zlib, snappy, etc.
- Different components (reader, writer, dict) test various endianness scenarios
- Simple unsafe operations with proper alignment reduce the risk of errors

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Look for test files related to little endian operations
fd -e go -E vendor test.*le.*\.go
# Check for unsafe usage patterns in the tests
rg -l "unsafe" --type go --glob '*test*.go'
```

Length of output: 100

Script:

```bash
#!/bin/bash
# Find the little endian package
fd -t f -e go le
# Look for unsafe little endian related code
rg -A 5 "package le|littleEndian" --type go
# Check for unsafe usage in the codebase
rg -A 3 "unsafe\." --type go
```

Length of output: 8297

Script:

```bash
#!/bin/bash
# Find imports of the le package
rg -B 2 -A 2 '"internal/le"'
# Find tests that might exercise little endian operations
rg -l "Load|Store.*\[" --type go --glob '*test.go'
# Look for test files that use binary data
rg -l "0x[0-9a-fA-F]{2,}" --type go --glob '*test.go'
```

Length of output: 625
huff0/bitreader.go (3)

12-13: LGTM: Clean import replacement. Replacing direct byte manipulation with the internal `le` package is a good approach for better maintainability and potential performance improvements.

177-177: LGTM: Consistent changes in bitReaderShifted. The changes in `bitReaderShifted` mirror those in `bitReaderBytes`, maintaining consistency across the codebase while preserving the critical bit-shifting operations.

Also applies to: 185-185, 196-196

70-70: Verify performance impact of le.Load* functions. The changes to use `le.Load*` functions in `bitReaderBytes` look good and improve code readability. However, since this is a performance-critical path (as indicated by "fast" in method names), we should verify the performance impact. Let's check if there are any existing benchmarks:

Also applies to: 79-79, 89-90

✅ Verification successful

Performance optimization confirmed

The `le.Load*` functions are highly optimized implementations specifically designed for 64-bit LE platforms, using unsafe pointer arithmetic to minimize bounds checks. These functions are widely used across the codebase in performance-critical paths, including various compression algorithms. The changes align with the codebase's optimization strategy.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Look for relevant benchmarks
rg -l "Benchmark.*bitReader" --type go
```

Length of output: 40

Script:

```bash
#!/bin/bash
# Check for le.Load* usage in other files
rg "le\.Load\d+" --type go -B 2 -A 2
# Look for documentation about le package
fd "le\.go$" --type go --exec cat {}
```

Length of output: 4570

Script:

```bash
#!/bin/bash
# Look for le package implementation
fd -e go -x grep -l "package le" {}
```

Length of output: 156

Script:

```bash
#!/bin/bash
# Check le package implementation
cat internal/le/le.go internal/le/unsafe_enabled.go
```

Length of output: 1273
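If a dedicated micro-benchmark were wanted, a minimal sketch could look like this (hypothetical; not part of the PR):

```go
// BenchmarkLoad64 walks a buffer in 8-byte steps through le.Load64.
func BenchmarkLoad64(b *testing.B) {
	buf := make([]byte, 1<<16)
	var sink uint64
	b.SetBytes(int64(len(buf)))
	for i := 0; i < b.N; i++ {
		for off := 0; off+8 <= len(buf); off += 8 {
			sink += le.Load64(buf, off)
		}
	}
	if sink == 42 { // keep sink live so the loop isn't eliminated
		b.Log(sink)
	}
}
```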
s2/decode_other.go (4)

14-15: LGTM: Consistent import addition. The addition of the `le` package is consistent with the changes in other files.

43-43: LGTM: Improved literal value loading. The changes to use `le.Load*` functions for loading literal values improve code readability while maintaining the same functionality and error handling.

Also applies to: 46-46, 50-50, 54-54

Line range hint 7-8: Note: This is a fallback implementation. This file is only used when the optimized assembly implementations are not available (indicated by build tags). While the changes look good, ensure that the corresponding assembly implementations (if any) are also reviewed. Let's check for assembly implementations:

✅ Verification successful

Build constraints and implementations verified

The build constraints are correctly configured with corresponding optimized implementations present:

- s2/decode_amd64.s for the x86-64 architecture
- s2/decode_arm64.s for the ARM64 architecture

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Look for corresponding assembly files
fd -e s -e asm "decode.*" "s2/"
```

Length of output: 67

87-87: Verify performance impact on copy operations. The changes to use `le.Load*` functions in copy operations look good. However, since this is a performance-critical decompression path, we should verify the performance impact. Let's check for relevant benchmarks:

Also applies to: 100-101, 105-106

✅ Verification successful

✓ Performance impact verified for copy operations

The use of `le.Load*` functions is consistent with the codebase's performance-focused design and is used throughout for efficient memory access. The S2 implementation is specifically optimized for high throughput decompression, and these changes maintain that design goal.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Look for s2 decode benchmarks
rg -l "Benchmark.*s2.*Decode" --type go
# Check if there are any performance-related comments or documentation
rg -l "performance|benchmark" -g "s2/*.md" -g "s2/README*"
```

Length of output: 115

Script:

```bash
#!/bin/bash
# Check README.md content
cat s2/README.md
# Look for any test files with benchmarks using a broader pattern
rg -l "func.*Benchmark" s2/*_test.go
```

Length of output: 57632
Benchmarks pending
Summary by CodeRabbit

New Features

- New `le` package for flexible integer type handling.

Refactor

- Replaced `encoding/binary` imports with the custom `internal/le` package.

Chores

Tests

- Expanded test and fuzz runs with the `nounsafe` build tag in the GitHub Actions workflow.