
Conversation

@j-tyler
Contributor

@j-tyler j-tyler commented Nov 19, 2025

Summary

When CRC validation fails during blob deserialization in MessageFormatRecord, the ByteBuf allocated by Utils.readNettyByteBufFromCrcInputStream() was not being released, causing a memory leak.

The issue affected all three blob format versions (V1, V2, V3) in their respective deserializeBlobRecord() methods. When corrupt data triggered a MessageFormatException during CRC validation, the allocated ByteBuf slice remained unreleased.

Changes:

  • Extract CRC validation logic into validateCrcAndManageByteBuf() helper method
  • Implement try-finally pattern to ensure ByteBuf.release() on validation failure (see the sketch after this list)
  • Apply fix consistently across Blob_Format_V1, V2, and V3 deserializers
  • Add comprehensive leak detection tests using custom ByteBuf wrappers

Testing Done

  • MessageFormatCorruptDataLeakTest verifies ByteBuf cleanup on corrupt data
  • Parameterized tests cover all blob format versions (V1, V2, V3)
  • Control tests ensure no false positives on successful deserialization
  • Uses SliceCapturingByteBuf wrapper via DelegateByteBuf to track slice creation (rough sketch below)
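
A rough sketch of the slice-capturing idea, assuming DelegateByteBuf is a forwarding wrapper whose constructor takes the delegate; the class below is illustrative, not the test code from the PR:

```java
import io.netty.buffer.ByteBuf;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: records every slice the code under test creates so the
// test can assert afterwards that each one was released on the corrupt-data path.
class SliceCapturingByteBuf extends DelegateByteBuf {
  final List<ByteBuf> capturedSlices = new ArrayList<>();

  SliceCapturingByteBuf(ByteBuf delegate) {
    super(delegate);
  }

  @Override
  public ByteBuf slice(int index, int length) {
    ByteBuf slice = super.slice(index, length);
    capturedSlices.add(slice);
    return slice;
  }
}
```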

@codecov-commenter

codecov-commenter commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 8.16327% with 180 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.73%. Comparing base (52ba813) to head (e25c319).
⚠️ Report is 330 commits behind head on master.

Files with missing lines                               Patch %   Lines
...java/com/github/ambry/commons/DelegateByteBuf.java  3.22%     180 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3177      +/-   ##
============================================
+ Coverage     64.24%   69.73%   +5.49%     
- Complexity    10398    12809    +2411     
============================================
  Files           840      931      +91     
  Lines         71755    79178    +7423     
  Branches       8611     9431     +820     
============================================
+ Hits          46099    55216    +9117     
+ Misses        23004    21040    -1964     
- Partials       2652     2922     +270     


@j-tyler j-tyler force-pushed the j-tyler/message-format-leak-on-crc-error branch from 4274356 to 52076f2 Compare November 23, 2025 18:07
@j-tyler j-tyler marked this pull request as ready for review November 24, 2025 17:43
```java
    throw new MessageFormatException("corrupt data while parsing blob content",
        MessageFormatErrorCodes.DataCorrupt);
  }
  validateCrcAndManageByteBuf(crcStream, dataStream, byteBuf, logger);
```
Collaborator

hmm, this one is tricky. The reason we are not releasing this ByteBuf anywhere is that we never increased the refCount when creating it: in readNettyByteBufFromCrcInputStream, we use the slice method to create this ByteBuf instead of retainedSlice.

If we call release on the failure path here, then we would have to increase the refCount by calling retainedSlice, but that would mean the success path also has to call release one more time.

It does seem like a good idea to call retain/release here, since it makes the ownership of each ByteBuf explicit. But in that case, we would have to update the other files on the success path to release this ByteBuf. What do you think?
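
For reference, a minimal self-contained Netty example (not from this PR) of the distinction being made here: slice() shares the parent's reference count, while retainedSlice() increments it, so only the retained variant owes a matching release():

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class SliceRefCountDemo {
  public static void main(String[] args) {
    ByteBuf parent = Unpooled.buffer(16);
    parent.writeLong(42L);

    ByteBuf plain = parent.slice(0, 8);            // shares refCnt with parent
    System.out.println(parent.refCnt());           // 1 -- slice() did not retain

    ByteBuf retained = parent.retainedSlice(0, 8); // bumps the shared refCnt
    System.out.println(parent.refCnt());           // 2

    retained.release();                            // pay back the retainedSlice
    parent.release();                              // refCnt hits 0, buffer freed
    // Calling plain.release() now would throw IllegalReferenceCountException:
    // a plain slice never owned a reference of its own.
  }
}
```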

Contributor Author

Take a look at how this is handled in GetBlobOperation.handleBody() and subsequent calls. The chunkBuf within the BlobData is owned by the caller, and it is the caller's responsibility to release it. If MessageFormatRecord.deserializeBlob() throws, GetBlobOperation.handleBody() never handles the ByteBuf here or in the input stream.

In my reading and testing of Utils.readNettyByteBufFromCrcInputStream(), it always returns a ByteBuf, passing ownership (and the responsibility to release) to the caller. Thus, if validateCrcAndManageByteBuf fails, we must release here.
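
Sketched from the caller's side, the contract described above looks roughly like this (blobData.content() and process() are illustrative assumptions, not quotes from GetBlobOperation):

```java
// Once deserializeBlob() returns, the caller owns the ByteBuf and must release it.
// If deserializeBlob() throws first, the caller never receives the buffer, so the
// deserializer itself is the only place the slice can still be released.
BlobData blobData = MessageFormatRecord.deserializeBlob(stream);
ByteBuf chunkBuf = blobData.content();
try {
  process(chunkBuf);
} finally {
  chunkBuf.release();
}
```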

Contributor Author

There's another path where we improve the memory-management expectations in readNettyByteBufFromCrcInputStream and its callers, which is probably preferable but a slightly larger change. Let me pull this back to draft and spend some more time testing that approach.

@j-tyler j-tyler marked this pull request as draft November 25, 2025 21:53