-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structured Message Decoder #44140
base: feature/storage/content-validation
Are you sure you want to change the base?
Structured Message Decoder #44140
Conversation
return decodedContent.toByteArray(); | ||
} | ||
|
||
public byte[] decodeFully() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this can be an overload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm curious if we even need this method. It will probably depend on how the download integration goes but if we don't have a strong reason for it now, let's remove it as we can always add it back if we decide we need it.
private static final int DEFAULT_MESSAGE_VERSION = 1; | ||
private static final int V1_HEADER_LENGTH = 13; | ||
private static final int V1_SEGMENT_HEADER_LENGTH = 10; | ||
private static final int CRC64_LENGTH = 8; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to put these in a constants file to share between the encoder and decoder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. We should try to consolidate once the encoder PR gets merged.
|
||
public StructuredMessageDecoder(ByteBuffer inputBuffer) { | ||
this.buffer = inputBuffer.order(ByteOrder.LITTLE_ENDIAN); | ||
readMessageHeader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than reading the message header during initialization, I think it makes more sense to keep all reads in the decode method and just check if the message offset is 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Mainly when we initialize the class, it is not intuitive that it would already start decoding/consuming the input buffer. I'd imagine more of the expectation would be that we consume from the input buffer when decode is explicitly called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This looks like a good start. I just had feedback on the initial API we should try to shoot for. After we get the public interfaces in a spot we like, I plan to look more closely into the tests and more into the details of the decoding implementation in follow up reviews.
private static final int CRC64_LENGTH = 8; | ||
|
||
private int messageVersion; | ||
private int messageLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably keep this a long to match the type that is supported when using structured message. Technically we should even be treating it as an unsigned long for any comparisons as well.
} | ||
|
||
messageLength = (int) buffer.getLong(); | ||
flags = StructuredMessageFlags.fromValue(Short.toUnsignedInt(buffer.getShort())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we receive flags that are not part of our enumerated flags, I'm assuming this will be validated? It might be worth adding a test for it.
|
||
public StructuredMessageDecoder(ByteBuffer inputBuffer) { | ||
this.buffer = inputBuffer.order(ByteOrder.LITTLE_ENDIAN); | ||
readMessageHeader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Mainly when we initialize the class, it is not intuitive that it would already start decoding/consuming the input buffer. I'd imagine more of the expectation would be that we consume from the input buffer when decode is explicitly called.
private long segmentCrc64 = 0; | ||
private final Map<Integer, Long> segmentCrcs = new HashMap<>(); | ||
|
||
public StructuredMessageDecoder(ByteBuffer inputBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of the constructor, I'm thinking instead of accepting an input byte buffer, we'd accept the expected content length of the individual download and the decode()
method would accept a byte buffer containing a part of the entire encoded structured message. The main reason that I'm suggesting this is that I suspect in order to integrate this with the client, we will be provided a flux of byte buffers so the content of the entire encoded structured message will be broken across multiple buffers. I imagine it will be similar in integration style to the client side encryption Decryptor class (note that it accepts a Flux<ByteBuffer>
but we'll just want to accept ByteBuffer
for now so that Reactor integration is decoupled from logic to do the actual decoding)
} | ||
|
||
private void readMessageHeader() { | ||
if (buffer.remaining() < V1_HEADER_LENGTH) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then if we update the constructor to take an expected message size, we can raise this validation in the constructor similar to what Python does: https://github.com/Azure/azure-sdk-for-python/blob/b896d5c9ee202bd7c3cba51eb602562d41f4d804/sdk/storage/azure-storage-blob/azure/storage/blob/_shared/streams.py#L466
throw new IllegalArgumentException("Unsupported structured message version: " + messageVersion); | ||
} | ||
|
||
messageLength = (int) buffer.getLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also do validation that the message length pulled from the message matches what was passed in as part of the constructor. We should be able to match Python here as well: https://github.com/Azure/azure-sdk-for-python/blob/b896d5c9ee202bd7c3cba51eb602562d41f4d804/sdk/storage/azure-storage-blob/azure/storage/blob/_shared/streams.py#L559-L561
} | ||
} | ||
|
||
public byte[] decode(int size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to updating the decode
to accept a byte buffer instead of a size
, let's have it return a byte buffer instead of byte array. I'm mainly suggesting this because the typical downstream download logic accepts byte buffers and we'd likely need this type I suspect to make the integration smoother (i.e. all logic is only working with byte buffers)
} | ||
|
||
public byte[] decode(int size) { | ||
ByteArrayOutputStream decodedContent = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more public method we are going to need is a finalize method (I'm not too tied to that name if you have a better one) that will tell the decoder that we have finished decoding and have it validate that we've consumed the entire message. I'm suggesting this because I'm not sure if the case is handled where the structured message get cut off and we only process a portion of the structured message. If we call this finalize after feeding through all byte buffers, it would then throw an error if in fact we did not get complete the decoding process.
API change check APIView has identified API level changes in this PR and created following API reviews. |
Description
Please add an informative description that covers that changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines