
Avoid reading unused bytes from PNG streams #535


Merged · 4 commits · May 4, 2021
2 changes: 1 addition & 1 deletion Source/com/drew/imaging/png/PngChunkReader.java
@@ -86,7 +86,7 @@ public Iterable<PngChunk> extract(@NotNull final SequentialReader reader, @Nulla
  // Process the next chunk.
  int chunkDataLength = reader.getInt32();

- if (chunkDataLength < 0)
+ if ((chunkDataLength < 0) || (chunkDataLength > reader.available()))
Contributor commented:
I'm afraid that this isn't a valid use of available(). Here's the JavaDoc for the method:

Returns an estimate of the number of bytes that can be read (or skipped over) from this SequentialReader without blocking by the next invocation of a method for this input stream. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.

Note that while some implementations of SequentialReader like SequentialByteArrayReader will return the total remaining number of bytes in the stream, others will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.

Returns:
an estimate of the number of bytes that can be read (or skipped over) from this SequentialReader without blocking or 0 when it reaches the end of the input stream.

This is a more fundamental "problem": the nature of a stream is that the total size might be unknown. For SequentialByteArrayReader, using available() would yield the correct results, while for StreamReader (and potentially other implementations) it would not.

I don't know how a "sane maximum value" could be deduced, but I guess some more finesse is needed.
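(For illustration only, a minimal sketch using plain java.io rather than anything from this PR: an array-backed stream knows its exact remaining length, while a pipe or socket only reports what is already buffered.)

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class AvailableContrast {
    public static void main(String[] args) throws IOException {
        // Array-backed: available() is the exact remaining byte count.
        InputStream bytes = new ByteArrayInputStream(new byte[1000]);
        System.out.println(bytes.available()); // prints 1000

        // Connected pipe with nothing written yet: available() is 0 even
        // though the producer may eventually deliver megabytes. Network
        // streams behave similarly - only buffered bytes are counted.
        PipedInputStream pipe = new PipedInputStream(new PipedOutputStream());
        System.out.println(pipe.available()); // prints 0
    }
}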

Contributor Author @Draczech commented on Apr 28, 2021:

OK, you are right about this. There are implementations of InputStream which might return unexpected results from available(). I hadn't realized that when I implemented my fix; I just thought available() was the best chance there is for estimating stream size, and I only thought about metadata-extractor as a tool for bounded streams.

What do you suggest then? Using available() might fail for some crazy streams, and technically there is nothing wrong with allocating an array up to the maximum Integer size. But it can cause exceptions which are impossible to catch or prevent from outside of the library.

Contributor @Nadahar commented on Apr 28, 2021:

I'm not sure I have a suggestion off the top of my head. I guess a check could be made with instanceof and then use your logic if the reader is a SequentialByteArrayReader, but this still leaves the question of what to do in all the other cases.

Regarding available(), it's not just "crazy streams" where this fails. Generally, for FileInputStream it returns the remaining bytes of the file, and I think that's the reason why many seem to think it reports "the remainder of the stream". For network streams, for example, available() will generally just report the number of bytes remaining in the buffer at that time, which will probably be a relatively small value.
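A minimal sketch of that instanceof guard (my reading of the suggestion, not code from this PR; it assumes available() is exact only for the array-backed reader):

// Sketch: always reject negative lengths; additionally trust available()
// only when the reader is array-backed and knows its true remaining size.
if (chunkDataLength < 0)
    throw new PngProcessingException("PNG chunk length is negative");
if (reader instanceof SequentialByteArrayReader
        && chunkDataLength > reader.available())
    throw new PngProcessingException("PNG chunk length exceeds remaining bytes");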

Contributor Author commented:

It is true that this might be an unnecessary optimization, as allocating a large buffer is generally OK, and reading more bytes from a stream than it actually contains should result in java.io.EOFException: End of data reached.

Let me know though if you come up with any solution or a trick for handling this issue. I realize it is not caused by the library, but such a threat of OOM is a really undesirable one.

throw new PngProcessingException("PNG chunk length exceeds maximum");

PngChunkType chunkType = new PngChunkType(reader.getBytes(4));
Contributor commented:

For some reason that I'm sure makes a lot of sense to GitHub, I can't place a comment any further down than here. I wanted to comment on line 96 below, though.

I'm not saying that this will solve it, but I see one possibility to at least make this less likely to happen. As far as I can tell, chunkData is only used if willStoreChunk is true. It seems very wasteful to me to call getBytes() even if the data isn't to be kept; it will allocate memory and read information that is then just handed off to the GC. I would suggest something like this instead:

byte[] chunkData;
if (willStoreChunk) {
  chunkData = reader.getBytes(chunkDataLength);
} else {
  chunkData = null; // To satisfy the compiler
  reader.skip(chunkDataLength); // Discard unwanted data without buffering it
}

With that in place, I guess some sane maximum sizes could be deduced based on the chunk type. I'm no PNG expert, but I would assume that only the IDAT chunk could reasonably contain really large data, and I don't think that chunk is ever relevant for Metadata-extractor to read. For the chunk types of interest to Metadata-extractor, it should be possible to come up with some reasonable maximum value where everything above that would be rejected before calling getBytes() (which is where the allocation occurs)?
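A hypothetical sketch of such a cap (the constant name and the 10 MB value are illustrative assumptions, nothing agreed in this thread):

// Hypothetical limit for metadata-bearing chunks (tEXt, zTXt, iTXt, ...).
// IDAT is presumably never requested here (per the comment above), so
// image data would be unaffected.
private static final int MAX_STORED_CHUNK_LENGTH = 10 * 1024 * 1024; // assumption

if (willStoreChunk && chunkDataLength > MAX_STORED_CHUNK_LENGTH)
    throw new PngProcessingException(
        "PNG chunk length " + chunkDataLength + " exceeds maximum for stored chunks");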

Contributor Author commented:

OK, this would be a nice workaround! In the case of this specific file, the problematic chunk is not read, only skipped. So in the end int chunkDataLength = reader.getInt32(); is negative in the next loop iteration and "End of data reached." is thrown. Nice!

I can see a minor complication in the fact that new PngChunkReader().extract(new StreamReader(inputStream), Collections.emptySet()) and new PngChunkReader().extract(new StreamReader(inputStream), null) can yield completely different results if your code snippet is used. That is expected, I guess, but it should be mentioned in the documentation/comments.

On the other hand, PngMetadataReader.readMetadata(@NotNull InputStream inputStream) has a static set of PngChunkType values defined, so no changes are required there :)
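To spell out that difference (a sketch based on my reading of the thread, assuming a chunk's data is stored when the desired-type set is null or contains its type):

// With the skip-based snippet above, these two calls diverge: an empty set
// stores nothing and skips every chunk's data, while null is treated as
// "store everything" and reads all chunk data into memory.
Iterable<PngChunk> none = new PngChunkReader()
    .extract(new StreamReader(inputStream), Collections.<PngChunkType>emptySet());
Iterable<PngChunk> all = new PngChunkReader()
    .extract(new StreamReader(inputStream), null);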

Contributor Author commented:

I've implemented the changes you proposed, added a short comment regarding the chunk data reading, and also included the corrupted file in the tests. Let me know if this solution is OK with you. Thanks!
