Question: How to extract thumbnail bytes from Jpeg? #276
You can actually avoid going back through the file by modifying the ExifReader class. The JPEG thumbnail is stored in the APP1 segment. In the readJpegSegments method, the APP1 segment is passed to the TiffReader as a byte array. I just added a quick snippet to that method, after the thumbnail data is extracted, that goes back through the array and grabs the actual thumbnail bytes (see ExifReader.java).
The offset we are given for the thumbnail is relative to the beginning of the TIFF data (II/MM), which can be reached by skipping the length of the JPEG preamble. I'm sure there is a better way to incorporate this into the library, but hopefully that helps you get started.
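A minimal sketch of that slicing step in plain Java, without the library. It assumes you already hold the APP1 payload (the bytes after the 0xFFE1 marker and its 2-byte length field) together with the values of the thumbnail offset and length tags. `ThumbnailSlice` and `extractThumbnail` are hypothetical names, not metadata-extractor API; the 6-byte `Exif\0\0` preamble is the standard Exif header that precedes II/MM.

```java
import java.util.Arrays;

class ThumbnailSlice {
    // "Exif\0\0": the 6-byte preamble between the APP1 length field and the TIFF header (II/MM).
    static final byte[] EXIF_PREAMBLE = {'E', 'x', 'i', 'f', 0, 0};

    /**
     * Copies the thumbnail out of an Exif APP1 payload, i.e. the segment bytes that follow
     * the 0xFFE1 marker and its 2-byte length field.
     *
     * @param thumbnailOffset value of the thumbnail offset tag (relative to the TIFF header)
     * @param thumbnailLength value of the thumbnail length tag, in bytes
     */
    static byte[] extractThumbnail(byte[] app1Payload, int thumbnailOffset, int thumbnailLength) {
        if (app1Payload.length < EXIF_PREAMBLE.length
                || !Arrays.equals(Arrays.copyOf(app1Payload, EXIF_PREAMBLE.length), EXIF_PREAMBLE)) {
            throw new IllegalArgumentException("not an Exif APP1 payload");
        }
        if (thumbnailOffset < 0 || thumbnailLength < 0) {
            throw new IllegalArgumentException("negative offset or length");
        }
        // Skip the preamble to reach the TIFF header, then apply the TIFF-relative offset.
        long start = (long) EXIF_PREAMBLE.length + thumbnailOffset;
        long end = start + thumbnailLength;
        if (end > app1Payload.length) {
            throw new IllegalArgumentException("thumbnail extends past the end of the APP1 payload");
        }
        return Arrays.copyOfRange(app1Payload, (int) start, (int) end);
    }
}
```

The bounds checks matter because both tag values come straight from the file and may be corrupt; a valid JPEG thumbnail should start with the SOI bytes 0xFF 0xD8.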
Thanks for your feedback. Do you think that version 2.10.1 would allow me to do that, or would I need to wait for a future version?
Ah, I see... Unfortunately, there doesn't seem to be a way to get the bytes directly (#262). Looking back at this commit, you can see that the functionality was removed to avoid holding so much data in memory. In the comments of that commit, they talk about finding the offset you are looking for and some of the problems that came up. There are also some issues that Drew links to in his .NET library that might help. If you are willing to go back and read the JPEG file afterwards, you just need to find the APP1 segment (0xFFE1) and, from there, get to the TIFF marker (II/MM); see JPEG Structure Info. I know it isn't ideal, but there doesn't seem to be a better way without modifying the library as it currently stands. I don't believe storing the bytes will be added back, but adding a tag for the offset from the start of the image wouldn't be a bad idea. Looking at the issues, though, nobody seems to be actively working on this.
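If you go the re-read route, locating the TIFF header might look roughly like the sketch below. It walks the length-prefixed JPEG segments after the SOI marker, skips APP1 segments that are not Exif (XMP also uses APP1), and returns the absolute file offset of II/MM in the first Exif APP1 segment. It assumes a well-formed file with no fill bytes between segments and the whole file already in memory; `TiffHeaderLocator` is a hypothetical helper, not part of the library.

```java
import java.util.Arrays;

class TiffHeaderLocator {
    // "Exif\0\0": identifies an Exif APP1 segment and precedes the TIFF header.
    static final byte[] EXIF_PREAMBLE = {'E', 'x', 'i', 'f', 0, 0};

    /** Returns the absolute offset of "II"/"MM" in the first Exif APP1 segment, or -1. */
    static int findTiffHeaderOffset(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return -1; // missing SOI marker: not a JPEG
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return -1; // lost sync with the segment structure
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                return -1; // start of scan or end of image: no more metadata segments
            }
            // The 2-byte big-endian length counts itself but not the 2 marker bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int payload = pos + 4;
            if (marker == 0xE1
                    && payload + EXIF_PREAMBLE.length <= jpeg.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, payload, payload + EXIF_PREAMBLE.length),
                                     EXIF_PREAMBLE)) {
                return payload + EXIF_PREAMBLE.length; // first byte of "II" or "MM"
            }
            pos += 2 + length; // skip marker bytes plus the segment
        }
        return -1;
    }
}
```

Adding the TAG_THUMBNAIL_OFFSET value to the returned position should give the thumbnail's absolute position in the file.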
That's great, thanks for the feedback.
So, that method was removed, and we are also affected by this backwards-incompatible change. I think the library could at least point to the new way of getting thumbnail bytes from images, so as not to break things for its users.
We also need this ability back and, in the meantime, have added temporary workarounds to our code (using other methods to get the thumbnail).
This PR lays the groundwork for pointer-based parsing, which could be enhanced to grab thumbnails. Most cases of byte array storage are removed and replaced with ReaderInfo pointers. These contain the global starting position and other data that could be used to go back and read thumbnails after the fact. A good start would be a new Thumbnail directory that holds a list of ReaderInfo instances pointing to thumbnail locations as the other readers do their work. This might be relatively straightforward, although that's always easy to say. The Java PR is a port of a similar PR for the .NET version, which is still under review. I think this kind of parsing (pointers instead of byte arrays) is the only way to do certain things, including giving access to desirable sets of bytes without actually reading those bytes into memory.
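To illustrate only the pointer-instead-of-bytes idea: the hypothetical `ThumbnailPointer` class below is not the PR's ReaderInfo API. It simply records a global start position and a length, and reads the bytes from the file only when asked, so nothing is held in memory during parsing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical "pointer" to a byte range in the original file. It stores where the
 * thumbnail lives rather than the thumbnail itself, and reads the bytes on demand.
 */
final class ThumbnailPointer {
    private final long globalStart; // absolute offset from the start of the file
    private final int length;       // number of bytes in the range

    ThumbnailPointer(long globalStart, int length) {
        if (globalStart < 0 || length < 0) {
            throw new IllegalArgumentException("negative position or length");
        }
        this.globalStart = globalStart;
        this.length = length;
    }

    /** Builds a pointer from the TIFF header's file offset and the Exif thumbnail tag values. */
    static ThumbnailPointer fromExif(long tiffHeaderOffset, long thumbnailOffset, int thumbnailLength) {
        return new ThumbnailPointer(tiffHeaderOffset + thumbnailOffset, thumbnailLength);
    }

    long globalStart() { return globalStart; }

    int length() { return length; }

    /** Reads the pointed-to bytes; throws EOFException if the file is shorter than expected. */
    byte[] read(RandomAccessFile file) throws IOException {
        byte[] data = new byte[length];
        file.seek(globalStart);
        file.readFully(data);
        return data;
    }
}
```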
Looks like it is still not fixed; see also #149. For anyone stumbling over this issue, it is possible to dynamically "hack" the
This makes the reader create another fake tag in the You can access the thumbnail data in the following way:
Thanks @haumacher! We'll try your code so that we can upgrade our old metadata-extractor version without breaking things...
Hi,
I'm using ImageMetadataReader to read metadata from a jpeg file, and I'd like to gain access to the thumbnail bytes.
Previously, using version 2.7, I could do this:
and could then have access to the bytes of the thumbnail image.
With version 2.10, these methods are no longer available, so I tried this:
so I can get the offset (relative to the start of this directory?) and the length of the data block, but I cannot see how to get access to the data itself.
I have tried reading the JPEG file afterwards (using a FileInputStream) and extracting bytes from it, but that only works if I add the value of TAG_THUMBNAIL_OFFSET to a further offset measured from the start of the file, and the value to add depends on which JPEG I'm using. So I guess I need to ask the thumbDir what its offset is relative to the start of the file and add that to the value of TAG_THUMBNAIL_OFFSET; is that correct? Does the Directory class have any way to know where it is? Or can the Metadata object tell where a directory (or its data) can be found?
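Assuming the standard Exif layout, the extra offset is the file position of the TIFF header, which starts 10 bytes after the first byte of the APP1 marker: 2 marker bytes, 2 length bytes and the 6-byte `Exif\0\0` preamble. With $p_{\mathrm{APP1}}$ the file offset of the APP1 marker's leading 0xFF byte:

$$
\text{thumbnailStart} = p_{\mathrm{APP1}} + \underbrace{2}_{\text{marker}} + \underbrace{2}_{\text{length}} + \underbrace{6}_{\text{Exif preamble}} + \texttt{TAG\_THUMBNAIL\_OFFSET}
$$

For example, if APP1 directly follows the 2-byte SOI marker, $p_{\mathrm{APP1}} = 2$ and the TIFF header is at byte 12. If a JFIF APP0 segment comes first (length field 16, so it occupies 2 + 16 = 18 bytes), then $p_{\mathrm{APP1}} = 20$ and the TIFF header is at byte 30. This would explain why the correction differs from one JPEG to another.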
Or am I missing something and the thumbnail bytes are accessible in another way?