Produce values derived from one or more tags #10

drewnoakes · 2014-11-19T12:54:37Z

There are many cases where answering a question about an image may involve reading multiple different tags, possibly from different directories.

Dealing with redundancy

Examples:

Image width (and likewise height) may be obtained from both the JpegDirectory and the ExifIFD0Directory
There are often multiple ways to obtain exposure time
XMP duplicates a lot of existing tags

Devise a strategy that sits on top of the directories and tags for extracting certain commonly used values according to well-tested heuristics. One challenge here is that tags may not agree, and it may be unclear which to trust.

(Migrated from Google Code issue 26)

Grouping values

Sometimes multiple tags should be combined to produce one logical 'value':

GPS lat / lng
Date & time values (e.g. in IPTC data)
Aspect ratio (Aspect Ratio of JPG Images #494)
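
As an illustration of grouping, here is a minimal sketch of turning GPS degrees/minutes/seconds plus a hemisphere reference into one signed decimal value, and pairing latitude with longitude only when both exist. The class `GpsCoordinates` and its methods are hypothetical names for this example, not part of metadata-extractor.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of metadata-extractor. */
final class GpsCoordinates {
    private GpsCoordinates() {}

    /**
     * Converts degrees/minutes/seconds plus a hemisphere reference
     * ("N", "S", "E" or "W") into signed decimal degrees.
     */
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = Math.abs(degrees) + minutes / 60.0 + seconds / 3600.0;
        // South and West are expressed as negative values.
        boolean negative = "S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref);
        return negative ? -value : value;
    }

    /** Produces one logical lat/lng value only when both halves are present. */
    static Optional<double[]> combine(Double latitude, Double longitude) {
        if (latitude == null || longitude == null) {
            return Optional.empty();
        }
        return Optional.of(new double[] { latitude, longitude });
    }

    public static void main(String[] args) {
        double lat = toDecimalDegrees(51, 30, 0, "N");
        double lng = toDecimalDegrees(0, 7, 30, "W");
        combine(lat, lng).ifPresent(p -> System.out.println(p[0] + ", " + p[1]));
    }
}
```

Returning an empty result when either half is missing avoids reporting a half-populated location as if it were complete.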

black-snow · 2015-05-04T13:56:33Z

I think we could either

A) Rank the different formats/directories, then run through them in that order looking for a matching directory and return the first value found. Example: with the order (XMP, Exif, IPTC), read the meta-block and find no matches for XMP; read the meta-block again (better to read once and remember the directories) and find 2 matches for Exif; look in the first match for the creation date, find it, and return that.
B) Just return the first matching tag of whatever directory we find.
C) Collect matching tags from all directories we find and return the "best" (e.g. for creation date this might just be the oldest).
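
For comparison, option C with "oldest wins" as the rule for creation date could be sketched roughly as follows. This assumes the candidate timestamps have already been read from every directory and converted to `Instant`; the class `OldestCandidate` is a hypothetical name for this example.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of metadata-extractor. */
final class OldestCandidate {
    private OldestCandidate() {}

    /** Returns the earliest non-null candidate, or empty if there is none. */
    static Optional<Instant> oldest(List<Instant> candidates) {
        return candidates.stream()
                .filter(Objects::nonNull)          // directories without the tag contribute null
                .min(Comparator.naturalOrder());   // Instant is Comparable, earliest first
    }

    public static void main(String[] args) {
        List<Instant> found = Arrays.asList(
                Instant.parse("2015-05-04T13:56:33Z"),
                null,
                Instant.parse("2014-11-19T12:54:37Z"));
        System.out.println(oldest(found));
    }
}
```

The cost noted for option C is visible here: every candidate has to be collected before a decision can be made.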

Another question is whether the output should somehow be canonicalized. E.g. Exif DateTime might return yyyy:mm:dd hh:ii:ss whereas IPTC Date Created might return yyyymmdd without a time (since the time is stored in Time Created), and other formats might not even provide a time fragment (date only).
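
One possible shape for a canonical result is a date with an optional time, so date-only sources do not have to invent a time of day. The sketch below assumes the raw strings have already been read from the tags; `CanonicalDate` and its factory methods are hypothetical names, and any UTC offset appended to the IPTC time is dropped here for brevity.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/** Hypothetical canonical form: a date plus an optional time of day. */
final class CanonicalDate {
    private static final DateTimeFormatter EXIF =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");
    private static final DateTimeFormatter IPTC_TIME =
            DateTimeFormatter.ofPattern("HHmmss");

    final LocalDate date;
    final Optional<LocalTime> time;

    private CanonicalDate(LocalDate date, Optional<LocalTime> time) {
        this.date = date;
        this.time = time;
    }

    /** Parses an Exif-style "yyyy:MM:dd HH:mm:ss" string. */
    static CanonicalDate fromExif(String dateTime) {
        LocalDateTime parsed = LocalDateTime.parse(dateTime.trim(), EXIF);
        return new CanonicalDate(parsed.toLocalDate(), Optional.of(parsed.toLocalTime()));
    }

    /** Parses an IPTC "yyyyMMdd" date and, if given, the leading "HHmmss" of the time. */
    static CanonicalDate fromIptc(String dateCreated, String timeCreatedOrNull) {
        LocalDate date = LocalDate.parse(dateCreated.trim(), DateTimeFormatter.BASIC_ISO_DATE);
        if (timeCreatedOrNull == null || timeCreatedOrNull.trim().length() < 6) {
            return new CanonicalDate(date, Optional.empty());
        }
        // Assumption of this sketch: ignore any trailing offset such as "+0000".
        LocalTime time = LocalTime.parse(timeCreatedOrNull.trim().substring(0, 6), IPTC_TIME);
        return new CanonicalDate(date, Optional.of(time));
    }

    /** ISO-8601 text: date only, or date and time joined by 'T'. */
    @Override
    public String toString() {
        return time
                .map(t -> date.atTime(t).format(DateTimeFormatter.ISO_LOCAL_DATE_TIME))
                .orElse(date.toString());
    }

    public static void main(String[] args) {
        System.out.println(fromExif("2015:05:04 13:56:33"));
        System.out.println(fromIptc("20150504", null));
    }
}
```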

I'd say we start w/ the creation date/time and just try to figure out a smart solution. Option B) seems too simple to me, option C) too complex/heavy (must always read full metadata).

drewnoakes · 2015-05-04T23:05:06Z

I think option A is simplest, best and most transparent: walk through a
list of tags in priority order until one is present, then return that.
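
A minimal sketch of that walk, assuming the tags have already been flattened into a map. The string keys (for example "ExifSubIFD/DateTimeOriginal") and the class `PriorityLookup` are hypothetical and only stand in for whatever identifier scheme the real API would use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of metadata-extractor. */
final class PriorityLookup {
    private PriorityLookup() {}

    /** Returns the value of the first key, in priority order, that is present. */
    static <V> Optional<V> firstPresent(Map<String, V> tags, List<String> keysInPriorityOrder) {
        for (String key : keysInPriorityOrder) {
            V value = tags.get(key);
            if (value != null) {
                return Optional.of(value); // stop at the first hit; later keys are never consulted
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new HashMap<>();
        tags.put("ExifIFD0/DateTime", "2015:05:04 13:56:33");
        tags.put("IPTC/DateCreated", "20150504");

        List<String> priority = Arrays.asList(
                "ExifSubIFD/DateTimeOriginal", // hypothetical keys, highest priority first
                "ExifIFD0/DateTime",
                "IPTC/DateCreated");

        System.out.println(firstPresent(tags, priority));
    }
}
```

Because the priority list is plain data, the choice of which tags to consider for each value stays visible and easy to adjust.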

The open questions are what kinds of values to do this for, what tags to
consider for each, and how to surface this in the API.

It's not currently possible to reliably map XMP's namespace/path structure to metadata-extractor's integer-based tag identifier system. The current code attempts to map a few properties to tags, but users who rely on these commonly miss other valuable tags. This partial XMP support has been the cause of many issues over the years, and I've been considering removing this support for a long time. Long term, I want the library to give each directory a means to identify its own tags. The integer system stems from starting with TIFF/Exif. Until that time, I think it's better to remove this code. Much of it is commented out and incomplete. Some users might be surprised to see these tags no longer exist, but I believe that by using the XMPMeta object they'll be better served in the long term. I can imagine providing some constants and helper methods for working with XMPCore, if needed.

rcketscientist · 2017-02-12T21:18:17Z

Here would be my recommendation for fields (sorry caps, snagged from DB keys):

TIMESTAMP (prefer unix time to avoid locale)
MODEL 
APERTURE
EXPOSURE
FLASH
FOCAL_LENGTH
ISO
WHITE_BALANCE
HEIGHT
WIDTH
LATITUDE
LONGITUDE
ALTITUDE
ORIENTATION (or rotation if we settle on a standard)
MAKE
THUMB_HEIGHT
THUMB_WIDTH
LENS_MODEL
DRIVE_MODE
EXPOSURE_MODE
EXPOSURE_PROGRAM

//XMP, but these are considered the most important by many
RATING
SUBJECT
LABEL

The main question is how to wrap these tag preferences. Since this proposal is more of a maker cheat sheet that covers multiple directories, it would make sense to have maker-centric classes process an entire metadata set. For example, in my experience on Sony devices I prefer the Exif lens model (usually populated) to the maker note one (which is a mess); in almost any other case you'd go to the maker note. It'd be best to post-process an existing metadata within this "wrapper". This should minimize impact to the existing project as well.
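
A rough sketch of such a maker-centric preference, using the lens model case above. The class `LensModelResolver`, the `Source` enum, and the idea of passing the already-extracted candidates in a map are all assumptions made for this example; only the Sony rule itself comes from the experience described above.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical post-processing wrapper; not part of metadata-extractor. */
final class LensModelResolver {
    /** Where a candidate lens model string was read from. */
    enum Source { EXIF, MAKERNOTE }

    private LensModelResolver() {}

    /** Per-make preference order: Sony prefers Exif, everything else the maker note. */
    static List<Source> preferenceFor(String make) {
        if (make != null && make.trim().equalsIgnoreCase("SONY")) {
            return Arrays.asList(Source.EXIF, Source.MAKERNOTE);
        }
        return Arrays.asList(Source.MAKERNOTE, Source.EXIF);
    }

    /** Picks the first non-blank candidate according to the make's preference order. */
    static Optional<String> resolve(String make, Map<Source, String> candidates) {
        for (Source source : preferenceFor(make)) {
            String value = candidates.get(source);
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Source, String> candidates = new EnumMap<>(Source.class);
        candidates.put(Source.EXIF, "Lens from Exif");
        candidates.put(Source.MAKERNOTE, "Lens from maker note");
        System.out.println(resolve("SONY", candidates));
        System.out.println(resolve("Canon", candidates));
    }
}
```

Since it only reads values that were already extracted, a wrapper like this would not need changes to the existing readers.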

paperboyo · 2017-02-12T22:41:39Z

Maybe IPTC<>XMP would be a good candidate as the mappings and reconciliation practices seem to be well described? This schema is also widely used by picture agencies, editing applications and camera makers.

This would also help to avoid writing files that contain non-reconciled/conflicting metadata with drewnoakes/metadata-extractor-dotnet#65 (when only the XMP field is modified but not its legacy sibling(s)).

rcketscientist · 2017-02-12T23:09:28Z

Different concepts. This is about taking the general concept of those fields I mentioned (or more) and automatically pulling a preferable tag from any of the various fields that can exist in an image that might best represent that field. It's merely a convenience for the undoubtedly hundreds of replicas of "pull tag x from directory y" that everyone has for fields they're looking up.

paperboyo · 2017-02-12T23:35:42Z

You’re right, apologies, @rcketscientist. Your list seems to contain data much more suited for what you’re describing, then, as these are all (with the exception of width, height?) properties closer to the creation/acquisition phase (makernote, EXIF) as opposed to the editing/manipulation one, like IPTC. They are also likely to be more “correct” in their original form, as opposed to IPTC, which may be more correct in the higher-level XMP data (e.g. containing the full, Unicode Description and not the truncated, ASCII version of it in the legacy IPTC field).

drewnoakes · 2017-02-13T00:44:02Z

Additional properties off the top of my head:

COMMENT
COPYRIGHT
AUTHOR
IMAGE_COUNT (for icons, multi-page TIFF, animated GIF)

rcketscientist · 2017-02-13T00:55:58Z

I believe some cameras will insert author, maybe copyright. But typically these (other than image count) are workflow meta additions, right? So these would differ slightly from the others that might be maker or exif, etc.? Not arguing against, I'm just not familiar with these tags.

kwhopper · 2017-02-13T01:13:50Z

I wonder if this could be done with some kind of 'script' engine instead of code? That would keep it open to change or override by end users. It could be something developed just for this, or off-the-shelf but I don't have any concrete suggestions.

That said, I kind of hope this project overall heads in a more scripted direction for processing tags. Explicitly coding tag processing certainly has performance advantages, but the maintenance bar is very high. @drewnoakes has alluded in passing to scripts before in other threads (I think, or my kids have crashed my brain's hard drive). Maybe this gets it off the ground?

rcketscientist · 2017-02-13T13:03:01Z

How do you envision scripting helping? At some point there still needs to be a map to where Random Joe Inc. wants to put their proprietary data. I'm not sure how it would work, but my scripting experience also consists of more forgotten Python than I still know.

drewnoakes · 2017-02-13T15:33:46Z

As @kwhopper says, we've spoken before of a new API that uses a more suitable data model internally.

There's a branch on the .NET implementation that sketches out some (non-compilable) API ideas, and we're tracking it in this pull request:

drewnoakes/metadata-extractor-dotnet#90

Feel free to chime in there for general ideas. There are a fair few ideas posted in the PR.

We'll keep this issue open to track this specific feature.

drewnoakes · 2019-06-10T23:37:13Z

An example where a user (of the .NET library) looks in Exif and PNG data to get the date. This will miss cases in these file formats, and doesn't support other file formats, so it is a good example of how adding this capability would be generally useful.

https://github.com/persnow/PicSort/blob/7a3d069f31aa19df3f79b06d9e47c5bf45620ed2/PicSortLibrary/ExifParserLibrary.cs#L19-L48

drewnoakes · 2019-08-23T02:45:42Z

The JPEG DNL segment is another source of image height.

drewnoakes · 2019-08-23T12:19:54Z

Image height and width may be affected by image orientation. Such a handler could take this into account (per drewnoakes/metadata-extractor-images#26).
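
A small sketch of what taking orientation into account could look like. Exif orientation values 5 to 8 involve a 90 degree rotation or a transposition, so the displayed width and height are the stored ones swapped. The class `OrientedSize` is a hypothetical name for this example.

```java
/** Hypothetical helper for illustration; not part of metadata-extractor. */
final class OrientedSize {
    private OrientedSize() {}

    /**
     * Returns {width, height} as displayed, given the stored dimensions and
     * the Exif orientation value (1..8). Unknown values are treated as 1.
     */
    static int[] displaySize(int storedWidth, int storedHeight, int exifOrientation) {
        // Orientations 5..8 swap the axes; 1..4 only mirror or rotate by 180 degrees.
        boolean transposed = exifOrientation >= 5 && exifOrientation <= 8;
        return transposed
                ? new int[] { storedHeight, storedWidth }
                : new int[] { storedWidth, storedHeight };
    }

    public static void main(String[] args) {
        int[] size = displaySize(4000, 3000, 6);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```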

drewnoakes · 2020-05-11T04:28:02Z

"Title" is another example for this feature, per #474.

drewnoakes added the enhancement label Nov 19, 2014

drewnoakes removed the type-enhancement label Feb 1, 2015

drewnoakes mentioned this issue Apr 28, 2015

Get creation date from "arbitrary" file-types #106

Closed

paperboyo mentioned this issue Sep 27, 2015

Support XMP and metadata reconcilliation guardian/grid#1117

Closed

paperboyo mentioned this issue Jan 28, 2016

Extract "category" from file metadata guardian/grid#378

Closed

drewnoakes mentioned this issue Mar 21, 2016

Better timestamp handling #147

Open

drewnoakes changed the title ~~Produce values derived from one or more tags (eg: aperture/GPS/image dimensions)~~ Produce values derived from one or more tags Mar 21, 2016

paperboyo mentioned this issue Jul 16, 2016

Extended XMP support #184

Closed

drewnoakes mentioned this issue Feb 13, 2017

New API exploration drewnoakes/metadata-extractor-dotnet#90

Open

rcketscientist mentioned this issue Mar 4, 2017

[Exif SubIFD] Lens Model #251

Open

drewnoakes mentioned this issue May 27, 2019

Calculating landscape/portrait from orientation and dimensions drewnoakes/metadata-extractor-dotnet#159

Open

This was referenced Aug 23, 2019

General way to get height/width #380

Closed

Port of Java #231: Implement Huffman Tables directory and reader drewnoakes/metadata-extractor-dotnet#197

Merged

kwhopper mentioned this issue Aug 10, 2020

Aspect Ratio of JPG Images #494

Closed

drewnoakes mentioned this issue Jan 20, 2021

Load common properties after calling ReadMetadata drewnoakes/metadata-extractor-dotnet#278

Open

drewnoakes mentioned this issue Jun 7, 2021

Panasonic Lens on Panasonic Body - but where is the Lens Make/Model!? drewnoakes/metadata-extractor-dotnet#296

Open
