Produce values derived from one or more tags #10
Comments
I think we could go with A): rank the different formats/directories, run through that order trying to find a matching directory, and return the first value we find. Example: with the order (XMP, Exif, IPTC), read the meta-block, no matches for XMP; read the meta-block again (better: read once and remember the directories), 2 matches for Exif; look in the first match for the creation date, found, return that. Another question is whether the output should somehow be canonicalized. E.g. Exif DateTime might return yyyy:mm:dd hh:ii:ss whereas IPTC Date Created might return yyyymmdd without a time (since the time is stored in Time Created), and other formats might not even provide a time fragment (date only). I'd say we start with the creation date/time and just try to figure out a smart solution. Option B) seems too simple to me, option C) too complex/heavy (must always read the full metadata).
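A minimal sketch of the ranked walk described in option A, assuming the metadata has already been read once into a plain `directory -> tag -> value` map. `RankedLookup`, `Source`, and the directory and tag names are hypothetical and not part of metadata-extractor's API. The sketch also ignores the case of several directories of the same type.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; not part of metadata-extractor. */
final class RankedLookup {

    /** A (directory name, tag name) pair. Both names are illustrative only. */
    record Source(String directory, String tag) {}

    /**
     * Walks the ranked sources in order and returns the first non-blank value.
     * The metadata is assumed to have been read once into nested maps.
     */
    static Optional<String> firstMatch(Map<String, Map<String, String>> metadata,
                                       List<Source> ranked) {
        for (Source source : ranked) {
            Map<String, String> directory = metadata.get(source.directory());
            if (directory == null) {
                continue; // this directory is absent from the file
            }
            String value = directory.get(source.tag());
            if (value != null && !value.isBlank()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Ranking from the comment above: XMP first, then Exif, then IPTC.
        List<Source> creationDate = List.of(
                new Source("XMP", "xmp:CreateDate"),
                new Source("Exif", "DateTimeOriginal"),
                new Source("IPTC", "Date Created"));
        // Illustrative data: no XMP directory, so Exif is the first match.
        Map<String, Map<String, String>> metadata = Map.of(
                "Exif", Map.of("DateTimeOriginal", "2015:06:01 12:30:00"),
                "IPTC", Map.of("Date Created", "20150601"));
        System.out.println(firstMatch(metadata, creationDate));
    }
}
```

The value comes back in whatever format the winning directory uses, so the canonicalization question raised above would still need a separate step.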
I think option A is simplest, best and most transparent: walk through a The open questions are what kinds of values to do this for, what tags to |
It's not currently possible to reliably map XMP's namespace/path structure to metadata-extractor's integer-based tag identifier system. The current code attempts to map a few properties to tags, but users who rely on these commonly miss other valuable tags. This partial XMP support has been the cause of many issues over the years, and I've been considering removing it for a long time. Long term, I want the library to give each directory a means to identify its own tags. The integer system stems from starting with TIFF/Exif. Until that time, I think it's better to remove this code. Much of it is commented out and incomplete. Some users might be surprised to see these tags no longer exist, but I believe that by using the XMPMeta object they'll be better served in the long term. I can imagine providing some constants and helper methods for working with XMPCore, if needed.
Here would be my recommendation for fields (sorry caps, snagged from DB keys):
The main question is how to wrap these tag preferences. Since this proposal is more of a maker cheat sheet that covers multiple directories, it would make sense to have maker-centric classes process an entire metadata set. For example, in my experience on Sony devices I prefer the Exif lens model (usually populated) to the makernote one (which is a mess); in almost any other case you'd go to the makernote. It'd be best to post-process an existing metadata set within this "wrapper". This should minimize impact on the existing project as well.
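One way such a maker-centric wrapper could look, as a rough sketch only. `LensModelPreference`, the directory labels, and the sample values in the usage note are hypothetical. The only rule encoded is the one stated above: prefer Exif over the makernote on Sony, and the reverse elsewhere.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical post-processing wrapper; not part of metadata-extractor. */
final class LensModelPreference {

    static final String EXIF = "Exif";
    static final String MAKERNOTE = "Makernote";

    /** Directory order to consult for the lens model, chosen by camera make. */
    static List<String> lensModelOrder(String make) {
        boolean sony = make != null
                && make.toUpperCase(Locale.ROOT).contains("SONY");
        return sony ? List.of(EXIF, MAKERNOTE) : List.of(MAKERNOTE, EXIF);
    }

    /**
     * Picks a lens model from values that were already extracted, keyed by
     * directory label. Returns null when no directory has a usable value.
     */
    static String lensModel(String make, Map<String, String> lensModelByDirectory) {
        for (String directory : lensModelOrder(make)) {
            String value = lensModelByDirectory.get(directory);
            if (value != null && !value.isBlank()) {
                return value;
            }
        }
        return null;
    }
}
```

For instance, with made-up values `Map.of("Exif", "FE 50mm F1.8", "Makernote", "Unknown (65535)")`, `lensModel("SONY", ...)` picks the Exif entry, while any other make picks the makernote entry. Because it only reads values that were already extracted, the existing directories stay untouched.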
Maybe IPTC<>XMP would be a good candidate, as the mappings and reconciliation practices seem to be well described? This schema is also widely used by picture agencies, editing applications and camera makers. This would also help to avoid writing files that contain non-reconciled/conflicting metadata with drewnoakes/metadata-extractor-dotnet#65 (when only the XMP field is modified but not its legacy sibling(s)).
Different concepts. This is about taking the general concept of those fields I mentioned (or more) and automatically pulling a preferable tag from any of the various fields that can exist in an image that might best represent that field. It's merely a convenience for the undoubtedly hundreds of replicas of "pull tag x from directory y" that everyone has for fields they're looking up.
You’re right, apologies, @rcketscientist. Your list seems to contain data much better suited to what you’re describing, then, as these are all (with the exception of width and height?) properties closer to the creation/acquisition phase (makernote, Exif) than to the editing/manipulation one, like IPTC. They are also likely to be more “correct” in their original form, as opposed to IPTC, which may be more correct in the higher-level XMP data (e.g. containing the full Unicode description rather than the truncated ASCII version in the legacy IPTC field).
Additional properties off the top of my head:
I believe some cameras will insert author, maybe copyright. But typically these (other than image count) are workflow metadata additions, right? So these would differ slightly from the others that might be maker or Exif, etc.? Not arguing against, I'm just not familiar with these tags.
I wonder if this could be done with some kind of 'script' engine instead of code? That would keep it open to change or override by end users. It could be something developed just for this, or off-the-shelf, but I don't have any concrete suggestions. That said, I kind of hope this project overall heads in a more scripted direction for processing tags. Explicitly coding tag processing certainly has performance advantages, but the maintenance bar is very high. @drewnoakes has alluded in passing to scripts before in other threads (I think, or my kids have crashed my brain's hard drive). Maybe this gets it off the ground?
How do you envision scripting helping? At some point there still needs to be a map to where Random Joe Inc. wants to put their proprietary data. I'm not sure how it would work, but my scripting experience also consists of more forgotten Python than I still know.
As @kwhopper says, we've spoken before of a new API that uses a more suitable data model internally. There's a branch on the .NET implementation that sketches out some (non-compilable) API ideas, and we're tracking it in this pull request: drewnoakes/metadata-extractor-dotnet#90 Feel free to chime in there for general ideas. There are a fair few ideas posted in the PR. We'll keep this issue open to track this specific feature.
An example where a user (of the .NET library) looks in Exif and PNG data to get the date. This will miss cases in these file formats, and doesn't support other file formats, so it is a good example of how adding this capability would be generally useful.
JPEG DNL segment is a source of image height.
Image height and width may be affected by image orientation. Such a handler could take this into account (per drewnoakes/metadata-extractor-images#26).
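As a rough illustration of an orientation-aware handler: Exif Orientation values 5 through 8 describe a transpose or a 90 degree rotation, so the displayed width and height are the stored ones swapped. `DisplayDimensions` and `Size` are hypothetical names for this sketch. How the stored dimensions and the orientation value are obtained is left out.

```java
/** Hypothetical helper; not part of metadata-extractor. */
final class DisplayDimensions {

    /** Width and height in pixels. */
    record Size(int width, int height) {}

    /**
     * Returns the dimensions as displayed after applying the Exif Orientation
     * value. Values 5 to 8 swap the axes; any other value, including unknown
     * or missing ones passed as 0, leaves the stored size unchanged.
     */
    static Size displaySize(int storedWidth, int storedHeight, int exifOrientation) {
        boolean swapsAxes = exifOrientation >= 5 && exifOrientation <= 8;
        return swapsAxes
                ? new Size(storedHeight, storedWidth)
                : new Size(storedWidth, storedHeight);
    }
}
```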
"Title" is another example for this feature, per #474. |
There are many cases where answering a question about an image may involve reading multiple different tags, possibly from different directories.
Dealing with redundancy
Examples:
JpegDirectory and ExifIFD0Directory
Devise a strategy that sits on top of the directories and tags for extracting certain commonly used values according to well tested heuristics. One challenge here is that tags may not agree and it may be unclear which to trust.
(Migrated from Google Code issue 26)
Grouping values
Sometimes multiple tags should be combined to produce one logical 'value':
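One case mentioned in the comments is IPTC, where the date lives in Date Created (CCYYMMDD) and the time in Time Created (HHMMSS followed by a UTC offset). A minimal sketch of combining the two into one logical value follows. `IptcCreated` and `Created` are hypothetical names, and the UTC offset is deliberately ignored here to keep the example short.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/** Hypothetical helper; not part of metadata-extractor. */
final class IptcCreated {

    /** One logical creation value; the time is empty for date-only sources. */
    record Created(LocalDate date, Optional<LocalTime> time) {}

    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmss");

    /**
     * Combines the raw strings of IPTC Date Created and Time Created.
     * timeCreated may be null when the file carries no time tag.
     */
    static Created combine(String dateCreated, String timeCreated) {
        // BASIC_ISO_DATE parses the compact yyyyMMdd form.
        LocalDate date = LocalDate.parse(dateCreated, DateTimeFormatter.BASIC_ISO_DATE);
        Optional<LocalTime> time = Optional.empty();
        if (timeCreated != null && timeCreated.length() >= 6) {
            // Keep only HHMMSS; the trailing UTC offset is dropped in this sketch.
            time = Optional.of(LocalTime.parse(timeCreated.substring(0, 6), TIME));
        }
        return new Created(date, time);
    }
}
```

Keeping the time optional avoids inventing a midnight timestamp for sources that only provide a date.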