Have ImageReader always operate on file extensions #2683
base: main
|
I suppose if we make this change we might also consider changing |
|
It's a little bit awkward for formats with more than one potential extension: choosing the first normative one makes it quite confusing which hook is actually being called. If we have the first entry be equivalent to the format, then why don't we say that it is the format, to avoid the complications of involving a very open type like …
I believe we'd rather fix that by letting you register hooks for … To argue for a new type, two things come to mind: fewer meaningless representation holes (i.e. many possible endings that all represent unknown files), and room for more specific methods. The PR demonstrates something important, though: we both think known extensions should be treated as known formats in hook and decoder selection. For the sake of a sketch, let's say we name that ImageKind:

```rust
impl ImageKind {
    fn from_extension(_: &OsStr) -> Self { … } // not a Result
    fn format(&self) -> Option<ImageFormat>;
    fn extension(&self) -> &OsStr; // the constructed extension or the normative one
}
impl From<ImageFormat> for ImageKind { … }
// Probably we should not have PartialEq? Compare extension or not?
// ImageKind::from_extension("png") == ImageKind::from_extension("PNG")
// ImageKind::from_extension("jpg") == ImageKind::from_extension("jpeg")
```

So, more or less the current internal …, but when we recognize the extension, we have both an extension and the corresponding kind. Then we have no problem returning that from …
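For concreteness, here is one way the ImageKind sketch above could be made to compile. This is a hypothetical illustration, not the crate's API: ImageFormat is a three-variant stand-in, the alias table and ASCII case-insensitive matching are assumptions, and the open PartialEq question is sidestepped by simply not deriving it.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical stand-in for the crate's ImageFormat (three variants only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ImageFormat {
    Png,
    Jpeg,
    Pnm,
}

impl ImageFormat {
    /// Normative extension used when an ImageKind is built from a format.
    fn normative_extension(self) -> &'static str {
        match self {
            ImageFormat::Png => "png",
            ImageFormat::Jpeg => "jpg",
            ImageFormat::Pnm => "pnm",
        }
    }

    /// Assumed alias table; matching is ASCII case-insensitive.
    fn from_extension(ext: &OsStr) -> Option<Self> {
        let lower = ext.to_ascii_lowercase();
        match lower.to_str()? {
            "png" => Some(ImageFormat::Png),
            "jpg" | "jpeg" => Some(ImageFormat::Jpeg),
            "pbm" | "pgm" | "ppm" | "pam" | "pnm" => Some(ImageFormat::Pnm),
            _ => None,
        }
    }
}

/// A recognized format plus the extension it came from,
/// or an opaque extension that only matches itself.
#[derive(Clone, Debug)]
pub enum ImageKind {
    Known { format: ImageFormat, extension: OsString },
    Unknown(OsString),
}

impl ImageKind {
    /// Never fails: unrecognized extensions become `Unknown`.
    pub fn from_extension(ext: &OsStr) -> Self {
        match ImageFormat::from_extension(ext) {
            Some(format) => ImageKind::Known { format, extension: ext.to_os_string() },
            None => ImageKind::Unknown(ext.to_os_string()),
        }
    }

    pub fn format(&self) -> Option<ImageFormat> {
        match self {
            ImageKind::Known { format, .. } => Some(*format),
            ImageKind::Unknown(_) => None,
        }
    }

    /// The extension it was constructed from, or the normative one.
    pub fn extension(&self) -> &OsStr {
        match self {
            ImageKind::Known { extension, .. } => extension.as_os_str(),
            ImageKind::Unknown(ext) => ext.as_os_str(),
        }
    }
}

impl From<ImageFormat> for ImageKind {
    fn from(format: ImageFormat) -> Self {
        ImageKind::Known {
            format,
            extension: OsString::from(format.normative_extension()),
        }
    }
}
```

In this sketch from_extension("JPEG") and from_extension("jpg") report the same format but keep different extensions, which is exactly the equality question left open in the comments.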
I think that reopens the discussion around the use cases for hooks a bit. Extension hooks have priority over any built-in format, so updates don't break them. But specifying a built-in format as the target of a hook implies that we do have support, even though the corresponding feature may not be enabled. What should we make of that situation: does the same intention apply? Or are the use cases a fallback, or a specialization for cases such as …
|
Yeah, that line of thinking was why I initially treated built-in formats and extensions separately. But now I'm wondering whether it matters. The only formats this would apply to are JPEG, TIFF and PNM. There's no difference for non-built-in formats, and overriding the built-in formats is already an edge case. Plus, newer formats generally don't create both 3- and 4-letter extensions or a bunch of different extensions for different subtypes, so even if we add more formats in the future, I don't expect they'd have multiple extensions. On top of that, when given … The last case is PNM. There, the different extensions do have subtly different meanings. But guess-format hooks run before our built-in format detection logic, so you'd be free to register your own detection to pick the individual extensions if you wanted. And …
|
Maybe we'd best look at how other systems handle this. Compare the web stack, where everything works through explicit media types. So we'd register hooks and guesses against a media type, and translate extensions as well as …
|
Addendum: for extensions that do not match any known media type, translate them to an "opaque" media type, similar to an opaque host in the URL Standard, which only matches that extension itself.
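The media-type idea, including the opaque fallback from the addendum, could look roughly like this. It is a sketch under assumptions: MediaType, media_type_for_extension and HookTable are hypothetical names, the extension table covers only three formats, extensions are lowercased so opaque keys compare case-insensitively, and a hook name stands in for a real decoder constructor.

```rust
use std::collections::HashMap;

/// Key that hooks are registered against (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum MediaType {
    /// A known media type such as "image/png".
    Known(&'static str),
    /// No known media type: an opaque key that matches only this extension.
    Opaque(String),
}

/// Translate a file extension into a media type (assumed, abbreviated table).
pub fn media_type_for_extension(ext: &str) -> MediaType {
    let lower = ext.to_ascii_lowercase();
    let known = match lower.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "tif" | "tiff" => Some("image/tiff"),
        _ => None,
    };
    match known {
        Some(media_type) => MediaType::Known(media_type),
        None => MediaType::Opaque(lower),
    }
}

/// Hooks keyed by media type; the &str value stands in for a decoder constructor.
#[derive(Default)]
pub struct HookTable {
    hooks: HashMap<MediaType, &'static str>,
}

impl HookTable {
    pub fn register(&mut self, media_type: MediaType, hook: &'static str) {
        self.hooks.insert(media_type, hook);
    }

    /// Every extension is translated first, so aliases share one hook.
    pub fn lookup_extension(&self, ext: &str) -> Option<&'static str> {
        self.hooks.get(&media_type_for_extension(ext)).copied()
    }
}
```

With this shape, a hook registered for image/jpeg is found for both .jpg and .jpeg, while an unrecognized extension such as .xyz only ever matches a hook registered under that same opaque key.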
|
This might be a stupid question, because I don't have … As I see it, both built-in and plugin formats have to do the same things:
Currently, built-in and plugin formats use different code paths to do the same thing (except for encoders and metadata, because plugins don't support those right now), so why not have a single system that can do both? In other words, why couldn't we use the hook system (or something similar) for built-in formats as well? A unified system seems a lot simpler to me and would eliminate the current API limitations for plugin formats. (To be specific, I'm not saying we should use the hook system exactly as it is right now. I'm suggesting a single registry for all formats, built-in and plugin.)
|
The builtin |
|
I think all of that would still be possible. In my mind, the global registry would just be a static
As I see it, we wouldn't lose any functionality. |
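A minimal sketch of such a static registry, assuming one list into which both built-in and plugin formats are registered. FormatEntry, register_format, find_by_extension, find_by_magic and the DecodeFn signature are all hypothetical; newer registrations are placed first so a plugin can override a built-in, mirroring the current priority of hooks. The PNG entry uses a stub decoder.

```rust
use std::sync::{OnceLock, RwLock};

/// Hypothetical decoder signature: encoded bytes in, (width, height) out.
pub type DecodeFn = fn(&[u8]) -> Result<(u32, u32), String>;

/// One registry entry; built-in and plugin formats use the same shape.
#[derive(Clone)]
pub struct FormatEntry {
    pub name: &'static str,
    pub extensions: &'static [&'static str],
    pub magic: &'static [u8],
    pub decode: DecodeFn,
}

/// Stub standing in for a real built-in PNG decoder.
fn decode_png_stub(_data: &[u8]) -> Result<(u32, u32), String> {
    Err("stub decoder".to_string())
}

/// The single global registry, seeded with built-in formats on first use.
fn registry() -> &'static RwLock<Vec<FormatEntry>> {
    static REGISTRY: OnceLock<RwLock<Vec<FormatEntry>>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        RwLock::new(vec![FormatEntry {
            name: "png",
            extensions: &["png"],
            magic: b"\x89PNG\r\n\x1a\n",
            decode: decode_png_stub,
        }])
    })
}

/// Newer entries go first, so a plugin can override a built-in format.
pub fn register_format(entry: FormatEntry) {
    registry().write().unwrap().insert(0, entry);
}

pub fn find_by_extension(ext: &str) -> Option<FormatEntry> {
    let formats = registry().read().unwrap();
    formats
        .iter()
        .find(|f| f.extensions.iter().any(|e| e.eq_ignore_ascii_case(ext)))
        .cloned()
}

pub fn find_by_magic(data: &[u8]) -> Option<FormatEntry> {
    let formats = registry().read().unwrap();
    formats.iter().find(|f| data.starts_with(f.magic)).cloned()
}
```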
|
Formats added via hooks always operate on a |
|
You're right, that's a difference I hadn't thought of. Would that be a problem, though? As I see it, there are some upsides and some downsides (I might have missed some). Cons:
Pros:
This tradeoff sounds reasonable to me, especially for higher-level interfaces like … The same problem/tradeoff will likely also apply to encoders with a unified interface, since we might want to amortize the cost of virtual function calls via buffering there too.
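The buffering idea can be illustrated with the standard library alone. In this sketch (CountingReader, read_bytewise and decode_from_dyn are hypothetical names), the parser pulls one byte per read call, a worst case for dynamic dispatch. Wrapping the type-erased Box<dyn Read> in a statically dispatched BufReader means the vtable call on the inner reader only happens when the buffer needs refilling.

```rust
use std::cell::Cell;
use std::io::{self, BufReader, Read};
use std::rc::Rc;

/// Wraps a reader and counts calls to `read`, to make dispatch cost visible.
pub struct CountingReader<R> {
    pub inner: R,
    pub calls: Rc<Cell<usize>>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls.set(self.calls.get() + 1);
        self.inner.read(buf)
    }
}

/// A deliberately naive parser that pulls one byte per `read` call.
/// It is generic, so calls on a BufReader are statically dispatched.
pub fn read_bytewise<R: Read>(reader: &mut R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// How a plugin-style decoder might consume a type-erased source:
/// the BufReader absorbs small reads, so the inner vtable call happens
/// roughly once per buffer refill rather than once per byte.
pub fn decode_from_dyn(source: Box<dyn Read>) -> io::Result<usize> {
    let mut buffered = BufReader::with_capacity(8 * 1024, source);
    read_bytewise(&mut buffered)
}
```

Reading 1000 bytes through read_bytewise directly on a Box<dyn Read> costs 1001 dynamic calls on the inner reader; going through decode_from_dyn costs only a couple of refills.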
|
I don't think it is worth it. Static dispatch for the built-in formats has been working well, and once we decide what the external-facing API should be, we should be able to hide the distinction from downstream users. Adding a bit of overhead (I measured 1-2% for PNG, and I suspect TIFF could be worse) to all our high-level APIs just to make the internal implementation slightly simpler doesn't seem that appealing.
|
The 1-2% is from decoding from an in-memory buffer, right? If so, then I would say that it's good news that the (probably) worst-case scenario got <2% slower through dynamic dispatch. I suspect that the overhead for file IO would be hard to even measure. That said, we could use static dispatch for builtin decoders/encoders. I just wanted everything to function the same way, but this isn't a hard requirement or limitation. So performance wouldn't be a blocker for a unified system. |
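Keeping static dispatch for built-in decoders behind a single interface is commonly done with an enum that wraps the built-in decoders and falls back to a trait object for plugins. The sketch below shows that shape only; the Decode trait and the PngStub/TiffStub types are hypothetical and far smaller than the crate's real decoder trait.

```rust
/// Minimal decoder interface (hypothetical, far smaller than the real trait).
pub trait Decode {
    fn dimensions(&self) -> (u32, u32);
}

/// Stand-ins for built-in decoders.
pub struct PngStub {
    pub width: u32,
    pub height: u32,
}

pub struct TiffStub {
    pub width: u32,
    pub height: u32,
}

impl Decode for PngStub {
    fn dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}

impl Decode for TiffStub {
    fn dimensions(&self) -> (u32, u32) {
        (self.width, self.height)
    }
}

/// Built-ins are enum variants, so calls are resolved by a `match`
/// and can be inlined; only plugin decoders go through a vtable.
pub enum AnyDecoder {
    Png(PngStub),
    Tiff(TiffStub),
    Plugin(Box<dyn Decode>),
}

impl Decode for AnyDecoder {
    fn dimensions(&self) -> (u32, u32) {
        match self {
            AnyDecoder::Png(d) => d.dimensions(),
            AnyDecoder::Tiff(d) => d.dimensions(),
            AnyDecoder::Plugin(d) => d.dimensions(),
        }
    }
}
```

Downstream code only sees AnyDecoder, so whether a given variant is statically or dynamically dispatched stays an internal detail.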
This attempts to simplify the hook logic/handling inside ImageReader by having it always operate on file extensions. As a result, specifying a built-in format still attempts to dispatch to a hook if one is registered for that format.
For formats with multiple extensions, we always default to the first extension listed in ImageFormat::extensions_str. These defaults have been adjusted to .jpg, .tiff, and .pnm for JPEG, TIFF, and PNM respectively.
Internally, the format is tracked as a Cow<'static, OsStr>, which lets us avoid allocations when given built-in formats.
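A rough sketch of that keying scheme, assuming hooks live in a map keyed by Cow<'static, OsStr>. ImageFormat here is a stand-in enum whose extensions_str follows the ordering described (first entry is the default), format_key, path_key and find_hook are hypothetical helpers, and lowercasing path extensions is an assumption rather than something stated in the PR. Built-in formats produce Cow::Borrowed keys with no allocation; only extensions taken from a path are copied.

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical stand-in for the crate's ImageFormat (multi-extension formats only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ImageFormat {
    Jpeg,
    Tiff,
    Pnm,
}

const JPEG_EXTENSIONS: &[&str] = &["jpg", "jpeg"];
const TIFF_EXTENSIONS: &[&str] = &["tiff", "tif"];
const PNM_EXTENSIONS: &[&str] = &["pnm", "pbm", "pgm", "ppm", "pam"];

impl ImageFormat {
    /// The first entry is the default extension used as the hook key.
    pub fn extensions_str(self) -> &'static [&'static str] {
        match self {
            ImageFormat::Jpeg => JPEG_EXTENSIONS,
            ImageFormat::Tiff => TIFF_EXTENSIONS,
            ImageFormat::Pnm => PNM_EXTENSIONS,
        }
    }
}

/// Key for a built-in format: borrows a 'static string, so no allocation.
pub fn format_key(format: ImageFormat) -> Cow<'static, OsStr> {
    Cow::Borrowed(OsStr::new(format.extensions_str()[0]))
}

/// Key for a path: the extension is copied (and, by assumption, lowercased).
pub fn path_key(path: &Path) -> Option<Cow<'static, OsStr>> {
    path.extension().map(|ext| Cow::Owned(ext.to_ascii_lowercase()))
}

/// Hooks are consulted first; `None` means "fall back to built-in handling".
pub fn find_hook(
    key: &OsStr,
    hooks: &HashMap<Cow<'static, OsStr>, &'static str>,
) -> Option<&'static str> {
    hooks.get(key).copied()
}
```

Note that in this sketch a hook registered via ImageFormat::Jpeg is keyed by "jpg" and is not found for a file ending in .jpeg, which is the kind of ambiguity raised earlier in the thread.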