Description
RFC5 defines metadata objects that contain references to other metadata entities. A key example is how coordinateTransformation objects can have input and output fields which are the names of coordinateSystem objects. Associating each transformation object with a pair of references to other objects means parsers must perform a lot of checks. If we consider a scale transformation with the following metadata defined in the metadata of a Zarr group:
`{'type': 'scale', 'scale': [1,2,3], 'input': A, 'output': B}`

then a validator has to check the following things:

- are `A` and `B` identical?
- are `A` and `B` the names of `coordinateSystem` objects?
- is `A` the path to a Zarr array?
- is `A` the name of a coordinate system defined in a Zarr array?
- is the transform inside the metadata of a Zarr group that contains multiscale groups?
  - are `A` and `B` the names of multiscale groups?
- is the transform contained in a sequence / inverse / byDimension / bijection transform?
  - If the transform is contained in a sequence, is it the first and / or last element?
- Is the transform inside a `multiscales.datasets` JSON object?
  - is `A` identical to the `'path'` field of the `multiscales` dictionary?
- is the transform inside the `coordinateTransformations` field of a `multiscales` JSON object?
  - Is `A` identical to the name of the intrinsic coordinate system?
It's possible that I missed something, or conveyed redundant checks. I think it's clear that the semantics of the input and output fields are pretty complicated, and unfortunately context-dependent -- the same coordinateTransform object might be valid in one place, but invalid in another, due to the input field alone.
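To make the context-dependence concrete, here is a minimal Python sketch. It is not RFC5's actual validation logic: the `Context` class, its `kind` labels, and the two rules are hypothetical simplifications of two checks from the list above (the dataset `'path'` check and the `coordinateSystem` name check). The names `s0`, `physical`, and `atlas` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Context:
    """Hypothetical description of where a transform sits in the metadata."""
    kind: str  # "dataset" or "group" (labels invented for this sketch)
    coordinate_systems: Set[str] = field(default_factory=set)
    dataset_path: Optional[str] = None


def input_is_valid(transform: dict, ctx: Context) -> bool:
    """Check the 'input' field under a simplified, context-dependent rule."""
    name = transform.get("input")
    if ctx.kind == "dataset":
        # Inside a datasets entry: input must match the dataset path.
        return name == ctx.dataset_path
    if ctx.kind == "group":
        # Inside group metadata: input must name a known coordinate system.
        return name in ctx.coordinate_systems
    raise ValueError(f"unknown context kind: {ctx.kind!r}")


# One and the same transform object...
tx = {"type": "scale", "scale": [1, 2, 3], "input": "s0", "output": "physical"}

# ...evaluated in two different places.
in_dataset = Context(kind="dataset", dataset_path="s0")
in_group = Context(kind="group", coordinate_systems={"physical", "atlas"})

valid_in_dataset = input_is_valid(tx, in_dataset)
valid_in_group = input_is_valid(tx, in_group)
```

The validator cannot decide anything from the transform alone; it always needs the surrounding context as a second argument.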
Although coordinate transformations model functions, the above behavior is very unlike how functions typically work in programs or in mathematics. Functions in those domains are reusable, but coordinateTransformations are not, since they are "branded" with the names of input and output coordinateSystem objects.
I think it would be simpler if coordinateTransformations objects were only concerned with one thing: defining an f(x) -> y mapping, where x and y are both tuples of numbers. That means, instead of `{'type': 'scale', 'scale': [1, 2, 3], 'input': A, 'output': B}`, we would have `{'type': 'scale', 'scale': [1, 2, 3]}`. This is closer to how functions are defined in programming languages, and also closer to how ome-zarr 0.4 - 0.5 work today. It also makes defining a sequence transform simpler -- instead of worrying about composing the sequence transform's input and output fields with the input and output fields of its content, you can just define a sequence as an array of other transforms. Very simple.
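As a rough illustration of transforms as plain functions, here is a small Python sketch. The helper names (`scale`, `translation`, `sequence`) and the `Point` / `Transform` aliases are assumptions made for this example, not part of any spec or library. The point is only that a transform without `input` and `output` can be reused and composed freely.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Transform = Callable[[Point], Point]


def scale(factors: Sequence[float]) -> Transform:
    """Build f(x) -> y from metadata like {'type': 'scale', 'scale': [1, 2, 3]}."""
    def apply(point: Point) -> Point:
        if len(point) != len(factors):
            raise ValueError("point and scale must have the same length")
        return tuple(p * f for p, f in zip(point, factors))
    return apply


def translation(offsets: Sequence[float]) -> Transform:
    """Build a transform that adds a fixed offset to each coordinate."""
    def apply(point: Point) -> Point:
        if len(point) != len(offsets):
            raise ValueError("point and translation must have the same length")
        return tuple(p + o for p, o in zip(point, offsets))
    return apply


def sequence(transforms: Sequence[Transform]) -> Transform:
    """A sequence is just a list of transforms applied in order."""
    def apply(point: Point) -> Point:
        for transform in transforms:
            point = transform(point)
        return point
    return apply


# The same scale transform is used twice; nothing ties it to a named space.
double = scale([2, 2, 2])
pipeline = sequence([double, translation([1, 0, 0]), double])
result = pipeline((1.0, 1.0, 1.0))
```

Here `sequence` never inspects its members' inputs or outputs; it only needs them to be callables on tuples of numbers.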
But input and output contain important information. We must put that information somewhere else in metadata. Here is my proposal: we make coordinateSystems a JSON object with keys that form the names of output spaces, and values that declare the names of the input space, the transform objects, and the axes of the output space. The whole thing would look like this:
```json
{
  "coordinateTransformations": {
    "affine": {
      "type": "affine",
      "params": [[0,0,0], [0,0,0], [0,0,0]]
    },
    "nonlinear_tx": {
      "type": "weird_warp",
      "params": "path_to_warp_field"
    },
    "sequence": ["affine", "nonlinear_tx"] // these names are resolved in the `coordinateTransformations` keys
  },
  "coordinateSystems": {
    "default": { // This is the name of the output coordinate system. maybe we require a coordinateSystem named `'default'`?
      "input": null, // same as leaving input unset
      "transforms": ["affine"],
      "axes": [...]
    },
    "inline_scale": {
      "input": "default",
      "transforms": [{"type": "scale", "params": [1,1,1]}], // inline transforms are allowed instead of references
      "axes": [...]
    },
    "atlas_indirect": {
      "input": "default", // this name is resolved in the `coordinateSystems` keys
      "transforms": ["nonlinear_tx"], // these names are resolved in the `coordinateTransformations` keys
      "axes": [...]
    },
    "atlas_direct": {
      "input": null,
      "transforms": ["sequence"],
      "axes": [...]
    }
  }
}
```

I think this can convey everything that RFC5 conveys, but IMO it's much simpler. We get simplicity by separating the definition of the coordinate transformations (as plain functions in the coordinateTransformations JSON object) from their application in the coordinateSystems object. This mirrors how functions work in programs: we define them once, and then use (and potentially re-use) them in a separate context where we assign semantics to their inputs and outputs. Maybe it's too late to make this kind of change to RFC5, but I figured it was worth writing this up in any case :)
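For what it's worth, the name resolution in the proposed layout could be sketched in Python roughly as follows. This is only one possible reading of the proposal, not a reference implementation: the function names `resolve_transform` and `transform_chain` are hypothetical, I assume a list-valued entry in `coordinateTransformations` is a named sequence, and the `axes` fields are left out because the sketch does not use them.

```python
def resolve_transform(ref, registry, _seen=()):
    """Turn a reference (name or inline object) into a flat list of transform objects."""
    if isinstance(ref, dict):
        return [ref]  # inline transform, nothing to look up
    if ref in _seen:
        raise ValueError(f"cyclic transform reference: {ref!r}")
    if ref not in registry:
        raise KeyError(f"unknown transform name: {ref!r}")
    entry = registry[ref]
    if isinstance(entry, list):
        # Assumption: a list is a named sequence of other references.
        resolved = []
        for item in entry:
            resolved.extend(resolve_transform(item, registry, _seen + (ref,)))
        return resolved
    return [entry]


def transform_chain(name, systems, registry):
    """Collect the transforms leading from a root system (input: null) to `name`."""
    chain = []
    visited = set()
    current = name
    while current is not None:
        if current in visited:
            raise ValueError(f"cyclic coordinate system input: {current!r}")
        visited.add(current)
        system = systems[current]
        step = []
        for ref in system["transforms"]:
            step.extend(resolve_transform(ref, registry))
        chain = step + chain  # transforms of the input system are applied first
        current = system.get("input")  # a missing input is treated like null
    return chain


registry = {
    "affine": {"type": "affine", "params": [[0, 0, 0], [0, 0, 0], [0, 0, 0]]},
    "nonlinear_tx": {"type": "weird_warp", "params": "path_to_warp_field"},
    "sequence": ["affine", "nonlinear_tx"],
}
systems = {
    "default": {"input": None, "transforms": ["affine"]},
    "inline_scale": {
        "input": "default",
        "transforms": [{"type": "scale", "params": [1, 1, 1]}],
    },
    "atlas_indirect": {"input": "default", "transforms": ["nonlinear_tx"]},
    "atlas_direct": {"input": None, "transforms": ["sequence"]},
}

indirect = [t["type"] for t in transform_chain("atlas_indirect", systems, registry)]
direct = [t["type"] for t in transform_chain("atlas_direct", systems, registry)]
inline = [t["type"] for t in transform_chain("inline_scale", systems, registry)]
```

Under these assumptions, `atlas_indirect` and `atlas_direct` resolve to the same chain of transform objects, and the only checks needed are two dictionary lookups plus cycle detection.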