Type indications for user interfaces in parameter schemas #395

Open
m-mohr opened this issue Feb 9, 2024 · 32 comments · May be fixed by #467

@m-mohr

m-mohr commented Feb 9, 2024

We recently concluded Testbed 19 and had a work item where we tried to combine openEO and OGC API - Processes. I worked on a visual client https://m-mohr.github.io/gdc-web-editor/

What we realized is that the UI generation for processes exposed by OGC APIs is much harder as we only have limited indication by the JSON Schema. In openEO we started a list of "subtypes" that define common types of input, e.g. collection, bbox, date, duration, epsg-code, geojson, wkt2, year, etc. This helps to render a much more user-friendly UI. For example, where a bbox would usually be shown as just a list of 4 numbers, the UI can now render a map on which you select the bbox. Some non-visual clients also benefit from it. See https://github.com/Open-EO/openeo-processes/blob/master/meta/subtype-schemas.json for a list of subtypes. This is open to extensions.

The subtype is a custom keyword to JSON Schema and has its own meta-schema, so it also validates in JSON Schema validators. While I think it might not be something for the spec itself, it certainly could go into a best practice.

I think this is something that OGC API - Processes currently doesn't have, so could be something to align between OGC API - Processes and openEO. We now have a client (GDC Web Editor) that can connect to both OGC API - Processes and openEO and I'd hope we see more in the future. Such subtypes would be greatly beneficial and make OGC APIs much more accessible to non-coders.

Example:
[Screenshot of the generated UI]

This is the process description for it:
https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json
Search for the subtype properties in the parameter schemas.

The map, date and the band selection wouldn't be possible as such without subtypes. Only the date selection could be achieved somewhat through the format keyword.
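
To make the idea concrete, here is a minimal sketch of how a client might pick a UI widget from a parameter schema. The subtype names are the ones mentioned in this thread; the widget names and the three lookup tables are purely hypothetical and not taken from the GDC Web Editor or any openEO client.

```python
# Hypothetical mapping from openEO-style "subtype" hints to UI widgets.
# The subtype names come from the thread; the widget names are made up.
SUBTYPE_WIDGETS = {
    "bounding-box": "map-bbox-selector",
    "collection-id": "collection-dropdown",
    "band-name": "band-dropdown",
    "temporal-interval": "date-range-picker",
}

# Fallback widgets keyed by the standard JSON Schema "format" keyword.
FORMAT_WIDGETS = {
    "date": "date-picker",
    "date-time": "date-time-picker",
}

# Last-resort widgets keyed by the plain JSON Schema "type".
TYPE_WIDGETS = {
    "number": "number-field",
    "integer": "number-field",
    "string": "text-field",
    "boolean": "checkbox",
    "array": "list-editor",
    "object": "json-editor",
}


def pick_widget(schema: dict) -> str:
    """Choose a widget: subtype first, then format, then type."""
    subtype = schema.get("subtype")
    if subtype in SUBTYPE_WIDGETS:
        return SUBTYPE_WIDGETS[subtype]
    fmt = schema.get("format")
    if fmt in FORMAT_WIDGETS:
        return FORMAT_WIDGETS[fmt]
    schema_type = schema.get("type")
    if isinstance(schema_type, list):
        # e.g. ["number", "null"]: use the first non-null type
        schema_type = next((t for t in schema_type if t != "null"), None)
    return TYPE_WIDGETS.get(schema_type, "json-editor")
```

Without the subtype, a bounding-box object would fall through to the generic `json-editor` in this sketch, which is the usability gap described above.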

Thoughts?

@fmigneault
Contributor

I'm curious about the decision around subtype.
Why not instead use $ref with a reference to a well established JSON-schema with #<definition-name> to pick the specific definition? Could also make use of the $id property of JSON-schema to identify a known type from a definition catalog.

@m-mohr
Author

m-mohr commented Feb 15, 2024

My main point here is about having more specific type indications, not necessarily a specific "encoding". So whether it's called subtype, $ref, format or something else is not so important for me.

But to give some background:

  • We started initially with the format keyword, but back in the days there were discussions in JSON Schema to deprecate format, so we tried to avoid format and decided to define a new JSON Schema keyword with a meta-schema to validate it.
  • About $ref:
    1. In openEO we want processes to be self-contained because otherwise they are pretty much indigestible by web clients (unless you want to spawn like dozens of HTTP requests to resolve them all). One request to GET /processes was meant to provide you all information needed for process graph construction.
    2. Another point is that in draft-07 of JSON Schema (2019/2020 didn't exist at the time we defined openEO), $ref couldn't live alongside other properties, so that makes things more complicated and doesn't easily allow customizing/restricting subtypes. Yes, there's allOf, but for simplicity we try to avoid allOf, oneOf and anyOf as much as possible.
    3. I guess we could've implied specific types by the URL in the $ref, but honestly we also just didn't think about it. Probably mostly due to point 2.
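
The self-contained approach from point 1 can be approximated on the server side by inlining references before a process description is sent out. The following is a minimal sketch, assuming that all references point into one local definitions dictionary (`#/definitions/<name>`) and that there are no circular references. The function name `inline_refs` is hypothetical. Mirroring the draft-07 behaviour from point 2, keywords next to `$ref` are dropped.

```python
import copy


def inline_refs(node, definitions):
    """Return a copy of `node` with local "$ref" pointers replaced by their targets."""
    if isinstance(node, list):
        return [inline_refs(item, definitions) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/definitions/"):
        name = ref[len("#/definitions/"):]
        # Draft-07 ignores keywords next to "$ref", so siblings are dropped here.
        return inline_refs(copy.deepcopy(definitions[name]), definitions)
    return {key: inline_refs(value, definitions) for key, value in node.items()}


definitions = {"epsg": {"type": "integer", "subtype": "epsg-code", "minimum": 1000}}
schema = {"type": "object", "properties": {"crs": {"$ref": "#/definitions/epsg"}}}
resolved = inline_refs(schema, definitions)
```

A client receiving `resolved` needs no further HTTP requests to understand the `crs` property.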

@fmigneault
Contributor

I agree with having specific identifiers. Even for something as simple as bounding box, it should be made obvious to distinguish it from any other array of numbers. A naming authority should be used as well to provide more context, linking references, and definition details. In the case of a bounding box for example, I would love to have processes indicate something similar to:

"$id": "http://www.opengis.net/def/glossary/term/BoundingBox"

@pvretano
Contributor

pvretano commented Apr 28, 2024

I like the "$id"/subtype/(or whatever we call it) idea and I am trying to figure out what to do to resolve this issue but I am confused about something.

In the example that @m-mohr cites, https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json, I see things like this (for a bounding box in this case):

{
  "title": "Bounding Box",
  "type": "object",
  "subtype": "bounding-box",
  "required": [
    "west",
    "south",
    "east",
    "north"
  ],
  "properties": {
    "west": {
      "description": "West (lower left corner, coordinate axis 1).",
      "type": "number"
    },
    "south": {
      "description": "South (lower left corner, coordinate axis 2).",
      "type": "number"
    },
    "east": {
      "description": "East (upper right corner, coordinate axis 1).",
      "type": "number"
    },
    "north": {
      "description": "North (upper right corner, coordinate axis 2).",
      "type": "number"
    },
    "base": {
      "description": "Base (optional, lower left corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "height": {
      "description": "Height (optional, upper right corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "crs": {
      "description": "Coordinate reference system of the extent, specified as as [EPSG code](http://www.epsg-registry.org/) or [WKT2 CRS string](http://docs.opengeospatial.org/is/18-010r7/18-010r7.html). Defaults to `4326` (EPSG code 4326) unless the client explicitly requests a different coordinate reference system.",
      "anyOf": [
        {
          "title": "EPSG Code",
          "type": "integer",
          "subtype": "epsg-code",
          "minimum": 1000,
          "examples": [
            3857
          ]
        },
        {
          "title": "WKT2",
          "type": "string",
          "subtype": "wkt2-definition"
        }
      ],
      "default": 4326
    }
  }
},

If there is an "$id" or "subtype" defined for this, why would I need ALL this schema? Why would the input's schema not just be (using "subtype" as the identifier token):

{
  "subtype": "bounding-box"
}

Presumably the identifier "bounding-box" would imply ALL the rest. No?

@m-mohr
Author

m-mohr commented Apr 29, 2024

@pvretano That has three main reasons for openEO, but is a design decision that can be decided differently in OAP:

  1. We didn't want to force clients to resolve external references. See also #395 (comment). (Similarly, we recommend for /queryables that servers resolve $refs before sending them to the clients.)
  2. We wanted implementers to be able to adapt the schemas to their capabilities, e.g. to not support the third dimension or WKT2 by removing these parts from the schema. It's then still pretty much the same base schema, but clients can read from the schema what is missing. This issue occurs more often in openEO than in OAP because of the high number of pre-defined processes.
  3. Lastly, openEO clients still mostly use the non-subtype schema for making sense of the schema, but in some cases you just need a separate hint to indicate how the UI is meant to be rendered. So this is more an additive thing rather than the foundation. Starting with only the subtype makes it pretty foundational, which was never the purpose.
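
Point 2 can be sketched as a small helper that trims the bounding-box schema quoted above to what a backend supports. This is only an illustration: the function name and the two capability flags are hypothetical, and the sketch assumes the schema has the same layout as the load_collection excerpt.

```python
import copy


def restrict_bbox_schema(base: dict, support_3d: bool = True, support_wkt2: bool = True) -> dict:
    """Return a copy of a bounding-box schema trimmed to the backend's capabilities."""
    schema = copy.deepcopy(base)
    props = schema.get("properties", {})
    if not support_3d:
        # Drop the optional third dimension.
        props.pop("base", None)
        props.pop("height", None)
    crs = props.get("crs")
    if crs is not None and not support_wkt2 and "anyOf" in crs:
        # Keep every CRS option except the WKT2 one.
        crs["anyOf"] = [
            option for option in crs["anyOf"]
            if option.get("subtype") != "wkt2-definition"
        ]
    return schema
```

The result still carries `"subtype": "bounding-box"`, so a client can render the same map widget while reading from the schema that, for instance, `base` and `height` are not available.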

@pvretano
Contributor

pvretano commented Apr 29, 2024

@m-mohr thanks for that. I get it. So, rest of the SWG. Would you prefer the "subtype" approach used by OpenEO or the "$id" approach proposed by @fmigneault? Personally I have no strong preference one way or the other but closer alignment between OpenEO and OAProc would be nice. Please make your preferences known.

@fmigneault
Contributor

SWG Meeting 2024-04-29: Leaning toward format since it is already employed for similar use cases in https://docs.ogc.org/is/18-062r2/18-062r2.html#toc36 (see table 13), but more discussion needed.

@m-mohr
Author

m-mohr commented Apr 29, 2024

Two points for consideration for the format:

@cportele
Member

@m-mohr

I do not think that your points are arguments against the use of "format".

Regarding

JSON Schema doesn't clearly say yet whether format only works for "string"-types properties yet

That statement is at least outdated. JSON Schema Validation 2020-12 is pretty clear:

All format attributes defined in this section apply to strings, but a format attribute can be specified to apply to any instance types defined in the data model defined in the core JSON Schema.

Regarding:

Some validators fail if you provide them unknown formats, which is not quite the intention here, I think. If something is unknown it should just be ignored.

I have never used such a validator, all that I have used (Java and JavaScript implementations) do not show this behavior. A question is how important are those validators - also given that they do not properly implement JSON Schema Validation 2020-12 (asserting format is optional and has to be disabled by default):

Implementations MAY still treat "format" as an assertion in addition to an annotation and attempt to validate the value's conformance to the specified semantics. The implementation MUST provide options to enable and disable such evaluation and MUST be disabled by default. Implementations SHOULD document their level of support for such validation.

That said, a new annotation is of course also an option.

If a new annotation is used, the keyword should start with "x-". Or probably "x-ogc-" as we have used in OGC API Features Part 5, Schemas, to reduce the risk of keyword name clashes.

From the current JSON Schema plans:

In order to support future-compatibility, keywords which are not known by the implementation MUST be disallowed.

The keyword prefix x- defines a safe space for users to introduce custom annotations without the need for an explicit custom keyword.

Implementations MUST refuse to process schemas which contain unknown keywords.
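
A rough sketch of what such a rule would mean for a schema processor: every keyword is either known, a custom "x-" annotation, or unknown (and therefore a reason to reject the schema). `KNOWN_KEYWORDS` below is a deliberately small, illustrative subset and not the full JSON Schema vocabulary; the function name is hypothetical and only top-level keywords are inspected.

```python
# Illustrative subset only; a real implementation would list its full vocabulary.
KNOWN_KEYWORDS = {
    "title", "description", "type", "format", "properties", "required",
    "items", "enum", "default", "anyOf", "minimum", "examples",
}


def classify_keywords(schema: dict) -> dict:
    """Sort the top-level keywords of a schema into known, custom ("x-") and unknown."""
    result = {"known": [], "custom": [], "unknown": []}
    for keyword in schema:
        if keyword in KNOWN_KEYWORDS:
            result["known"].append(keyword)
        elif keyword.startswith("x-"):
            result["custom"].append(keyword)
        else:
            result["unknown"].append(keyword)
    return result
```

Under this reading, a bare `subtype` keyword would land in `unknown`, while something like `x-ogc-role` would be accepted as a custom annotation.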

@m-mohr
Author

m-mohr commented Apr 29, 2024

Yeah, subtype in openEO was defined before the x- recommendation was in place, at a time when removing format was being debated and when ajv (the primary JS validator) still errored for unknown formats. Good that this is not the case anymore. We actually were on format before as well, but were worried about a potential removal of format, so we went with a separate keyword. So then it is probably not as relevant anymore and format seems fine (assuming we are using the new drafts; openEO is still on draft-07).

@fmigneault
Contributor

I agree with @cportele about the points. If implementations misbehave with format, it's up to them to fix their code. Format is a properly defined field with the exact purpose we are looking for:

Structural validation alone may be insufficient to allow an application to correctly utilize certain values. The "format" annotation keyword is defined to allow schema authors to convey semantic information for a fixed subset of values which are accurately described by authoritative resources, be they RFCs or other external specifications.
https://json-schema.org/draft/2020-12/json-schema-validation#name-foreword

For that same reason, I would rather use format than yet another custom field. If we add some x-ogc- field, implementations will have to look under many locations instead of using format's behavior that is already described in JSON schema and the OAP specification.

@cportele
Member

In JSON-FG we also use ajv to validate all examples. This is done without asserting "format", but treating it as the annotation that it now is.

I have looked at ajv and indeed it seems to not strictly conform to the JSON Schema spec, since you have to explicitly state validateFormats: false (i.e., false is not the default as required by the spec). And "by default unknown formats throw exception during schema compilation." So, ajv should only be used with validateFormats: false - or alternatively with proper configuration for all the format values.

In general, I always disable "format" validation when validating JSON instances. The behavior is too different across implementations. It should be handled as an annotation, which it now is in the spec.
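
Treating "format" as an annotation means a tool reads the value as a hint and never fails on it. A minimal sketch of that attitude, assuming a plain schema walk is enough (the function name is hypothetical, and a "format" key inside e.g. a "default" value would also be picked up by this simplification):

```python
def collect_formats(schema, path=""):
    """Yield (location, format) pairs for every "format" keyword found in a schema."""
    if isinstance(schema, dict):
        fmt = schema.get("format")
        if isinstance(fmt, str):
            # Unknown values are reported like any other; nothing is asserted.
            yield (path or "/", fmt)
        for key, value in schema.items():
            yield from collect_formats(value, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from collect_formats(item, f"{path}/{index}")
```

A client can then decide per value whether it knows a better widget or check for it, and silently ignore the rest.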

@pvretano
Contributor

Took a look at OpenEO and they have the following subtypes defined:

  • "band-name"
  • "bounding-box"
  • "chunk-size"
  • "collection-id"
  • "datacube"
  • "epsg-code"
  • "file-path"
  • "file-paths"
  • "geojson"
  • "input-format"
  • "input-format-options"
  • "kernel"
  • "labeled-array"
  • "metadata-filter"
  • "output-format"
  • "output-format-options"
  • "process-graph"
  • "raster-cube"
  • "temporal-interval"
  • "temporal-intervals"
  • "udf-code"
  • "udf-runtime"
  • "udf-runtime-version"
  • "vector-cube"
  • "wkt2-definition"
  • "year"

They also have date-time, date, time, duration and uri defined, which seem to be duplicates of the values defined for the JSON Schema format parameter.

Assuming that we intend to use the format parameter to provide "subtype" hints we would need to expand "Table 15 — Additional values for the JSON schema format key for OGC Process Description" with additional values and probably meta-schemas (like OpenEO does).

So my question is, which additional values should we add? Or do we need to add any values at all, and could we instead simply have some informative guidance indicating that the format parameter can be used to provide subtype hints and that, if you use it, you should define a vocabulary ... or both (i.e. define some minimal set of values AND provide informative guidance)?

Looking at the OpenEO list, some of these are purely OpenEO specific (e.g. udf-code, udf-runtime, udf-runtime-version) but others seem pretty generic.

I await your feedback.

@m-mohr
Author

m-mohr commented May 27, 2024

Many of them are indeed pretty openEO specific and evolve from the specific usecases and process definitions.
It would probably make sense to look at process definitions of OAP and see what is commonly used.

What could probably make sense is

  • date-time / date / year / duration / temporal-interval
  • epsg-code / wkt2-definition / ...
  • bounding-box
  • geojson
  • an OGC API adapted equivalent for collection-id
  • an OGC API equivalent for metadata-filter (i.e. for CQL2 Text and/or JSON)
  • ...

Generally, maybe this should be more of a best practice than a standard so that it can evolve in a more agile way. The standards can link to it though.

@m-mohr
Author

m-mohr commented Jul 16, 2024

Just as a note: I found x-ogc-role in Feature - Part 5, which seems to have a very similar / the same purpose compared to what was proposed here.

@pvretano
Contributor

@m-mohr one issue with using Part 5 is that Part 5 deals with "logical" schemas. The use of x-ogc-role is to tag each property with a role not a type. That is, property "X" is the "id" (i.e. the primary identifier) and property "Y" is the "primary-instant" (temporally), etc. It is more like schema constraints in SQL than some sort of type indication. Property "X" and property "Y" can be any type at all.

Is this what you are looking for? If yes, then we can adopt x-ogc-role. If not then I would propose we extend what we already have which is the JSON-Schema format tag.

@m-mohr
Author

m-mohr commented Jul 22, 2024

Seems I misunderstood the x-ogc-role. What I intended to propose here seems closer to format then.

Generally, I found part 5 pretty confusing, maybe because it mixes concerns...

@bpross-52n
Contributor

SWG meeting from 2024-07-22: We agreed to use the format-element. Please comment on additional types that you would like to see included. In the SWG we discussed adding extended collections, {map, coverage,...}, code list annotations and annotations for WKT representations.

@sptillma
Contributor

@pvretano
A couple of other types I thought about were...

  1. Boolean
  2. Object (or group) - where an input requires multiple inputs like a geometry and CRS

We have used this library for our stuff up to this point. I'm not suggesting we use it, but I thought we might get some ideas from it: https://rjsf-team.github.io/react-jsonschema-form/docs/api-reference/uiSchema/

@fmigneault
Contributor

@sptillma
Is there a case where "Boolean" cannot be handled by the type: boolean directly?
If this is referring to some string boolean-like value, such as "TRUE", "OK", "YES", "NO", 1, 0, then a more explicit schema using enum sounds more effective.

Is there some specific geometry you have in mind?
There is currently ogc-bbox, and a few other variants like geojson-feature-collection for more specific structures (https://docs.ogc.org/DRAFTS/18-062.html#_rec_ogc-process-description_format-key).

I don't think it is a good idea to have format: object or format: group, since that is not really more useful than type: {}. It's a "catch-all" definition that doesn't inform more about what is expected for that input.

@m-mohr
Author

m-mohr commented Jul 22, 2024

Please comment on additional types that you would like to see included.

See #395 (comment)

@bpross-52n
Contributor

bpross-52n commented Oct 28, 2024

SWG meeting from 20.10.2024: Peter will add stac-catalog, stac-item and stac-collection to the list. Then #467 will be merged.

@m-mohr
Author

m-mohr commented Oct 28, 2024

In openEO we additionally have defined for STAC input to OGC API - Processes:

  • stac-itemcollection (pretty common in batch processing)
  • stac-stac (a union of all STAC types [catalog, item, collection, itemcollection] for convenience, which I think is a pretty common usecase, too)

Any thoughts on adding those, too?

@pvretano
Contributor

pvretano commented Oct 28, 2024

@m-mohr I don't have a problem adding these too ...

Question:

If I label an input with, say stac-item, what does that mean for the value of the input? Is it a link with reference to a STAC item or is it a string with a JSON-encoded STAC Item as its value or is it a JSON object that is a STAC Item ... or it can be any of these?

I ask because I want to describe the value(s) properly in the specification.

@m-mohr
Author

m-mohr commented Oct 28, 2024

Depends on the type and the context, I think @pvretano.

In openEO we are using it in the context of CWL:

  • File, File[] -> Path to files
  • String -> This is ambiguous: it could be URIs or it could be JSON encoding. I don't know enough about CWL to really recommend anything.

If you just have JSON Schema, it depends on the type:

  • object => then it's a JSON object (not JSON-encoded as string).
  • string => That's a bit ambiguous, but I'd probably assume a URI by default as we are already in a JSON context here so JSON-encoded values don't make a lot of sense to me. You could probably use contentMediaType for enforcing JSON encoding though.
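
The JSON Schema heuristic above could be sketched as follows. This only encodes the assumptions stated in this comment (object means inline JSON, string means URI unless contentMediaType says JSON); the function name and the "inline"/"uri" labels are hypothetical and nothing here is defined by OGC API - Processes or openEO.

```python
import json


def interpret_stac_input(schema: dict, value):
    """Classify a STAC-typed input following the heuristic described above."""
    schema_type = schema.get("type")
    if schema_type == "object":
        if not isinstance(value, dict):
            raise TypeError("expected a JSON object")
        return ("inline", value)
    if schema_type == "string":
        if not isinstance(value, str):
            raise TypeError("expected a string")
        if schema.get("contentMediaType") == "application/json":
            # Explicitly declared as JSON-encoded text, so decode it.
            return ("inline", json.loads(value))
        # Default assumption for strings: a URI pointing to the STAC entity.
        return ("uri", value)
    raise ValueError(f"unsupported schema type: {schema_type!r}")
```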

@pvretano
Contributor

So in openEO, you could have an input of type "file" and that would be the path to a local STAC item ... for example ... is that correct?

@m-mohr
Author

m-mohr commented Oct 28, 2024

That is CWL, not openEO. @pvretano

We just recommend in openEO, that EOAP that have STAC as input or output should use CWL's type File or File[] with format set to one of the STAC types discussed above. See https://github.com/Open-EO/openeo-processes/blob/76a9cfcb075751d1e74142f897c423f896fa8c3d/meta/implementation.md#ogc-api---processes at the end...

In openEO we also use JSON Schema, so the other part of my previous comment. But we face the same ambiguity for the string type, so clarifying it is good...

@fmigneault
Contributor

@m-mohr @pvretano
In CWL, format is typically a URI (contrary to JSON schema format that is "just some string"). If EOAP recommends mappings from STAC to CWL File type (which is good IMO), it should also use format with URIs as expected by CWL, such that stac:item or https://...stac-schemas.../item both resolve to corresponding STAC definitions. Using stac:item needs an explicit $namespaces: {stac: "https://...stac-schemas.../"} in the CWL. The "definitions" that format refers to do not have to be schemas (but might as well be if they are available). In other words, CWL's format is conceptually closer to contentSchema than to JSON-schema's format.

@m-mohr
Author

m-mohr commented Oct 28, 2024

Ah, that's why we initially used a colon instead of a dash. Can we ensure that this is consistent across CWL and OGC APIs?

@fmigneault
Contributor

As long as the OGC API definitions resolve to valid URIs in the OGC Naming Authority, I'm fine with it. I've been using CWL $namespaces: {"ogc": "http://www.opengis.net/def/media-type/ogc/1.0/"} for a while such that ogc:geotiff resolves to http://www.opengis.net/def/media-type/ogc/1.0/geotiff for example. The same could be done for other "format" entries proposed here.
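
The prefix expansion described here is simple string concatenation. A minimal sketch, using the ogc namespace URI quoted above (the function name is hypothetical, and this is not how any particular CWL runner implements it):

```python
def expand_format(value: str, namespaces: dict) -> str:
    """Expand a prefixed name such as "ogc:geotiff" using a CWL-style namespace map."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    # Already a full URI, or a prefix that is not declared: leave untouched.
    return value


namespaces = {"ogc": "http://www.opengis.net/def/media-type/ogc/1.0/"}
geotiff_uri = expand_format("ogc:geotiff", namespaces)
```

Whether a dashed JSON Schema format value and a prefixed CWL format can be treated as the same identifier depends on both sides agreeing on one namespace URI.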

@m-mohr
Author

m-mohr commented Oct 29, 2024

Hmm, having that in mind I'm wondering why OGC actually needs to define STAC types.
Can't STAC define them? We just have the issue that our "registry" is less clean/good for that approach:

If namespace is https://schemas.stacspec.org/v1.1.0/ then it would be stac:item-spec/json-schema/item.json to the schema, which is less than ideal.

Even if we use https://github.com/radiantearth/stac-spec/tree/v1.1.0 as namespace then it would be stac:item-spec for example, which resolves to a GitHub page instead of the schema and we don't even have the union type or itemcollection defined in that namespace.

@fmigneault
Contributor

Yes, that works as well if the references are under https://schemas.stacspec.org. What's important to me is that whichever location is used, it is the same in STAC and OGC documentation so they are interoperable natively.
