Merge pull request #685 from crim-ca/collection-input
fmigneault authored Sep 4, 2024
2 parents 20a5cd8 + 03f34ee commit 64ca393
Showing 37 changed files with 2,105 additions and 118 deletions.
26 changes: 22 additions & 4 deletions CHANGES.rst
@@ -12,10 +12,28 @@ Changes

Changes:
--------
- No change.

Fixes:
------
- Add support of *OGC API - Processes: Part 3* ``collection`` as input to a `Process`
(fixes `#682 <https://github.com/crim-ca/weaver/issues/682>`_).
- Add ``AnyCRS`` schema definition with improved validation of allowed values.
- Use ``AnyCRS`` schema for ``SupportedCRS``, ``XMLStringCRS``, ``BoundingBoxValue`` and ``ExecuteCollectionInput``
instead of a generic ``URL`` schema definition for better reference validation, while allowing alternate short forms.
- Add auto-resolution of media-type for cases where it can reasonably be inferred from a ``schema`` reference,
such as a URI referring to a ``.json`` or ``.xsd`` file, respectively representing `JSON` and `XML` data.
- Update ``cwltool`` with fork
`fmigneault/cwltool @ fix-load-contents-array <https://github.com/fmigneault/cwltool/tree/fix-load-contents-array>`_
until ``loadContents`` behavior is resolved for ``type: File[]``
(relates to `common-workflow-language/cwltool#2036 <https://github.com/common-workflow-language/cwltool/pull/2036>`_).

Fixes:
------
- Fix `CWL` I/O with ``format`` defined as a `JavaScript Expression` being incorrectly parsed by the conversion
operations that extract applicable media-types. These cases are now ignored, since media-types cannot be inferred
from them. The `WPS` or `OAS` I/O definitions should instead provide the applicable media-types
(relates to `common-workflow-language/cwl-v1.3#52 <https://github.com/common-workflow-language/cwl-v1.3/issues/52>`_).
- Fix ``format`` parsing when trying to infer media-types from various I/O definition representations using a
reference provided as a URI schema from an ontology. Parsing split the URI, which led to an invalid
resolution. If no appropriate media-type is provided, JSON will be used by default, while preserving the submitted
schema URI.
- Fix invalid resolution of ``weaver.formats.ContentEncoding.open_parameters``.
- Fix minor resolution combinations or redundant checks for multiple ``weaver.formats`` utilities.
- Fix `CWL` ``format`` resolution check against `IANA` media-types if the reference ontology happens to be
7 changes: 7 additions & 0 deletions docs/examples/collection-input-basic.json
@@ -0,0 +1,7 @@
{
"inputs": {
"image-input": {
"collection": "https://example.com/collections/sentinel-2"
}
}
}
21 changes: 21 additions & 0 deletions docs/examples/collection-input-filter-cql2-json-ogc-features.json
@@ -0,0 +1,21 @@
{
"inputs": {
"features": {
"collection": "https://example.com/collections/dataset-features",
"format": "ogc-feature-collection",
"filter": {
"op": "s_intersects",
"args": [
{"property": "geometry"},
{
"type": "Polygon",
"coordinates": [ [ [30, 10], [40, 40], [20, 40], [10, 20], [30, 10] ] ]
}
]
},
"filter-crs": "https://www.opengis.net/def/crs/OGC/1.3/CRS84",
"filter-lang": "cql2-json",
"sortBy": "-id"
}
}
}
11 changes: 11 additions & 0 deletions docs/examples/collection-input-filter-cql2-text-stac.json
@@ -0,0 +1,11 @@
{
"inputs": {
"images": {
"collection": "https://example.com/collections/sentinel-2",
"format": "stac-collection",
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"filter": "properties.eo:cloud_cover < 0.1",
"filter-lang": "cql2-text"
}
}
}
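
The request bodies above share the same shape, so they can be generated rather than written by hand. The following is a minimal sketch, assuming only the keys shown in these examples (``collection``, ``format``, ``type``, ``filter``, ``filter-lang``). The helper name ``make_collection_input`` is hypothetical and is not part of `Weaver`.

```python
import json
from typing import Any, Dict, Optional


def make_collection_input(
    collection: str,
    *,
    data_format: Optional[str] = None,
    media_type: Optional[str] = None,
    cql2_filter: Optional[Any] = None,
    filter_lang: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one collection input value, leaving out any option that is not set.

    Hypothetical helper for illustration; key names are taken from the examples above.
    """
    value: Dict[str, Any] = {"collection": collection}
    optional = {
        "format": data_format,
        "type": media_type,
        "filter": cql2_filter,
        "filter-lang": filter_lang,
    }
    # Only keep the options that were explicitly provided.
    value.update({key: item for key, item in optional.items() if item is not None})
    return value


# Reproduce the CQL2 text example with a STAC collection.
body = {
    "inputs": {
        "images": make_collection_input(
            "https://example.com/collections/sentinel-2",
            data_format="stac-collection",
            media_type="image/tiff; application=geotiff; profile=cloud-optimized",
            cql2_filter="properties.eo:cloud_cover < 0.1",
            filter_lang="cql2-text",
        )
    }
}
print(json.dumps(body, indent=2))
```

Calling the helper with only the collection URL yields the basic form ``{"collection": "..."}``, since unset options are dropped instead of being serialized as ``null``.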
97 changes: 89 additions & 8 deletions docs/source/appendix.rst
@@ -44,18 +44,47 @@ Glossary
- :ref:`quotation`
- :ref:`conf_quotation`

Builtin Process
An immutable :term:`Process` that comes pre-packaged with `Weaver`, without need to be deployed.
This is usually a "utility" or "converter" :term:`Process` that is often reused across :term:`Workflow`
definitions.

.. seealso::
Refer to the :ref:`proc_builtin` section for more details about available processes.

CLI
| Command Line Interface
| Script that offers interactions through shell commands or Python scripts to execute any described operations.
Details of the provided `Weaver` commands are described in :ref:`cli` chapter.
Collection
A geospatial resource that may be available as one or more sub-resource distributions
that conform to one or more |ogc-api-standards|_. Additionally, |stac-collections|_ can
be included in this group.

Please refer to the :term:`OGC` official |ogc-collection|_ for more details
and complementary terminology.

CRS
Coordinate Reference System
Geospatial data encoding of the representation parameters, describing the structure to locate entities,
in terms of axis order (latitude, longitude, altitude, etc.), dimension (2D, 3D, etc.), with respect to a
specific celestial object, a position of origin, scale and orientation (i.e.: |datum-def|_).

.. seealso::
- Reference :term:`W3C`/:term:`OGC` documentation about |crs|_.
- Reference definition of |crs-def|_.

CWL
| |cwl|_
| Representation of the internal :term:`Application Package` of the :term:`Process` to provide execution
methodology of the referenced :term:`Docker` image or other supported definitions.
A |cwl|_ file can be represented in either :term:`JSON` or :term:`YAML` format, but is often represented
in :term:`JSON` in the context of `Weaver` for its easier inclusion within HTTP request contents.
See :ref:`application-package` section for further details.
Common Workflow Language
Representation of the internal :term:`Application Package` of the :term:`Process` to provide execution
methodology of the referenced :term:`Docker` image or other supported definitions.
A |cwl|_ file can be represented in either :term:`JSON` or :term:`YAML` format, but is often represented
in :term:`JSON` in the context of `Weaver` for its easier inclusion within HTTP request contents.

.. seealso::
- Official |cwl|_ documentation.
- :ref:`application-package` section for further details.

Data Source
Known locations of remote servers where an :term:`ADES` or :term:`EMS`
@@ -76,6 +105,7 @@ Glossary
types by providing additional formats that are more specific to some data domains.

EMS
Execution Management Service
| |ems|
| See :ref:`processes` section for details.
Alternative operation modes are described in :ref:`Configuration Settings`.
@@ -89,11 +119,42 @@ Glossary
:ref:`opensearch_data_source` section.

ESGF
|esgf|_
Earth System Grid Federation
An open source effort providing a robust, distributed data and computation platform,
enabling worldwide access to large-scale scientific data.

.. seealso::
|esgf|_ official website.

ESGF-CWT
|esgf-cwt-git|_

.. seealso::
:ref:`proc_esgf_cwt` for more details about the :term:`Process` type.

Feature
An abstraction of real-world phenomena into a digital entity representation, which includes
information detailing its *extent* (i.e.: how it is placed and located in time and space).

.. seealso::
- :term:`OGC` |feature-ogc-def|_ definition.
- :term:`W3C` |feature-w3c-def|_ definition.
- :term:`W3C` |feature-w3c-desc|_ examples and extended description.

GeoJSON
| Geospatial :term:`JSON`
| A specific :term:`JSON` format representation for encoding a variety of geographic data structures,
such as ``Point``, ``LineString``, ``Polygon``, ``MultiPoint``, ``MultiLineString``, ``MultiPolygon``,
``Feature``, and ``FeatureCollection``.
.. seealso::
Refer to the official |geojson|_ specification for more details.

.. note::
Multiple extended or derived variants exist. Notably, the |ogc-api-features|_ and |stac-spec|_ define
additional ``properties`` or additional ``type`` values for particular use cases, but remain generally
interoperable and compatible.

HREF
| Hyperlink Reference
| Often shortened to simply `reference`. Represents either a locally or remotely accessible item, such as a
@@ -136,6 +197,7 @@ Glossary
such as ``&`` or ``;`` to distinguish between distinct pairs. Specific separators, and any applicable
escaping methods, depend on context, such as in URL query, HTTP header, :term:`CLI` parameter, etc.
Media-Type
Media-Types
MIME-types
| Multipurpose Internet Mail Extensions
@@ -153,7 +215,12 @@ Glossary
|OpenAPI-spec|_

OGC
|ogc|_
Open Geospatial Consortium
International standards organization for geospatial data and processing best practices
that establishes most of the :term:`API` definitions implemented by `Weaver`.

.. seealso::
|ogc|_

OAP
OGC API - Processes
@@ -207,6 +274,14 @@ Glossary
S3
Simple Storage Service (:term:`AWS` S3), bucket file storage.

STAC
| SpatioTemporal Asset Catalog
| Language used to describe geospatial information, using extended definitions of :term:`GeoJSON`,
and which can usually be searched using a |stac-api-spec|_ compliant with |ogc-api-features|_.
.. seealso::
Please refer to the |stac-spec|_ for more details.

TOI
| Time of Interest
| Corresponds to a date/time interval employed for :term:`OpenSearch` queries in the context
@@ -246,6 +321,12 @@ Glossary
- :ref:`vault_upload`
- :ref:`file_vault_inputs`

W3C
World Wide Web Consortium
Main international standards organization for the World Wide Web.
Since |ogc-api-standards|_ are based on HTTP and web communications, this consortium establishes the
common foundation definitions used by the :term:`API` specifications.

WKT
Well-Known Text geometry representation.

26 changes: 14 additions & 12 deletions docs/source/package.rst
@@ -545,7 +545,7 @@ specific types will be presented in :ref:`cwl-type` and :ref:`cwl-dir` sections.
| | | ``uri``, ``url``, | |
| | | etc.) :sup:`(5)` | |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| |na| | ``BoundingBox`` | :term:`JSON` | Only partial support available. |br| |
| ``File`` | ``BoundingBox`` | :term:`JSON` | Partial support available. |br| |
| | | :sup:`(6)` | See :ref:`note <bbox-note>`. |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``File`` | ``Complex`` | :term:`JSON` | :ref:`File Reference <file_ref_types>` |
@@ -567,22 +567,24 @@ specific types will be presented in :ref:`cwl-type` and :ref:`cwl-dir` sections.
More specific types with these items can help apply additional validation, although not strictly enforced.
| :sup:`(6)` Specific schema required as described in :ref:`oas_json_types`.
.. _bbox-note:
.. note::
The :term:`WPS` data type ``BoundingBox`` has a schema definition in :term:`WPS` and :term:`OAS` contexts,
but is not handled natively by :term:`CWL` types. When the conversion to a :term:`CWL` job occurs, an equivalent
``Complex`` type using a :term:`CWL` ``File`` with ``format: ogc-bbox`` and the contents stored as :term:`JSON` is
employed. It is up to the :term:`Application Package` to parse this :term:`JSON` content as necessary.
Alternatively, it is possible to use a ``Literal`` data of type ``string`` corresponding to :term:`WKT` [#]_ if it
is deemed preferable that the :term:`CWL` script receives the data directly without intermediate interpretation.

.. [#] |wkt-example|_
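
As a rough illustration of the parsing left to the :term:`Application Package`, the sketch below reads such a :term:`JSON` file in Python. It assumes the content follows the *OGC API - Processes* bounding box layout, with a ``bbox`` array of 4 or 6 numbers and an optional ``crs`` reference. The exact structure written by `Weaver` should be checked against :ref:`oas_json_types`. The names ``BoundingBox``, ``parse_bbox`` and ``load_bbox`` are hypothetical.

```python
import json
from typing import List, NamedTuple

# Assumed default when "crs" is omitted (CRS84, as in OGC API - Processes).
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


class BoundingBox(NamedTuple):
    coordinates: List[float]
    crs: str


def parse_bbox(text: str) -> BoundingBox:
    """Parse bounding box JSON text, assuming 'bbox' and optional 'crs' keys."""
    data = json.loads(text)
    coordinates = [float(value) for value in data["bbox"]]
    # 4 values describe a 2D box, 6 values describe a 3D box.
    if len(coordinates) not in (4, 6):
        raise ValueError(f"expected 4 or 6 bbox values, got {len(coordinates)}")
    return BoundingBox(coordinates, data.get("crs", DEFAULT_CRS))


def load_bbox(path: str) -> BoundingBox:
    """Read the file staged for the CWL ``File`` input and parse its contents."""
    with open(path, encoding="utf-8") as bbox_file:
        return parse_bbox(bbox_file.read())
```

Inside a script run by the package, ``load_bbox`` would receive the path that :term:`CWL` passes for the ``File`` input. Keeping ``parse_bbox`` separate makes the validation easy to exercise without touching the file system.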
.. _cwl-type:

Type Resolution
~~~~~~~~~~~~~~~

In the :term:`WPS` context, three data types exist, namely ``Literal``, ``BoundingBox`` and ``Complex`` data.

.. _bbox-note:
.. note::
As of the current version of `Weaver`, :term:`WPS` data type ``BoundingBox`` is not completely supported.
The schema definition exists in :term:`WPS` and :term:`OAS` contexts but is not handled by any :term:`CWL` type
conversion yet. This feature is reflected by issue `#51 <https://github.com/crim-ca/weaver/issues/51>`_.
It is possible to use a ``Literal`` data of type ``string`` corresponding to :term:`WKT` [#]_ in the meantime.

.. [#] |wkt-example|_
As presented in previous examples, :term:`I/O` in the :term:`WPS` context does not require an explicit indication of
which data type from one of ``Literal``, ``BoundingBox`` and ``Complex`` to apply. Instead, :term:`WPS` type can be
inferred using the matched API schema of the I/O. For instance, ``Complex`` I/O (e.g.: file reference) requires the
@@ -639,8 +641,8 @@ it gets parsed as intended type.

.. versionadded:: 4.16

With more recent versions of `Weaver`, it is also possible to employ :term:`OpenAPI` schema definitions provided in
the :term:`WPS` I/O to specify the explicit structure that applies to ``Literal``, ``BoundingBox`` and ``Complex``
With more recent versions of `Weaver`, it is also possible to employ :term:`OpenAPI` schema (:term:`OAS`) definitions
provided in the I/O to specify the explicit structure that applies to ``Literal``, ``BoundingBox`` and ``Complex``
data types. When :term:`OpenAPI` schemas are detected, they are also considered in the merging strategy along with
other specifications provided in :term:`CWL` and :term:`WPS` contexts. More details about the :term:`OAS` context are
provided in the :ref:`oas_io_schema` section.