Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend the sniffing capability to head out to the endpoint to find out what it is #6

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pvgenuchten
Copy link

Some comments:

  • it may return an array of protocols, if multiple are supported
  • for backwards compatibility, you need to set first=False to allow multiple (multiple is also slow)
  • you need to set extended=True to activate the extended sniffing
  • depends also on add ogcapi types to list OSGeo/Cat-Interop#38 to enforce the types on cat-interop

pending:

  • tests
  • docs

…t what it is

some comments:
- it may return an array of protocols, if multiple are supported
- for backwards compatibility, you need to set first=False to allow multiple (multiple is also slow)
- you need to set extended=True to activate the extended sniffing
@tomkralidis
Copy link
Member

Thanks @pvgenuchten! Initial comments:

  • I've dusted off geolinks in PR migrate from Travis to GitHub Actions, housekeeping #7; you'll need to rebase
  • the extended functionality should be in a separate Python file (say geolinks/probe.py), and then called from geolinks/__init__.py codepath if probe=True is passed, so something like:
if extended:
    LOGGER.debug('Probing URL')
    return probe.detect(url)
  • we will need to add OWSLib as a requirement (requirements.txt is now added in master branch)
  • suggest to change extended=True|False to probe=True|False (default still keeping to False)
  • returning a single link type vs. a list of link types: why don't we take this opportunity to update the geolinks.sniff_link signature then, to return a list (even if it's a list of one item) to keep the Python API consistent

@pvgenuchten
Copy link
Author

pvgenuchten commented Jul 1, 2022

Hi Tom, thanx for reviewing, i like the probe idea, fixing reqs

I missed the dusting commit, will rebase

returning a single link type vs. a list of link types: why don't we take this opportunity to update the geolinks.sniff_link signature then, to return a list (even if it's a list of one item) to keep the Python API consistent

This may cause problems in software using geolinks (like pycsw) when updating... I'm fine either way

@tomkralidis
Copy link
Member

This may cause problems in software using geolinks (like pycsw) when updating... I'm fine either way

Yes, this will cause breakage, we can safeguard accordingly.

add owslib to requirements
@pvgenuchten
Copy link
Author

Some additional comments/questions:

  • do you think we should respect a robots.txt disallow statement? do not sniff if domain/robots.txt is set to disallow
  • I guess for OGCAPI we probably need a different strategy, see if an endpoint is openapi and then go into the collections to evaluate of which type they are?

@tomkralidis
Copy link
Member

do you think we should respect a robots.txt disallow statement? do not sniff if domain/robots.txt is set to disallow

I would probably keep that outside geolinks, given most clients do not. So your downstream application can grab the base URL from the link and inspect robots.txt.

I guess for OGCAPI we probably need a different strategy, see if an endpoint is openapi and then go into the collections to evaluate of which type they are?

I guess this has to do with the is the entry point of a geolinks workflow in the current OGC API design patterns. In other words, OGC APIs now provide a clear way to articulate a link's "type" via link relations. Of course, some workflows would receive a full link object, and some just an href (which needs geolinks :) ).

Thinking out loud, examples from https://demo.pygeoapi.io/master/collections/gdps-temperature?f=json

This is clearly direct access to a coverage:

        {
            "type": "application/x-grib2",
            "rel": "http://www.opengis.net/def/rel/ogc/1.0/coverage",
            "title": "Coverage data as GRIB2",
            "href": "https://demo.pygeoapi.io/master/collections/gdps-temperature/coverage?f=GRIB2"
        }

This is direct access to a collection:

        {
            "type": "application/json",
            "rel": "collection",
            "title": "Detailed Coverage metadata in JSON",
            "href": "https://demo.pygeoapi.io/master/collections/gdps-temperature?f=json"
        }

...which (when probe=True) can inspect the endpoint further and build up a list of link types.

P.S.

I missed the dusting commit, will rebase

Your PR prompted the dusting commit, so thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants