
Exposing "Links" harvested from DCAT sources #237

@rhodges


In GeoPortal Harvester v2.7.2 (and v2.6.5; both tested), the links in DCAT records (as served by Esri Hub, CKAN, Socrata, et al.) are not being populated in the Catalog's 'links' dropdown.

The Southern California Coastal Water Research Project (SCCWRP) has an Esri Hub instance, which we harvest as a DCAT source using the URL https://dataportal.sccwrp.org/api/feed/dcat-us/1.1.json. That feed holds a long list of useful links for each record, similar to the links the Catalog offers for records harvested from other metadata formats.

My research indicates that the 'distribution' section is the correct place to store these sorts of links in DCAT: https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution

For example, take the record "Bight 18 Sediment Toxicity Summary Results" (https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results/about). When harvested, its source JSON includes the following:

{
    "@type": "dcat:Dataset",
    "identifier": "https://www.arcgis.com/home/item.html?id=c71310596dae42efa5d076f993bdbb37&sublayer=0",
    "landingPage": "https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results",
    "title": "Bight 18 Sediment Toxicity Summary Results",
    …
    "distribution": [
        {
            "@type": "dcat:Distribution",
            "title": "ArcGIS Hub Dataset",
            "format": "Web Page",
            "mediaType": "text/html",
            "accessURL": "https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results"
        },
        {
            "@type": "dcat:Distribution",
            "title": "ArcGIS GeoService",
            "format": "ArcGIS GeoServices REST API",
            "mediaType": "application/json",
            "accessURL": "https://gis.sccwrp.org/arcserver/rest/services/Bight2018ToxicitySummaryResults/FeatureServer//0"
        },
        {
            "@type": "dcat:Distribution",
            "title": "CSV",
            "format": "CSV",
            "mediaType": "text/csv",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/csv?layers=0"
        },
        {
            "@type": "dcat:Distribution",
            "title": "Shapefile",
            "format": "ZIP",
            "mediaType": "application/zip",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/shapefile?layers=0"
        },
        {
            "@type": "dcat:Distribution",
            "title": "GeoJSON",
            "format": "GeoJSON",
            "mediaType": "application/vnd.geo+json",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/geojson?layers=0"
        },
        {
            "@type": "dcat:Distribution",
            "title": "KML",
            "format": "KML",
            "mediaType": "application/vnd.google-earth.kml+xml",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/kml?layers=0"
        }
    ],
    …
},...
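As a rough illustration of what populating the links from this section could involve, here is a Python sketch that walks a DCAT-US 1.1 feed (its top-level `dataset` array) and builds one link entry per `distribution`, preferring `downloadURL` and falling back to `accessURL`. This is not Geoportal Harvester code; the function names and the output shape (`title`, `url`, `mediaType`) are hypothetical.

```python
import json
from typing import Any


def distribution_links(dataset: dict[str, Any]) -> list[dict[str, str]]:
    """Build one link entry per DCAT distribution of a single dataset."""
    links = []
    for dist in dataset.get("distribution") or []:
        # DCAT-US 1.1 distributions carry either a downloadURL or an accessURL.
        url = dist.get("downloadURL") or dist.get("accessURL")
        if not url:
            continue  # nothing to link to
        links.append({
            # Fall back to the format, then the URL, when a title is missing.
            "title": dist.get("title") or dist.get("format") or url,
            "url": url,
            "mediaType": dist.get("mediaType", ""),
        })
    return links


def links_by_dataset(feed_text: str) -> dict[str, list[dict[str, str]]]:
    """Map each dataset identifier in a DCAT-US feed to its distribution links."""
    catalog = json.loads(feed_text)
    return {
        ds.get("identifier", ""): distribution_links(ds)
        for ds in catalog.get("dataset") or []
    }
```

For the record above, this would yield six entries, one per distribution, each with the distribution's own title.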

When harvested, only the following is presented in the Catalog UI:

[Screenshot: Catalog UI links for this record]

This ignores all of the useful links to "ArcGIS Hub Dataset", "ArcGIS GeoService", "CSV", "Shapefile", "GeoJSON", and "KML".

A user can only find these by digging through the raw JSON provided by the "JSON" link above and navigating to:

{
	…
	"_source": {
		…
		"_json": {
			…
			"distribution": [ {in here} ],
		}
	}
}
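As a stopgap, a small Python helper can pull those entries out of the document behind the "JSON" link, assuming it has been parsed into a dict shaped like the path above. The function name is hypothetical, and since it is an assumption whether `_json` is always stored as a nested object, the sketch also accepts it as a JSON-encoded string.

```python
import json
from typing import Any


def links_from_indexed_record(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (title, URL) pairs found at doc["_source"]["_json"]["distribution"]."""
    source = doc.get("_source") or {}
    original = source.get("_json") or {}
    # Assumption: _json may be a nested object or a JSON-encoded string.
    if isinstance(original, str):
        original = json.loads(original)
    pairs = []
    for dist in original.get("distribution") or []:
        url = dist.get("accessURL") or dist.get("downloadURL")
        if url:
            pairs.append((dist.get("title", ""), url))
    return pairs
```

Missing levels simply produce an empty list rather than an error.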

Examples of the same functionality working for other metadata formats

ISO 19139/19115-2:

NCEI's Passive Acoustic SanctSound data:

[Screenshot: Catalog UI links for the SanctSound record]

The links are stored under similarly named elements, "MD_Distribution" as opposed to DCAT's "distribution":

<gmd:distributionInfo>
<gmd:MD_Distribution>
	<gmd:distributor>
		<gmd:MD_Distributor>
			<gmd:distributor{X}>
				<gmd:...>
					<gmd:onLine>
						<gmd:CI_OnlineResource>
							<gmd:linkage>
								<gmd:URL>
								    {LINK HERE}
								</gmd…>
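For comparison, a minimal Python sketch of collecting those ISO linkages with the standard library's `xml.etree.ElementTree`, assuming the standard ISO 19139 `gmd` namespace (`http://www.isotc211.org/2005/gmd`). The function name is hypothetical, and this is not how Geoportal Harvester itself is implemented.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace bound to the gmd prefix.
GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}


def iso_online_links(xml_text: str) -> list[str]:
    """Return every gmd:linkage/gmd:URL found under gmd:distributionInfo."""
    root = ET.fromstring(xml_text)
    urls = []
    # iter() visits the root itself as well as all descendants.
    for info in root.iter(f"{{{GMD}}}distributionInfo"):
        for url in info.iterfind(".//gmd:CI_OnlineResource/gmd:linkage/gmd:URL", NS):
            if url.text and url.text.strip():
                urls.append(url.text.strip())
    return urls
```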

FGDC:

[Screenshot: Catalog UI links for the FGDC record]

The metadata looks roughly like this:

<metadata ….>
	<idinfo>
		<citation>
			<citeinfo>
				…
				<onlink>https://www.boem.gov/atl-5yr-2019-2024.zip</onlink>
				<onlink>https://www.boem.gov/Five-Year-Program/</onlink>
				<onlink>https://www.boem.gov/ak-5yr-2019-2024.zip/</onlink>
				<onlink>https://www.boem.gov/pac-5yr-2019-2024.zip</onlink>
				<onlink>https://www.data.boem.gov/Mapping/Files/Gom_5yr_2019_2024.zip</onlink>
			</citeinfo>
		</citation>
	</idinfo>
</metadata>
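Likewise for FGDC, a minimal Python sketch (hypothetical function name, standard library only) that reads the `onlink` elements from the primary citation path shown above. FGDC CSDGM records typically carry no XML namespace, so plain tag paths work; `onlink` can also appear in other citations (e.g. cross-references), which this sketch deliberately ignores.

```python
import xml.etree.ElementTree as ET


def fgdc_onlinks(xml_text: str) -> list[str]:
    """Return the <onlink> URLs of the primary citation, in document order."""
    root = ET.fromstring(xml_text)  # root is the <metadata> element
    return [
        el.text.strip()
        for el in root.findall("idinfo/citation/citeinfo/onlink")
        if el.text and el.text.strip()  # skip empty elements
    ]
```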
