[Parser Fix]: distribution.contentUrl for Zenodo #129

Open
1 of 7 tasks
gtsueng opened this issue Mar 20, 2024 · 2 comments

@gtsueng
Contributor

gtsueng commented Mar 20, 2024

Issue Name

distribution.contentUrl for Zenodo

Issue Description

The Zenodo parser does not currently appear to populate the distribution field. Based on a quick review of 10 records on the Zenodo site, Zenodo uses the following URL format to provide access to the files available for download:

  • https://zenodo.org/api/records/{canonical id}/files-archive

While this link corresponds to the "download all" button on the Zenodo site rather than to each individual file download, it can still be parsed into the 'distribution.contentUrl' field.
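
As a rough illustration of the mapping described above, the archive URL can be built directly from the record id. This is only a sketch: the function names and the shape of the returned distribution entry are assumptions made for illustration, not the parser's actual code.

```python
def zenodo_files_archive_url(record_id):
    """Return the "download all" archive URL for a Zenodo record id."""
    return f"https://zenodo.org/api/records/{record_id}/files-archive"


def build_distribution(record_id):
    # Hypothetical helper: the list-of-dicts shape is an assumption,
    # only the contentUrl format comes from the issue description.
    return [{"contentUrl": zenodo_files_archive_url(record_id)}]


print(build_distribution(6983398))
```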

Issue Example

Example Zenodo record on prod: https://data.niaid.nih.gov/resources?id=ZENODO_6983398
Same record in Zenodo: https://zenodo.org/records/6983398
file download url from record in Zenodo: https://zenodo.org/api/records/6983398/files-archive

Related WBS task

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response

@gtsueng
Contributor Author

gtsueng commented Apr 25, 2024

@jal347 can you double-check the url you used in the correction?

The data on staging has the following url format https://zenodo.org/record/{identifier}/files-archive <-- This is not correct.

The download urls are actually to their api: https://zenodo.org/api/records/{identifier}/files-archive
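
For reference, a minimal sketch of rewriting the incorrect staging format into the API format. The function name and regular expression are hypothetical; only the two URL formats come from the comment above.

```python
import re

# Matches the incorrect staging format: https://zenodo.org/record/{identifier}/files-archive
_WRONG_FORMAT = re.compile(r"^https://zenodo\.org/record/(\d+)/files-archive$")


def fix_content_url(url):
    """Rewrite a 'record/' download URL to the 'api/records/' form.

    URLs that do not match the incorrect format are returned unchanged.
    """
    match = _WRONG_FORMAT.match(url)
    if match is None:
        return url
    return f"https://zenodo.org/api/records/{match.group(1)}/files-archive"
```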

@gtsueng
Contributor Author

gtsueng commented May 29, 2024

@jal347 I found a few issues when looking at the data in Staging:

The url to access the record does not work for many Zenodo records:

The content.url is broken for many records even though the base url is correct. This is linked to the issue above and depends on whether a Zenodo id is a canonical id or a versioned record id.

Cause of issue:

  • In the first case, the identifier appears to be a versioned record identifier
  • In the second case, the identifier is potentially a canonical id
  • Issues with canonical ids
    • Canonical ids appear to be DOIs, so the functional urls for them have a different base url: https://zenodo.org/doi/10.5281/zenodo.3242511
    • The canonical id (when available) appears to be 1 less than the version 1 id
    • Canonical ids are redirects that take the user to the most recent version of a record. There are NO file downloads associated with canonical ids (hence the content.url will always be broken for them).

Potential solution:

  • If a Zenodo record is not using a canonical ID:
    • i.e. does not have format zenodo.#####, keep current functional url
    • distribution.contentUrl should also work and should be kept the same
  • If a Zenodo record is using a canonical ID:
    • Use the zenodo doi base url https://zenodo.org/doi/10.5281/ in conjunction with the canonical id for the url
    • For the distribution (and content.url), the version 1 id can be generated from the canonical id.
      • Version 1 id = canonical id with the 'zenodo.' prefix removed, plus 1
        • E.g. Canonical ID = zenodo.3242511, version 1 identifier = 3242512
      • use distribution.name = 'Version 1', with distribution.contentUrl https://zenodo.org/api/records/{version 1 identifier}/files-archive
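
The potential solution above could be sketched roughly as follows. This is an illustration only: the function name, the return shape, and the regular expression used to detect canonical ids are assumptions, and the "plus 1" rule is the observation from this comment rather than documented Zenodo behavior.

```python
import re

# Assumed canonical id format from the comment above: "zenodo." followed by digits.
_CANONICAL_ID = re.compile(r"^zenodo\.(\d+)$")


def canonical_overrides(identifier):
    """Return replacement url/distribution values for a canonical id.

    Returns None when the identifier is not canonical, meaning the
    existing url and distribution.contentUrl should be kept as they are.
    """
    match = _CANONICAL_ID.match(identifier)
    if match is None:
        return None
    # Observed (not guaranteed): version 1 id is the canonical number plus 1.
    version_1_id = int(match.group(1)) + 1
    return {
        "url": f"https://zenodo.org/doi/10.5281/{identifier}",
        "distribution": [
            {
                "name": "Version 1",
                "contentUrl": f"https://zenodo.org/api/records/{version_1_id}/files-archive",
            }
        ],
    }
```

Returning None for non-canonical identifiers keeps the first branch of the proposal a no-op, so the sketch only touches records that are currently broken.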

Other observations:

  • Records likely to use regular Zenodo ids rather than canonical ids: DRYAD records in Zenodo (they have a completely different DOI structure, which can't be used with the Zenodo DOI endpoint)
