[Parser Fix]: OMICS-DI literature supplemental data #170

Open
1 of 9 tasks
gtsueng opened this issue Oct 31, 2024 · 1 comment


gtsueng commented Oct 31, 2024

Issue Name

OMICS-DI literature supplemental data

Issue Description

OMICS-DI ingests publication records for supplemental data from Biostudies-literature (which ingests from European PMC). Clicking 'access data' for one of these records in the Discovery Portal takes you to the OMICS-DI record, which may link to the Biostudies record, which in turn links to the European PMC record, where you may (or may not) find the downloadable supplemental files.

These publication supplemental data records make up ~1.4M of the ~2.2M OMICS-DI records.

To be determined: Do we want to continue to ingest and include all of OMICS-DI, or do we want to filter OMICS-DI such that only data that is not supplemental data from publications is included?

  • Pros of ingesting as is:
    • PMC Supplemental data is too messy to ingest. The supplemental data ingested via Biostudies into OMICS-DI is presumably a curated subset, where the metadata has been aligned with OMICS-DI and is expected to be within the scope of OMICS-DI
    • Helps users find OMICS data that would be otherwise buried in the literature or only in OMICS-DI
    • Keeps OMICS-DI intact
  • Cons of ingesting as is:
    • Adds a large number of records where data access is very convoluted (need to follow 2-3 links to get to it, if it's even available).
    • Can easily be mistaken for publication records instead of dataset records, since the metadata for the supplemental data is based on the publication record
    • Forgoes a potential speed-up: OMICS-DI could be faster to parse and update if such records were skipped based on a base URL or identifier match

To do:

  • Determine if the literature supplemental data ingested into OMICS-DI should be included
  • If it should NOT be included, update the parser to skip any record with "biostudies-literature" in the URL
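
If the decision is to exclude these records, the skip step could look roughly like the sketch below. This is not the existing parser code: the record field names (`url`, `identifier`), the `s-epmc` identifier prefix (inferred only from the issue example `s-epmc6182170`), and the sample URLs are all assumptions and would need to be checked against what OMICS-DI actually returns.

```python
from typing import Iterable, Iterator

# Assumption: substring taken from the to-do item above; the real URL layout may differ.
LITERATURE_URL_MARKER = "biostudies-literature"
# Assumption: identifier prefix inferred from the single issue example (s-epmc6182170).
LITERATURE_ID_PREFIX = "s-epmc"


def is_literature_supplement(record: dict) -> bool:
    """Return True if a record looks like publication supplemental data.

    The "url" and "identifier" keys are hypothetical field names.
    """
    url = str(record.get("url") or "").lower()
    identifier = str(record.get("identifier") or "").lower()
    return LITERATURE_URL_MARKER in url or identifier.startswith(LITERATURE_ID_PREFIX)


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records that are not literature supplemental data."""
    for record in records:
        if not is_literature_supplement(record):
            yield record


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"identifier": "S-EPMC6182170", "url": "https://example.org/biostudies-literature/S-EPMC6182170"},
        {"identifier": "EXAMPLE-0001", "url": "https://example.org/other-repository/EXAMPLE-0001"},
    ]
    for kept in filter_records(sample):
        print(kept["identifier"])
```

Matching on either the URL or the identifier keeps the check cheap enough to run before any further per-record processing, which is where the potential parsing speed-up would come from.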

Issue Example

https://data.niaid.nih.gov/resources?id=s-epmc6182170

Related WBS task

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response


gtsueng commented Nov 12, 2024

Note that the decision made on this issue will affect the number of records that will be linked via NCBI LinkOut (see issue https://github.com/NIAID-Data-Ecosystem/niaid-feedback/issues/126).
