[Parser Fix]: Change SRA parser handling of 'isBasedOn' values #112

Open
gtsueng opened this issue Oct 20, 2023 · 0 comments

gtsueng commented Oct 20, 2023

Background: On Tuesday, October 17th, the Production API went down. According to @DylanWelzel's investigation, the cause was SRA's excessively large metadata records, many of which listed more than 1000 objects in the 'isBasedOn' field. @everaldorodrigo addressed this by adjusting the memory size, but the core issue remains: SRA metadata records are excessively large.

An SRA record is project- or study-based, and each record may reference thousands of runs, experiments, samples, etc. This causes memory issues when querying SRA records.

  1. Revisit which metadata is parsed into the 'isBasedOn' property.
  2. Investigate parser changes that could address the core issue:
     • Parse multiple records of the same type into a single 'isBasedOn' object. Since the 'identifier' field can be an array, this would cut down the number of repetitive 'isBasedOn' objects that differ only by 'identifier'.
     • If this doesn't work, set an upper limit on the number of 'isBasedOn' objects to parse, then add some sort of indicator that the user should visit SRA to see the rest.
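
The two options above could look roughly like the sketch below. This is not the actual SRA parser code: the function names `merge_is_based_on` and `cap_is_based_on`, the `@type` values, the sample identifiers, and the default limit of 100 are all assumptions made for illustration.

```python
import json


def merge_is_based_on(items):
    """Collapse objects that differ only by 'identifier' into one object
    per group, collecting the identifiers into a list."""
    merged = {}
    for item in items:
        # Everything except 'identifier' defines the group.
        rest = {k: v for k, v in item.items() if k != "identifier"}
        # Serialize to get a hashable key even if values are lists or dicts.
        key = json.dumps(rest, sort_keys=True)
        group = merged.setdefault(key, {**rest, "identifier": []})
        ids = item.get("identifier", [])
        if not isinstance(ids, list):
            ids = [ids]
        group["identifier"].extend(ids)
    return list(merged.values())


def cap_is_based_on(items, limit=100):
    """Fallback: keep at most `limit` objects and report whether any were
    dropped, so the caller can point users to SRA for the full list."""
    if len(items) <= limit:
        return items, False
    return items[:limit], True


# Hypothetical input: three objects, two of which differ only by identifier.
records = [
    {"@type": "Sample", "identifier": "SRS000001"},
    {"@type": "Sample", "identifier": "SRS000002"},
    {"@type": "Experiment", "identifier": "SRX000001"},
]
merged = merge_is_based_on(records)
kept, truncated = cap_is_based_on(merged, limit=100)
```

With the hypothetical input above, the three objects collapse to two: one per type, each carrying a list of identifiers. How a truncated list should be flagged in the record itself (for example, an extra note field) is still an open design question.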