[Metadata Improvement]: Generate abstracts to improve the display of descriptions #159

Open
10 of 23 tasks
gtsueng opened this issue Aug 12, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@gtsueng
Contributor

gtsueng commented Aug 12, 2024

Issue Name

Generate abstracts to improve the display of descriptions

Issue Description

In order to improve the display of record descriptions, we should create descriptions of more consistent length. See NIAID-Data-Ecosystem/nde-portal#147 for more details. These created descriptions should be saved to the abstract field.

Some things to take into consideration:

  • Length limitations of ChatGPT before it starts hallucinating results, as observed for measurementTechniques: https://docs.google.com/spreadsheets/d/1crfLDl5_c7jZ47JefhOCf6tx_cM-u8AkK6rsXttM6s8/edit?gid=1639648736#gid=1639648736
  • Title length optimization for SEO is ~70-80 characters
  • Description length optimization for SEO is ~140-160 characters
  • About the same number of records fall in the ~<400 character range as in the ~<50 word range (~1 million records fall within this character/word length range)
  • The number of description characters shown in the current search results/card view is not fixed: it depends on screen resolution, browser zoom, and window size. Descriptions longer than the visible portion can be displayed in full by clicking the interactive element.
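
To make the length considerations above concrete, here is a minimal Python sketch that profiles a single description against the thresholds mentioned in this issue (~400 characters, ~50 words, and the ~140-160 character SEO range). The function name `length_profile` and the whitespace-collapsing step are assumptions for illustration and are not part of any existing NDE pipeline.

```python
def length_profile(description: str) -> dict:
    """Report character/word counts and which thresholds from this issue are met.

    Hypothetical helper: the thresholds mirror the approximate numbers in the
    issue text and should be tuned once the Lyssna study results are in.
    """
    # Collapse runs of whitespace so line breaks do not inflate the character count.
    text = " ".join(description.split())
    chars = len(text)
    words = len(text.split())
    return {
        "chars": chars,
        "words": words,
        "under_400_chars": chars < 400,
        "under_50_words": words < 50,
        "within_seo_range": 140 <= chars <= 160,
    }
```

Running this over all record descriptions would give the character and word distributions needed to compare the two ranges directly.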

The Ask: We are seeking approval to start this work

Next steps:

  • Use ChatGPT to generate example summaries for "Go | No-go" decision-making
  • Design a Lyssna study to determine description length preferences (test 3-5 length ranges) and its display in the record view. We will use 170 words (the abstract length for Scientific Data).
  • Use ChatGPT to generate sample descriptions within target length ranges for use in Lyssna study
  • Check point: Provide the sample generated descriptions and original descriptions to NIAID for review and approval to continue.
  • Run Lyssna study using the GPT-generated sample descriptions
  • Systematically use GPT to generate short descriptions (abstracts), at the length indicated by the Lyssna study results, for records whose descriptions fall into different length bins. Evaluate the results to identify the cutoff threshold below which the quality of the short descriptions is too low.
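
The binning and cutoff step in the last item could be sketched as follows. This is only an illustration under stated assumptions: each record is a pair of (source description word count, quality score), where the quality score is a hypothetical reviewer rating (for example 1-5); `find_cutoff`, `bin_width`, and `min_score` are made-up names and defaults, not an agreed evaluation method.

```python
from collections import defaultdict
from statistics import mean


def find_cutoff(records, bin_width=50, min_score=3.0):
    """Group records into word-count bins and find the lowest acceptable bin.

    records: iterable of (source_word_count, quality_score) pairs.
             The quality score is a hypothetical reviewer rating.
    Returns (cutoff, bin_means), where cutoff is the start of the lowest bin
    such that it and every longer populated bin has a mean score >= min_score,
    or None if even the longest bin falls short.
    """
    bins = defaultdict(list)
    for word_count, score in records:
        # A bin is identified by its lower bound, e.g. 0, 50, 100, ...
        bins[(word_count // bin_width) * bin_width].append(score)

    bin_means = {start: mean(scores) for start, scores in sorted(bins.items())}

    cutoff = None
    # Walk from the longest descriptions down; stop at the first failing bin.
    for start in sorted(bin_means, reverse=True):
        if bin_means[start] < min_score:
            break
        cutoff = start
    return cutoff, bin_means
```

For example, with ratings that improve as the source description gets longer, the function returns the lower bound of the shortest bin that still meets `min_score`; records below that bound would be candidates for keeping the original description instead of a generated abstract.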

Issue Discussion

No response

Please select the type of metadata improvement

  • Standardization (normalizing free text to an ontology)
  • Augmentation (adding values for metadata fields missing values)
  • Clean up (addressing redundancy or messy metadata)
  • Structure (changing the structuring of the metadata to support front end UI features)

Meta URL

No response

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/13
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/52
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/54

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response

Request status check list

  • This metadata improvement has yet to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement does not need to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement has been discussed/reported between NIAID, Scripps, Leidos
  • This metadata improvement has been implemented locally to generate data for review
  • This metadata improvement has been implemented on Dev
  • This metadata improvement has been implemented on Dev and the results have been reviewed and approved for staging
  • This metadata improvement has been implemented on Staging
  • This page/documentation/change has been approved for Production
  • This page/documentation/change has been implemented on Production