[Metadata Improvement]: Evaluate ChatGPT performance on measurementTechnique Extraction #132

Open
6 of 24 tasks
gtsueng opened this issue Apr 11, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@gtsueng
Contributor

gtsueng commented Apr 11, 2024

Issue Name

Evaluate ChatGPT performance on measurementTechnique Extraction

Issue Description

We should evaluate how well ChatGPT extracts measurementTechniques from Dataset names and descriptions. Performance may vary by data type and repository, but we should at least evaluate some datasets that already have curated measurementTechnique values to see how well the results overlap.

Approach:

  • Identify records that have measurementTechnique values from the following repositories (@DylanWelzel, since you've already done this, can you send @ZubairQazi the list of record IDs?):
    • NCBI GEO
    • LINCS
    • REFRAMEDB
  • Randomly select 25 records from the measurementTechnique-containing subset of each of the above repositories
  • Run the ChatGPT measurementTechnique prompt (providing only the name and description) for each of the 75 records (25 per repo)
  • For each record, confirm whether the curated measurementTechnique values are present in or absent from ChatGPT's predictions
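
The sampling and presence/absence steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function names (`sample_records`, `compare`), the fixed seed, and the case-insensitive exact-match comparison are all assumptions made for the example. Real curated values may need manual review or ontology-aware matching rather than string equality.

```python
import random


def sample_records(ids_by_repo, n=25, seed=0):
    """Randomly pick up to n record IDs per repository.

    ids_by_repo maps a repository name to the IDs of records that
    already have curated measurementTechnique values (hypothetical input).
    A fixed seed keeps the selection reproducible.
    """
    rng = random.Random(seed)
    return {
        repo: rng.sample(sorted(ids), min(n, len(ids)))
        for repo, ids in ids_by_repo.items()
    }


def normalize(term):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(term.lower().split())


def compare(curated, predicted):
    """Report which curated values appear in the predictions for one record.

    Assumption: a curated value counts as present only on an exact match
    after normalization.
    """
    cur = {normalize(t) for t in curated}
    pred = {normalize(t) for t in predicted}
    return {
        "present": sorted(cur & pred),
        "absent": sorted(cur - pred),
        "extra": sorted(pred - cur),
    }
```

For example, `compare(["RNA-Seq", "PCR"], ["rna-seq", "Microscopy"])` reports `rna-seq` as present, `pcr` as absent, and `microscopy` as an extra prediction.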

Issue Discussion

No response

Please select the type of metadata improvement

  • Standardization (normalizing free text to an ontology)
  • Augmentation (adding values for metadata fields missing values)
  • Clean up (addressing redundancy or messy metadata)
  • Structure (changing the structuring of the metadata to support front end UI features)

Meta URL

No response

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/13

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response

Request status check list

  • This metadata improvement has yet to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement does not need to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement has been discussed/reported between NIAID, Scripps, Leidos
  • This metadata improvement has been implemented locally to generate data for review
  • This metadata improvement has been implemented on Dev
  • This metadata improvement has been implemented on Dev and the results have been reviewed and approved for staging
  • This metadata improvement has been implemented on Staging
  • This page/documentation/change has been approved for Production
  • This page/documentation/change has been implemented on Production
@gtsueng added the enhancement (New feature or request) label Apr 11, 2024
@ZubairQazi

ChatGPT predictions using the measurement technique prompt:
https://docs.google.com/spreadsheets/d/1jkhidFmsp0f_yL8S5wpZ-oBA-eLhQQmESQEq4Lrhx3M/edit#gid=1310822148

@gtsueng
Contributor Author

gtsueng commented Aug 8, 2024

@ZubairQazi Given the weaknesses of the Text2Term pipeline used to match terms to ontology terms, and preliminary evaluations of the ChatGPT predictions, we need to reduce the number of generic "process" terms that ChatGPT is tacking onto various activities. Can you see whether prompt engineering can get ChatGPT to produce similar results, minus the generic "process" terms? E.g., instead of "PCR testing", it would give "PCR"; instead of "Microscopy diagnosis", it would give "Microscopy". If the term is "Survey", "Diagnosis", or "Testing" alone, that's fine. The problem arises when these types of terms are unnecessarily appended to other terms, which interferes with the downstream mappings.
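
To make the intended rule concrete, here is a small sketch of the behavior described above, written as a post-processing function. It is not a replacement for the requested prompt change. The function name `strip_generic_suffix` and the `GENERIC_TERMS` set are hypothetical, and the set contains only the three terms named in this comment.

```python
# Hypothetical list, taken only from the examples in the comment above.
GENERIC_TERMS = {"testing", "diagnosis", "survey"}


def strip_generic_suffix(term):
    """Drop trailing generic 'process' words, but keep them when they stand alone."""
    words = term.split()
    # Only strip while at least one other word would remain.
    while len(words) > 1 and words[-1].lower() in GENERIC_TERMS:
        words.pop()
    return " ".join(words)


print(strip_generic_suffix("PCR testing"))           # PCR
print(strip_generic_suffix("Microscopy diagnosis"))  # Microscopy
print(strip_generic_suffix("Survey"))                # Survey
```

A function like this could also be used to check whether a revised prompt still produces the unwanted suffixes, by counting predictions that change when passed through it.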
