[INVESTIGATION]: Generate Example summaries using ChatGPT #2

Open
1 of 8 tasks
gtsueng opened this issue Sep 4, 2024 · 5 comments

gtsueng commented Sep 4, 2024

Issue Name

Generate Example summaries using ChatGPT

Issue Description

This is a preliminary/quick test for NIAID-Data-Ecosystem/nde-crawlers#159

To demonstrate the value of description length normalization, perform the following:

  • Identify the 3 records with the longest descriptions per repository
  • For each record, feed ChatGPT either the name+description or the entire record, and ask it for a summary description at each of the following lengths:
    • 140-160 characters (SEO optimized length)
    • 240-280 characters (tweet-optimized length)
    • 3 sentences
    • 5 sentences
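
A minimal sketch of the selection step and the character-length targets listed above. The record keys `repository` and `description` are assumptions made for illustration and may not match the actual NDE record schema.

```python
from collections import defaultdict
import heapq

def longest_descriptions(records, per_repo=3):
    """Return the `per_repo` records with the longest description per repository.

    Assumes each record is a dict with hypothetical keys "repository" and
    "description"; a missing or null description counts as length 0.
    """
    by_repo = defaultdict(list)
    for rec in records:
        by_repo[rec["repository"]].append(rec)
    return {
        repo: heapq.nlargest(
            per_repo, recs, key=lambda r: len(r.get("description") or "")
        )
        for repo, recs in by_repo.items()
    }

# Character ranges taken from the list above (treated as inclusive).
CHAR_TARGETS = {"seo": (140, 160), "tweet": (240, 280)}

def within_char_target(summary, target):
    """Check whether a generated summary falls inside the requested range."""
    low, high = CHAR_TARGETS[target]
    return low <= len(summary) <= high
```

A check like `within_char_target` is useful because a language model does not reliably count characters, so generated summaries may need to be regenerated or trimmed.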

Please save the results/examples to the following Google documents:
Examples from each repository: https://docs.google.com/document/d/1pX0CTaDyQmH-XqvHX-l13ZKT0zB0MRKeHl-B-RTndSg/edit
Key examples: https://docs.google.com/document/d/1KfJg5R-28mhUmcUwAHybuGv4sha6Pg3-8mqTPnCqFYw/edit

Use the format:
Name: record name
Description: record description
ID: record ID in the NDE
SEO abstract: 140-160 length generated result
Tweet abstract: 240-280 length generated result
3 sentence abstract: 3 sentence result
5 sentence abstract: 5 sentence result
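
As an illustration only, the format above could be produced by a small helper like the one below. The function name and the keys of the `summaries` dict are hypothetical; only the output labels come from the format itself.

```python
def format_example(name, description, record_id, summaries):
    """Render one record in the requested document format.

    `summaries` is assumed to be a dict with the hypothetical keys
    "seo", "tweet", "three_sentence" and "five_sentence".
    """
    lines = [
        f"Name: {name}",
        f"Description: {description}",
        f"ID: {record_id}",
        f"SEO abstract: {summaries['seo']}",
        f"Tweet abstract: {summaries['tweet']}",
        f"3 sentence abstract: {summaries['three_sentence']}",
        f"5 sentence abstract: {summaries['five_sentence']}",
    ]
    return "\n".join(lines)
```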

Issue Discussion

This issue was discussed at the bi-weekly meeting dated 2024.09.04

Request Type

Examples (generate examples for evaluation, decision-making, etc.)

Material URL

https://docs.google.com/document/d/1pX0CTaDyQmH-XqvHX-l13ZKT0zB0MRKeHl-B-RTndSg/edit

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/13

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response

Request status check list

  • The request has been scoped
  • Some immediate discussion or action on the matter was started, but additional investigations are underway
  • The investigation has been conducted
  • The results of the investigation have been made available
@ZubairQazi

gtsueng commented Sep 10, 2024

@hartwickma, @lisa-mml, @rshabman, @sudvenk

We have generated examples of ChatGPT-generated summaries. A selection from NIAID priority resources can be found here: https://docs.google.com/document/d/1KfJg5R-28mhUmcUwAHybuGv4sha6Pg3-8mqTPnCqFYw/edit

The Ask: we seek approval to initiate work on descriptive augmentation, starting with identifying the optimal summary length.


gtsueng commented Sep 26, 2024

Per discussions on 2024.09.23, we will forgo conducting user studies to determine the optimal length of generated summaries and instead use the ~170-word count used by Scientific Data (as suggested by Lilliana). The summary should include 1-2 sentences detailing the method and experimental conditions. We will proceed with generating mock-ups that visually prioritize the summary information while clearly indicating that it was generated using genAI. The summary will be stored in a separate metadata field, and the original description field will be kept unaltered.
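
A rough sketch of what storing the summary separately could look like. The field name `aiGeneratedSummary` and its sub-keys are hypothetical placeholders, not an agreed schema; the point is only that the original `description` is never modified.

```python
import copy

def attach_generated_summary(record, summary, model_name):
    """Return a copy of `record` with the summary in its own field.

    The "aiGeneratedSummary" field and its keys are hypothetical names
    chosen for this sketch; the original description stays untouched.
    """
    updated = copy.deepcopy(record)
    updated["aiGeneratedSummary"] = {
        "text": summary,
        "generatedBy": model_name,
        "isAIGenerated": True,  # lets a UI label the text as genAI output
    }
    return updated
```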

@ZubairQazi can you edit the prompt to meet these requirements: 170 words max, of which 1-2 sentences should detail the method and experimental conditions (only if available; do not make them up if not). Then run the prompt on ClinEpiDB records, since ClinEpiDB is likely to have longer description fields (with method/experimental condition info in the description).
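
One possible shape for such a prompt, with a word-count check on the result. This is a sketch under stated assumptions: the prompt wording is illustrative and is not the prompt actually used, and the model call itself is deliberately left out.

```python
MAX_WORDS = 170  # limit taken from the requirements above

def build_prompt(name, description, max_words=MAX_WORDS):
    """Build an illustrative summarization prompt (wording is an assumption)."""
    return (
        f"Summarize the following dataset record in at most {max_words} words. "
        "Include 1-2 sentences on the method and experimental conditions only "
        "if the record states them; do not invent details that are not present."
        "\n\n"
        f"Name: {name}\n"
        f"Description: {description}"
    )

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

def within_limit(summary, max_words=MAX_WORDS):
    """Check a generated summary against the word limit."""
    return word_count(summary) <= max_words
```

Checking the word count after generation matters because the model may exceed the limit even when the prompt states it explicitly.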

@ZubairQazi

Generated summaries for ClinEpiDB and posted here:

https://docs.google.com/spreadsheets/d/1vtxtJrG4qSbrSlaqp_4RhtQw2jRUQXc38G3z7ZZFS3A/edit?usp=sharing


gtsueng commented Oct 17, 2024

@hartwickma, @sudvenk

The requested email draft for ClinEpiDB can be found in its own issue here: https://github.com/NIAID-Data-Ecosystem/niaid-feedback/issues/161
