DaneshjouLab/AutoGKB

Build and Test

AutoGKB

Goals:

  1. Fetch annotated articles from the variantAnnotations stored in the PharmGKB API
  2. Create a general benchmark that outputs a score for an extraction system. Given: an article and its ground-truth variants (manually extracted and recorded in var_drug_ann.tsv). Input: extracted variants. Output: score.
  3. Build a system for extracting drug-related variant annotations from an article, i.e. associations in which the variant affects a drug dose, response, metabolism, etc.
  4. Continuously fetch new pharmacogenomic articles
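
The benchmark in goal 2 could start from something as simple as the sketch below. It assumes variants are compared as case-insensitive strings and that the score is the fraction of ground-truth variants recovered. The function name `naive_match_score` and the sample identifiers are illustrative, not part of this repository.

```python
def naive_match_score(ground_truth, extracted):
    """Return the fraction of ground-truth variants found in the extracted list.

    Hypothetical helper: matching is exact after trimming whitespace and
    lowercasing, which is an assumption rather than the project's rule.
    """
    truth = {v.strip().lower() for v in ground_truth}
    found = {v.strip().lower() for v in extracted}
    if not truth:
        # No ground truth to match against, so report zero rather than divide by zero.
        return 0.0
    return len(truth & found) / len(truth)


# Illustrative values only; real ground truth would come from var_drug_ann.tsv.
truth = ["rs9923231", "CYP2C9*3", "rs1057910"]
extracted = ["rs9923231", "cyp2c9*3", "rs0000000"]
score = naive_match_score(truth, extracted)
print(f"naive match score: {score:.2f}")
```

This version only measures recall. Penalizing spurious extractions would need a second term, for example precision over the extracted set.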

Description

This repository contains Python scripts for building and running a pharmacogenomic agentic system that annotates and labels genetic variants based on their phenotypic associations, as reported in journal articles.

Dependencies

We manage a few repos externally:

  • PubMed Downloader: This repo is used to download the article markdown files for the PMIDs listed in var_drug_ann.tsv
  • Huggingface/AutoGKB: This converts the annotations and article text into a dataset format for benchmarking

Progress Tracker

Category                Task                                                               Status
Initial Download        Download the zip of variants from PharmGKB
                        Get a PMID list from the variants tsv (column PMID)
                        Convert the PMID to PMCID
                        Update to use non-official PMID to PMCID conversion (Aaron's method)
                        Fetch the content from the PMCID
Benchmark               Create pairings of annotations to articles
                        Create a naive score of number of matches
                        Create group-wise score
                        Look into advanced scoring based on distance from truth per term
Workflows               Integrate Aaron's current approach
                        Document individual annotation meanings
                        Delegate annotation groupings to team members
New Article Fetching    Replicate PharmGKB's current workflow
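
The "Get a PMID list from the variants tsv (column PMID)" task could look roughly like the sketch below, which uses only the standard library. The `PMID` column name comes from the tracker above. The function name `unique_pmids`, the `Variant` column, and the sample rows are made up for illustration.

```python
import csv
import io


def unique_pmids(tsv_file):
    """Collect distinct, non-empty PMID values in first-seen order.

    Hypothetical helper: expects a file-like object with a tab-separated
    header row that includes a PMID column.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    seen = set()
    pmids = []
    for row in reader:
        pmid = (row.get("PMID") or "").strip()
        if pmid and pmid not in seen:
            seen.add(pmid)
            pmids.append(pmid)
    return pmids


# In-memory stand-in for var_drug_ann.tsv; the rows are placeholders.
sample = io.StringIO("Variant\tPMID\nrsA\t111\nrsB\t222\nrsC\t111\nrsD\t\n")
pmids = unique_pmids(sample)
print(pmids)
```

For the real file, the same function could be called on `open("var_drug_ann.tsv", newline="", encoding="utf-8")`, assuming the file has been unzipped into the working directory.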

System Overview

Annotations Diagram

Downloading the data

pixi run gdown --id 1qtQWvi0x_k5_JofgrfsgkWzlIdb6isr9
unzip autogkb-data.zip
