This project extracts and semantically models drug-food interactions using data sourced from DrugBank. It processes natural language interaction descriptions, links terms to biomedical ontologies via BioFalcon, and generates a structured RDF-based Knowledge Graph of interactions, drugs, foods, effects, impacts, and recommendations.
Data Extraction
- The CSV file `drugBank_drug_food_interactions.csv` contains raw interaction descriptions from DrugBank.
- `main.py` processes the CSV file and extracts the relevant terms (drugs, foods, effects, impacts, interactions).
- `extracting the Inter has more than one DFI.py` handles cases where multiple drug-food interactions (DFIs) are embedded in a single entry.
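The repository does not document the CSV layout or how multi-DFI entries are detected, so the following is only a minimal sketch under stated assumptions. It assumes hypothetical column names `drug_name` and `interaction` and treats each sentence of a description as one candidate DFI. It uses the standard-library `csv` module instead of `pandas` so that it runs without extra dependencies.

```python
import csv
import io
import re
from typing import List, Tuple

# Hypothetical column names; the real header of
# drugBank_drug_food_interactions.csv may differ.
DRUG_COL = "drug_name"
TEXT_COL = "interaction"

# Split after ".", "!" or "?" when followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_interactions(text: str) -> List[str]:
    """Split one description into sentence-level candidate DFIs."""
    return [part for part in _SENTENCE_BOUNDARY.split(text.strip()) if part]


def explode_rows(csv_text: str) -> List[Tuple[str, str]]:
    """Return one (drug, interaction sentence) pair per candidate DFI."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for sentence in split_interactions(record[TEXT_COL]):
            rows.append((record[DRUG_COL], sentence))
    return rows


# Hypothetical sample entry containing two interactions.
sample = 'drug_name,interaction\nWarfarin,"Avoid alcohol. Take with food."\n'
print(explode_rows(sample))
# → [('Warfarin', 'Avoid alcohol.'), ('Warfarin', 'Take with food.')]
```

A naive sentence split also breaks on abbreviations such as "e.g.", so a production splitter would need extra rules.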
Term Normalization
- `dictionary.py` normalizes extracted terms to avoid duplicates (e.g., converting "increased", "increasing" → "increase").
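A dictionary-based normalizer of this kind can be sketched as a plain lookup table. The `NORMALIZATION` entries below are hypothetical examples built around the "increase" case above, not the actual contents of `dictionary.py`.

```python
import re
from typing import Dict

# Hypothetical variant -> canonical entries; dictionary.py may contain others.
NORMALIZATION: Dict[str, str] = {
    "increased": "increase",
    "increasing": "increase",
    "increases": "increase",
    "decreased": "decrease",
    "decreasing": "decrease",
    "decreases": "decrease",
}


def normalize_term(term: str) -> str:
    """Lower-case a term, collapse whitespace, then map known variants."""
    key = re.sub(r"\s+", " ", term.strip().lower())
    # Unknown terms pass through in their cleaned-up form.
    return NORMALIZATION.get(key, key)


print(normalize_term("Increasing"))  # → increase
print(normalize_term("Grapefruit  Juice"))  # → grapefruit juice
```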
Entity Linking to UMLS
- `BioFalcon linking.py` uses BioFalcon to link each term to its UMLS Concept Unique Identifier (CUI).
- `compare similarity.py` applies fuzzy matching (`fuzzywuzzy`) to improve label alignment with UMLS terms.
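The fuzzy-matching step can be sketched as picking the best-scoring candidate label for a term. This sketch swaps in the standard-library `difflib.SequenceMatcher` for `fuzzywuzzy` so it runs without extra packages; its 0 to 100 score approximates `fuzz.ratio`. The candidate labels, the lower-casing, and the threshold of 80 are assumptions, not values taken from `compare similarity.py`.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def similarity(a: str, b: str) -> int:
    """Case-insensitive 0-100 score built on difflib's ratio()."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_label(
    term: str, candidates: Iterable[str], threshold: int = 80
) -> Optional[Tuple[str, int]]:
    """Return (label, score) for the closest candidate, or None if below threshold."""
    best: Optional[Tuple[str, int]] = None
    for label in candidates:
        score = similarity(term, label)
        if best is None or score > best[1]:
            best = (label, score)
    if best is None or best[1] < threshold:
        return None
    return best


# Hypothetical candidate labels for one extracted term.
print(best_label("grapefruit juice", ["Grapefruit Juice", "Orange juice"]))
# → ('Grapefruit Juice', 100)
```

Returning `None` below the threshold keeps weak matches out of the linked output instead of silently accepting the least-bad label.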
Recommendation Extraction
- `recommendations.py` filters the interaction texts and keeps only those that are explicit recommendations.
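One simple way to keep only recommendation texts is a cue-word filter. The cue list below (`avoid`, `take`, `limit`, `do not`) and the sample texts are hypothetical; `recommendations.py` may use different rules.

```python
import re
from typing import List

# Hypothetical cue words; recommendations.py may use different rules.
RECOMMENDATION_CUES = ("avoid", "take", "limit", "do not")

# Match a cue as a whole word at the start of the text, ignoring case.
_CUE_PATTERN = re.compile(
    r"^\s*(?:" + "|".join(re.escape(cue) for cue in RECOMMENDATION_CUES) + r")\b",
    re.IGNORECASE,
)


def is_recommendation(text: str) -> bool:
    return _CUE_PATTERN.match(text) is not None


def keep_recommendations(texts: List[str]) -> List[str]:
    """Keep only the texts that read as explicit recommendations."""
    return [text for text in texts if is_recommendation(text)]


# Hypothetical sample texts.
texts = [
    "Avoid alcohol.",
    "Grapefruit increases the serum concentration.",
    "Take with food.",
]
print(keep_recommendations(texts))  # → ['Avoid alcohol.', 'Take with food.']
```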
Semantic Mapping to RDF
- RDF/Turtle mapping files in the `Mapping/` directory define rules to convert the processed CSV files into RDF triples (`.nt` format).
- The output `.nt` files represent the semantic Knowledge Graph and are suitable for querying and reasoning.
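To illustrate what the `.nt` output looks like, the sketch below writes N-Triples lines for one interaction directly in Python. It is not the mapping-based SDM-RDFizer pipeline used by the repository, and the base IRI `http://example.org/dfi/` and the properties `hasDrug`, `hasFood`, and `description` are hypothetical placeholders rather than the vocabulary in `Mapping/*.ttl`.

```python
from typing import List
from urllib.parse import quote

# Hypothetical namespace; the real IRIs come from the Mapping/*.ttl files.
BASE = "http://example.org/dfi/"


def iri(kind: str, label: str) -> str:
    """Build an IRI such as <http://example.org/dfi/food/grapefruit_juice>."""
    slug = quote(label.strip().replace(" ", "_"), safe="")
    return "<{}{}/{}>".format(BASE, kind, slug)


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"{}"'.format(escaped)


def interaction_triples(interaction_id: str, drug: str, food: str, text: str) -> List[str]:
    """Emit N-Triples lines linking one interaction to its drug, food and text."""
    subject = iri("interaction", interaction_id)
    return [
        "{} <{}hasDrug> {} .".format(subject, BASE, iri("drug", drug)),
        "{} <{}hasFood> {} .".format(subject, BASE, iri("food", food)),
        "{} <{}description> {} .".format(subject, BASE, literal(text)),
    ]


for line in interaction_triples("DFI1", "Warfarin", "grapefruit juice", "Avoid grapefruit juice."):
    print(line)
```

Percent-encoding the label (including `/`) keeps each entity in a single IRI path segment.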
Project Structure
Drug-Food-Interaction-main/
│
├── main.py # Extracts data from DrugBank CSV
├── extracting the Inter has more than one DFI.py # Handles multiple DFIs in one entry
├── dictionary.py # Normalizes terms to avoid duplicates
├── BioFalcon linking.py # Links terms to UMLS using BioFalcon
├── compare similarity.py # Matches terms using fuzzy similarity
├── recommendations.py # Extracts recommendation-based interactions
│
├── drugBank_drug_food_interactions.csv # Raw interaction data from DrugBank (downloaded on Feb 28, 2024)
│
├── Mapping/ # RDF mapping files and outputs
│ ├── *.ttl # Mapping templates (e.g., DrugMapping.ttl)
│ ├── *.nt # RDF output files
│ └── config.txt # Mapping configuration
│
├── error.log # Processing error logs
└── .idea/ # PyCharm IDE metadata (can be ignored)
Requirements
- Python 3.7+
- `fuzzywuzzy`
- `pandas`
- BioFalcon API access

(Make sure to include a `.env` file or credentials if BioFalcon access requires them.)
Installation
Install the required packages:
`pip install -r requirements.txt`
If `requirements.txt` is missing, install them manually:
`pip install pandas fuzzywuzzy python-Levenshtein`

Usage
- Start by extracting interactions: `python main.py`
- Process multiple-interaction entries: `python "extracting the Inter has more than one DFI.py"`
- Normalize and prepare terms: `python dictionary.py`
- Link terms with UMLS using BioFalcon: `python "BioFalcon linking.py"`
- Refine matches using fuzzy similarity: `python "compare similarity.py"`
- Extract only recommendation-based interactions: `python recommendations.py`
- Generate RDF triples with the mappings: use SDM-RDFizer or a similar tool to apply the `.ttl` mapping files and produce `.nt` RDF outputs.
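Because the Python steps run in a fixed order, they can be chained from one driver script. The sketch below is a hypothetical helper (for example saved as `run_pipeline.py` in the repository root); it is not part of the repository, and it leaves out the final RDF step because that runs in an external tool. Passing each command as an argument list lets the file names that contain spaces work without shell quoting.

```python
import subprocess
import sys
from typing import List, Sequence

# Script order follows the usage steps; run from the repository root.
PIPELINE = (
    "main.py",
    "extracting the Inter has more than one DFI.py",
    "dictionary.py",
    "BioFalcon linking.py",
    "compare similarity.py",
    "recommendations.py",
)


def build_commands(scripts: Sequence[str]) -> List[List[str]]:
    """One argv list per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(scripts: Sequence[str] = PIPELINE, dry_run: bool = False) -> List[List[str]]:
    """Run each script in order; with dry_run=True only print the commands."""
    commands = build_commands(scripts)
    for argv in commands:
        if dry_run:
            print("would run:", argv)
            continue
        # check=True raises CalledProcessError at the first non-zero exit.
        subprocess.run(argv, check=True)
    return commands
```

Call `run_pipeline(dry_run=True)` first to print the commands, then `run_pipeline()` to execute them; the run stops at the first script that fails, so later steps never consume incomplete output.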
Output
After processing, RDF triples representing drugs, foods, effects, impacts, and their interactions are available in `.nt` format under the `Mapping/` folder. These triples can be used for semantic reasoning, knowledge graph exploration, or querying with SPARQL.
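Full SPARQL querying needs a triple store or an RDF library such as `rdflib`. For a quick look at the generated files without extra dependencies, the sketch below parses simple N-Triples lines and filters them by subject, predicate, or object. It is a simplified, assumed parser: it handles IRIs and literals (with optional datatype or language tag) but not blank nodes, and it skips any line it cannot parse.

```python
import re
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Subject and predicate must be IRIs; the object is an IRI or a literal with an
# optional datatype or language tag. Blank nodes are not handled.
_TRIPLE = re.compile(
    r'^(<[^>]*>)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)\s*\.\s*$'
)


def parse_ntriples(lines: Iterable[str]) -> List[Triple]:
    """Parse simple N-Triples lines, skipping blanks, comments and unsupported lines."""
    triples: List[Triple] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _TRIPLE.match(line)
        if match:
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


def find(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the given terms; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]
```

Pass an open `.nt` file from `Mapping/` to `parse_ntriples`, then call `find` with, for example, a predicate IRI to list every triple that uses it.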
References
- DrugBank: https://go.drugbank.com/
- BioFalcon: https://labs.tib.eu/sdm/biofalcon
- UMLS Metathesaurus: https://www.nlm.nih.gov/research/umls/index.html
- SDM-RDFizer: https://github.com/SDM-TIB/SDM-RDFizer
This work was developed as part of the P4-LUCAT project, within a research workflow for semantic enrichment of biomedical data.