The GxP regulatory environment is complex: each country has its own regulations, and standardization is very limited. GxP regulations and guidance documents run to thousands of pages of text (PDF or HTML) posted in several locations on the internet. These regulatory requirements have to be manually parsed, analyzed, and classified to develop the J&J quality requirements, which is a time-consuming process.
- With a fine-tuned GPT-3 model, classify requirements by quality topic and classify each topic's requirements into themes; summarize each theme's requirements into a J&J Quality requirement that meets all the regulations and guidance documents.
- Build metrics to evaluate the model and benchmark it against other available large language models as well as traditional machine learning models.
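As an illustration of the evaluation goal above, the following is a minimal sketch of three common multi-label classification metrics computed on binarized label vectors. The README does not say which metrics the project uses, so the choice of exact-match ratio, micro-averaged F1, and Hamming loss is an assumption, and all function names here are hypothetical rather than part of the `capstone` package.

```python
from typing import List, Sequence

LabelMatrix = Sequence[Sequence[int]]  # one 0/1 label vector per sample


def exact_match_ratio(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of samples whose whole label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def micro_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """F1 score computed from counts pooled over all samples and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 samples, 3 labels each.
    y_true: List[List[int]] = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred: List[List[int]] = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
    print("exact match:", exact_match_ratio(y_true, y_pred))
    print("micro F1:   ", micro_f1(y_true, y_pred))
    print("hamming:    ", hamming_loss(y_true, y_pred))
```

Exact-match ratio is strict because a single wrong label fails the whole sample, while micro F1 and Hamming loss give partial credit per label; reporting both kinds side by side makes comparisons across models easier to interpret.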
Vishweshwar Tyagi (captain), Daoxing Zhang, Siqi He, Siwen Xie, Yihao Gao
Frank Janssens, Majd Mustapha
Adam Kelleher
Xuanyu Li
cd JNJ-Capstone-Project
conda env create -f environment.yml
conda activate capstone
pip install -e src/capstone
Including the optional -e flag installs the package in "editable" mode: instead of copying the files into your virtual environment, pip links the environment to the source files where they are, so changes to the source take effect without reinstalling.
python -m capstone fetch
python -m nltk.downloader all
jupyter notebook notebooks/
You can now use the Jupyter kernel to run the notebooks.
The notebooks may be viewed in the following order:
- eda.ipynb - Exploratory Data Analysis
- naive-model-evaluation.ipynb - Results from naive model which predicts the most common target (multi-label binarized vector) in the development set
- baseline-evaluation.ipynb - Results from baseline Random Forest trained on TF-IDF features
- bert_evaluation.ipynb - Results from fine-tuned BERT
- ada-evaluation.ipynb - Results from fine-tuned Ada
- curie-evaluation.ipynb - Results from fine-tuned Curie
- davinci-evaluation.ipynb - Results from fine-tuned Davinci
- ensemble-evaluation.ipynb - Results from ensemble of BERT, Ada, and Curie based on majority vote
- bert_embeddings.ipynb - Evaluate embeddings of fine-tuned BERT against vanilla BERT on (unsupervised) clustering task (test dataset)
- gpt3-embeddings-test-set.ipynb - Evaluate embeddings of vanilla GPT-3 models (Ada, Curie and Davinci) on (unsupervised) clustering task (test dataset)
- gpt3-embeddings-whole-set.ipynb - Evaluate embeddings of vanilla GPT-3 models (Ada, Curie and Davinci) on (unsupervised) clustering task (whole dataset)
- JnJ-Janssens_JnJ-3_Final_Report.pdf - Final report
- JnJ-Janssens_JnJ-3_Poster.pdf - Poster
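Two of the ideas named in the notebook list, the naive model that predicts the most common target vector and the majority-vote ensemble, can be sketched in a few lines. This is an illustrative sketch only and not the code from the notebooks: the function names are hypothetical, and voting independently on each label (rather than on whole label vectors) is an assumption, since the list does not specify the voting rule.

```python
from collections import Counter
from typing import List, Sequence

LabelMatrix = Sequence[Sequence[int]]  # one 0/1 label vector per sample


def most_common_target(dev_targets: LabelMatrix) -> List[int]:
    """Return the label vector that occurs most often in the development set."""
    counts = Counter(tuple(target) for target in dev_targets)
    return list(counts.most_common(1)[0][0])


def naive_predict(dev_targets: LabelMatrix, n_samples: int) -> List[List[int]]:
    """Predict the same most-common vector for every sample."""
    constant = most_common_target(dev_targets)
    return [list(constant) for _ in range(n_samples)]


def majority_vote(*model_predictions: LabelMatrix) -> List[List[int]]:
    """Combine binary multi-label predictions label by label.

    Assumption: a label is switched on when more than half of the models
    predict it; the project's actual rule may differ.
    """
    n_models = len(model_predictions)
    combined = []
    for rows in zip(*model_predictions):  # same sample, one row per model
        combined.append(
            [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*rows)]
        )
    return combined


if __name__ == "__main__":
    # Toy data (made up for illustration).
    dev = [[1, 0], [1, 0], [0, 1]]
    print(naive_predict(dev, n_samples=2))

    bert = [[1, 0, 1], [1, 0, 0]]
    ada = [[1, 1, 0], [0, 0, 1]]
    curie = [[0, 1, 1], [0, 1, 0]]
    print(majority_vote(bert, ada, curie))
```

Using an odd number of models, as in the BERT, Ada, and Curie ensemble, avoids ties in a per-label vote.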