A model that predicts Social Determinants of Health for the paper "Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction" (see arXiv).
See the full documentation. The API reference is also available.
- Create a new Conda environment:
conda env create -f src/python/environment-lock.yml
- Activate it:
conda activate sdoh
- Download the corpus
- Rename the corpus:
cd download
mv mimic-iii-clinical-care-database-1.0.1.zip annotation-dataset-of-social-determinants-of-health-from-mimic-iii-clinical-care-database-1.0.1.zip
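The rename step can be made safe to re-run. The following is a minimal bash sketch, assuming the archive was saved to the download directory. The function name rename_corpus is hypothetical and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): rename the downloaded
# archive to the file name used in the rename step above.
# Safe to re-run: it does nothing if the archive was already renamed.
rename_corpus() {
    local dir=${1:-download}
    local src=mimic-iii-clinical-care-database-1.0.1.zip
    local dst=annotation-dataset-of-social-determinants-of-health-from-mimic-iii-clinical-care-database-1.0.1.zip
    if [ -f "$dir/$dst" ]; then
        echo "already renamed: $dir/$dst"
    elif [ -f "$dir/$src" ]; then
        mv "$dir/$src" "$dir/$dst"
        echo "renamed to: $dir/$dst"
    else
        echo "corpus archive not found in $dir" >&2
        return 1
    fi
}
```

For example, run rename_corpus download from the project root.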
Optionally install the Lituiev et al. (2023) SDoH Models.
- Clone the spaCy package:
git clone https://github.com/BCHSI/social-determinants-of-health-clbp
- Change working directory:
cd social-determinants-of-health-clbp/model-hybrid-bow/package/en_sdoh_bow-0.0.2
- Build the wheel:
pip install wheel ; python setup.py bdist_wheel
- Install the wheel:
pip install --no-deps dist/en_sdoh_bow-0.0.2-py3-none-any.whl
- Install dependencies:
pip install lemma_tokenizer
- Install NLTK's dependencies:
pip install nltk ; python -c "import nltk ; nltk.download('stopwords') ; nltk.download('averaged_perceptron_tagger') ; nltk.download('wordnet')"
- Clone the model repo:
git clone https://github.com/BCHSI/social-determinants-of-health-clbp
- Change working directory:
cd social-determinants-of-health-clbp/model-cnn-ner/packages/en_sdoh_cnn_ner_cui-0.0.0
- Build the wheel:
pip install wheel ; python setup.py bdist_wheel
- Install the wheel:
( cd dist ; pip install en_sdoh_cnn_ner_cui-0.0.0-py3-none-any.whl )
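The install steps above hard-code each package version in the wheel file name. As an alternative, a small helper can locate whatever single wheel the build step produced. This is a minimal bash sketch. find_wheel is a hypothetical name and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print the path of the
# single wheel built into a dist directory, so the install step does not
# depend on the package version in the file name.
find_wheel() {
    local dist=${1:-dist}
    # Expand the glob into this function's positional parameters.
    set -- "$dist"/*.whl
    if [ ! -e "$1" ]; then
        echo "no wheel found in $dist" >&2
        return 1
    fi
    if [ "$#" -ne 1 ]; then
        echo "expected one wheel in $dist, found $#" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

For example, from the package directory: wheel=$(find_wheel dist) && pip install --no-deps "$wheel". The && prevents pip from running with an empty argument when no wheel, or more than one, is found.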
The 2023 HuggingFace model targets an old version of spaCy (3.2). The steps below
convert the spaCy source model to a HuggingFace model. Installing it requires an
older version of pip, so the steps pin pip to an older version, install the spaCy
model, convert the PyTorch model to sdoh-roberta-base, and then restore the
original pip version.
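The "remember, pin, restore" pip sequence can be wrapped so that the original pip version is restored even if an intermediate step fails. This is a minimal bash sketch. The names pip_version and with_pinned_pip, and the PIP override variable, are hypothetical and are not part of the repository.

```shell
# Hypothetical helpers (not part of the repository).

# Print the version field of `pip --version` output, which has the form
# "pip <version> from <path> (python <x.y>)".
pip_version() {
    awk '{print $2}'
}

# Run a command with pip temporarily pinned to the given version, then
# restore the previous pip version even if the command fails.
# PIP may name an alternative pip command; it defaults to "pip".
with_pinned_pip() {
    local pinned=$1
    shift
    local pip_cmd=${PIP:-pip}
    local oldver status
    oldver=$("$pip_cmd" --version | pip_version)
    "$pip_cmd" install --upgrade "pip==${pinned}" || return
    "$@"
    status=$?
    "$pip_cmd" install --upgrade "pip==${oldver}"
    return "$status"
}
```

For example, with_pinned_pip 23 pip install --no-deps en_sdoh_roberta_cui/en_sdoh_roberta_cui-any-py3-none-any.whl runs only the step that needs the older pip. The helper returns the wrapped command's exit status, so a failure is still reported after pip has been restored.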
- Remember the current pip version:
OLDVER=$(pip --version | awk '{print $2}')
- Install a pip version compatible with the package:
pip install --upgrade pip==23
- Install Git Large File Storage (Git LFS):
brew install git-lfs
- Download the model:
git clone https://huggingface.co/dlituiev/en_sdoh_roberta_cui
- Install it:
pip install --no-deps en_sdoh_roberta_cui/en_sdoh_roberta_cui-any-py3-none-any.whl
- Install dependencies:
pip install spacy-transformers
- The conversion script needs an older version of the HuggingFace transformers package:
pip install transformers==4.26
- Convert to the PyTorch model:
src/bin/topytorch.py
- Revert the pip version:
pip install --upgrade pip==${OLDVER}
- Clean up:
rm -rf en_sdoh_roberta_cui
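Installing the git-lfs package is not enough on its own. Git LFS must also be initialized once per user account with git lfs install. Without that, and assuming the wheel is stored with Git LFS (as the Git LFS step suggests), the clone can contain a small pointer file in place of the wheel, and the install step then fails. The following is a minimal bash check to run between the clone and install steps. is_lfs_pointer is a hypothetical name and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): succeed if a file is a
# Git LFS pointer (a small text stub) rather than the real binary content.
# LFS pointer files begin with the exact line matched below.
is_lfs_pointer() {
    [ -f "$1" ] && head -n 1 "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Usage sketch, after cloning and before installing:
# wheel=en_sdoh_roberta_cui/en_sdoh_roberta_cui-any-py3-none-any.whl
# if is_lfs_pointer "$wheel"; then
#     echo "run 'git lfs install', then 'git lfs pull' in en_sdoh_roberta_cui" >&2
# fi
```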
- Set the path to the configuration file:
export SDOHRC=etc/model.conf
- All training and testing commands are run with the harness script. See the command-line help:
./harness -h
- Run the few-shot LLM tests:
for i in mimic synthetic mimthetic ; do ./harness fewshot $i ; done
- Run supervised fine-tuning on the LLMs, then test them:
for i in mimic synthetic mimthetic ; do ./harness train $i ; done
for i in mimic synthetic mimthetic ; do ./harness test $i ; done
- Train the traditional deep learning models and run the ablation tests:
./harness binaryabl
for i in mimic synthetic mimthetic ; do ./harness multilabelabl ; done
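The corpus loops above can be wrapped in a small helper that first checks the SDOHRC setting and then stops at the first failing corpus. This is a minimal bash sketch. run_all and the HARNESS override variable are hypothetical. The sketch assumes that each harness action takes the corpus name as its only argument, as the fewshot, train, and test loops do.

```shell
# Hypothetical helper (not part of the repository): run one harness action
# over the three corpora, stopping at the first failure.
# HARNESS may name an alternative command; it defaults to ./harness.
run_all() {
    local action=$1
    local harness=${HARNESS:-./harness}
    local corpus
    # The harness reads its configuration from the file named by SDOHRC.
    if [ -z "${SDOHRC:-}" ] || [ ! -r "$SDOHRC" ]; then
        echo "SDOHRC must point to a readable configuration file" >&2
        return 1
    fi
    for corpus in mimic synthetic mimthetic ; do
        "$harness" "$action" "$corpus" || return
    done
}
```

For example, use run_all fewshot, or use run_all train && run_all test to train on all corpora before testing any of them. Setting HARNESS=echo prints each command instead of running it.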
An extensive changelog is available here.
Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.
Copyright (c) 2025 Paul Landes