EurekaMD is a medical prompting framework designed to improve the accuracy and reasoning of LLMs on medical tasks. Building on the MedPrompt framework, EurekaMD uses an LLM Judge to select the strongest chain-of-thought reasoning path for each question, reaching 93.3% accuracy on the USMLE.
This repository provides the scripts and workflows to replicate EurekaMD's training and evaluation process.
Follow the steps below to configure your environment:
- Install Python 3.10 or above.
- Deploy gpt-4o via the Azure OpenAI Service: The scripts in this repo use the Azure OpenAI Service. Follow these instructions to deploy gpt-4o through the Azure OpenAI Service.
- Set Up Environment Variables: After configuring the Azure OpenAI Service, set the following environment variables:
    export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
    export AZURE_OPENAI_ENDPOINT_URL="your-azure-openai-endpoint-url"
- Install Dependencies: Use pip to install all required dependencies:
    pip install -r requirements.txt
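A quick pre-flight check can catch a missing variable before any script calls the service. The sketch below is not part of the repository: the helper name `missing_env_vars` is hypothetical, and only the two variable names come from the setup steps above.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT_URL")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the real process environment; passing a dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current shell environment; an empty list means both variables are set.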
This script finds similar questions in the training set for each question in the test set. The script uses OpenAI's text-embedding-3-large embedding model to calculate similarity.
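To illustrate the idea, here is a minimal sketch of nearest-neighbour lookup over embedding vectors. It assumes cosine similarity and a top-k cutoff, neither of which is stated above, and it works on plain lists of floats rather than calling the embedding model. The function names are hypothetical and do not come from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(test_vec, train_vecs, k=5):
    """Return indices of the k training vectors most similar to test_vec."""
    scored = [(cosine_similarity(test_vec, v), i) for i, v in enumerate(train_vecs)]
    # Highest similarity first; ties broken by lower index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

With real data, `test_vec` and `train_vecs` would hold the embeddings returned by the model. In this repository, the step is run with: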
python src/determine-similar-training-questions.py
This script generates multiple candidate reasoning paths for each question in the training set and writes them to an output file.
python src/generate-candidate-reasoning-paths.py
This script uses an LLM Judge to select the best reasoning path for each question in the training set.
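The selection step can be pictured roughly as follows. This is a sketch under assumptions, not the repository's judge: the prompt wording, the reply format, the fallback to the first candidate, and the `judge` callable (any function that takes a prompt string and returns the model's reply) are all hypothetical.

```python
import re


def build_judge_prompt(question, candidates):
    """Format a question and its numbered candidate reasoning paths."""
    lines = ["Question:", question, "", "Candidate reasoning paths:"]
    for i, text in enumerate(candidates, start=1):
        lines.append(f"[{i}] {text}")
    lines.append("")
    lines.append("Reply with the number of the strongest reasoning path.")
    return "\n".join(lines)


def select_best(question, candidates, judge):
    """Ask `judge` to pick one candidate and return that candidate's text."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    reply = judge(build_judge_prompt(question, candidates))
    match = re.search(r"\d+", reply)
    choice = int(match.group()) if match else 1
    if not 1 <= choice <= len(candidates):
        choice = 1  # assumed fallback when the reply is out of range
    return candidates[choice - 1]
```

Passing the judge in as a callable keeps the sketch independent of any particular client library. In this repository, the step is run with: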
python src/select-best-reasoning-paths.py
The final script evaluates the accuracy of EurekaMD on the USMLE test set, using the reasoning paths selected by the previous script as few-shot examples. The questions evaluated are those in the MedQA 4-options dataset.
python src/calculate-usmle-accuracy.py
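For orientation, a few-shot evaluation of this kind can be sketched as below. The prompt layout, the dictionary keys, and both function names are assumptions made for illustration; the repository's actual prompt format is not shown here.

```python
def build_few_shot_prompt(examples, question, options):
    """Assemble worked examples followed by the question to answer.

    examples: list of dicts with 'question', 'reasoning', and 'answer' keys
              (hypothetical field names).
    options:  dict mapping option labels such as 'A' to option text.
    """
    parts = []
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    option_lines = "\n".join(f"{label}. {text}" for label, text in options.items())
    parts.append(f"Question: {question}\n{option_lines}\nReasoning:")
    return "\n\n".join(parts)


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Here each example would pair a similar training question with its judge-selected reasoning path, and `accuracy` compares the model's chosen option labels against the dataset's answer key.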