EurekaMD

Overview

EurekaMD is a medical prompting framework designed to improve the accuracy and reasoning of LLMs on medical tasks. Building on the MedPrompt framework, EurekaMD uses an LLM Judge to select the strongest chain-of-thought reasoning, reaching a score of 93.3% on the USMLE.

This repository provides the scripts and workflows to replicate EurekaMD's training and evaluation process.

Installation

Follow the steps below to configure your environment:

Install Python 3.10 or above.

Deploy gpt-4o via the Azure OpenAI Service: The scripts in this repo call the Azure OpenAI Service, so deploy gpt-4o there by following the Azure OpenAI Service deployment instructions.

Set Up Environment Variables: After configuring the Azure OpenAI Service, set the following environment variables:

export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT_URL="your-azure-openai-endpoint-url"
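
Before running any script, it can help to confirm that both variables are visible to Python. The following is a minimal sketch and is not part of the repository; the helper name `missing_env_vars` is hypothetical, and only the two variable names come from the instructions above.

```python
import os

# Variable names taken from the setup step above.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT_URL")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment.

    `environ` defaults to the real process environment; passing a dict
    makes the check easy to try without touching your shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report what is missing without stopping the program.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Azure OpenAI environment variables are set.")
```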

Install Dependencies: Use pip to install all required dependencies:
```
pip install -r requirements.txt
```

Running Scripts

Determine Similar Training Questions

This script finds similar questions in the training set for each question in the test set. The script uses OpenAI's text-embedding-3-large embedding model to calculate similarity.

To Run

python src/determine-similar-training-questions.py
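
To illustrate the idea of embedding-based retrieval, here is a minimal sketch that ranks training questions by similarity to a test question. It assumes cosine similarity and a fixed `k`, neither of which is stated in this README, and it uses small hand-written vectors in place of real text-embedding-3-large embeddings. The function names are hypothetical and may not match the repository's script.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(test_vec, train_vecs, k=5):
    """Return indices of the k training vectors most similar to test_vec."""
    scored = [(cosine_similarity(test_vec, vec), idx)
              for idx, vec in enumerate(train_vecs)]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy vectors standing in for question embeddings (assumption).
train_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_embedding = [1.0, 0.1]
print(top_k_similar(test_embedding, train_embeddings, k=2))
```

With real embeddings the vectors would be much longer, but the ranking step would look the same.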

Generate Candidate Reasoning Paths

This script generates multiple reasoning paths for each question in the training set and writes them to an output file.

To Run

python src/generate-candidate-reasoning-paths.py
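
The shape of this step can be sketched as sampling a model several times per question and collecting the results. This is an illustration under assumptions, not the repository's code: the record fields `id` and `question`, the `n_paths` parameter, and the `generate_fn` callable (which would wrap a gpt-4o call in practice) are all hypothetical.

```python
import itertools


def generate_candidates(questions, generate_fn, n_paths=3):
    """Map each question id to a list of n_paths generated reasoning paths.

    generate_fn(question_text) -> str is supplied by the caller; in a real
    run it would call the deployed model with sampling enabled.
    """
    candidates = {}
    for item in questions:
        candidates[item["id"]] = [
            generate_fn(item["question"]) for _ in range(n_paths)
        ]
    return candidates


# Stand-in generator so the sketch runs without any API access (assumption).
_counter = itertools.count()


def fake_model(question_text):
    return f"reasoning #{next(_counter)} for: {question_text}"


sample = [{"id": "q1", "question": "Example question text"}]
print(generate_candidates(sample, fake_model, n_paths=2))
```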

Select Best Reasoning Paths

This script uses an LLM Judge to select the best reasoning path for each question in the training set.

To Run

python src/select-best-reasoning-paths.py
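
As a rough sketch of how judge-based selection can be organized, the function below asks a judge to pick one path per question and keeps that path. The `judge_fn` interface and the function name are assumptions for illustration. The placeholder judge simply prefers the longest path and says nothing about how EurekaMD's LLM Judge actually scores candidates.

```python
def select_best_paths(candidates, judge_fn):
    """Pick one reasoning path per question.

    candidates: {question_id: [path, ...]}
    judge_fn(question_id, paths) -> index of the preferred path
    """
    best = {}
    for qid, paths in candidates.items():
        if not paths:
            continue  # nothing to choose from for this question
        choice = judge_fn(qid, paths)
        if not 0 <= choice < len(paths):
            raise ValueError(f"judge returned invalid index {choice} for {qid}")
        best[qid] = paths[choice]
    return best


def longest_path_judge(question_id, paths):
    """Placeholder judge (assumption): prefers the longest path."""
    return max(range(len(paths)), key=lambda i: len(paths[i]))


example = {"q1": ["short", "a longer reasoning path"], "q2": []}
print(select_best_paths(example, longest_path_judge))
```

Passing the judge in as a callable keeps the selection loop independent of any particular prompt or model.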

Calculate USMLE Accuracy

The final script evaluates the accuracy of EurekaMD on the USMLE test set, using the reasoning paths selected by the previous script as few-shot examples. The questions evaluated are those in the MedQA 4-options dataset.

To Run

python src/calculate-usmle-accuracy.py
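
Accuracy on a multiple-choice set is the fraction of questions answered correctly. A minimal sketch of that computation follows; the function name and the assumption that answers are compared as option letters (ignoring case and surrounding whitespace) are illustrative and may differ from the repository's script.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that match the gold answers.

    Both arguments are equal-length lists of option letters such as "A".
    Comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Made-up example data, not results from EurekaMD.
print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```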
