Add pipeline framework (#61)
* Added pipelines WIP

* Added pipeline and io components

* Added validation and tests

* Tidied up typing, added property utils to pipeline, updated tests

* Fix component name string in stages property

* Changed model name to be generic

* Added methods to data containers

* Add simple preprocessing and postprocessing components

* Update dependencies

* Remove print statement

* Fix preprocessor name

* Remove configs from pre and postprocessors

* Fix Discord link

* Update documentation

* Make pipeline wrapper callable method less verbose

* Fail removing/replacing non-existing components louder

* Update README.md

* Added built-in .build() when pipeline is first called

* Update docs with usage

* README.md

* README.md - link
jenniferjiangkells authored Oct 4, 2024
1 parent cdfabb9 commit 032f07e
Showing 49 changed files with 4,682 additions and 1,127 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -160,5 +160,7 @@ cython_debug/
#.idea/

output/
scrap/
.DS_Store
.vscode/
.ruff_cache/
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -26,7 +26,7 @@ If you're a developer, there are many ways you can contribute code:

## Join Our Discord

Are you a domain expert with valuable insights? We encourage you to join our [Discord community](https://discord.gg/4v6XgGBZ) and share your wisdom. Your expertise can help shape the future of the project and guide us in making informed decisions.
Are you a domain expert with valuable insights? We encourage you to join our [Discord community](https://discord.gg/UQC6uAepUz) and share your wisdom. Your expertise can help shape the future of the project and guide us in making informed decisions.

We believe that every contribution, big or small, makes a difference. Thank you for being a part of our community!

171 changes: 108 additions & 63 deletions README.md
@@ -10,138 +10,183 @@

</div>

Simplify testing and evaluating AI and NLP applications in a healthcare context 💫 🏥.
Simplify developing, testing and validating AI and NLP applications in a healthcare context 💫 🏥.

Building applications that integrate in healthcare systems is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.
Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.

```bash
pip install healthchain
```
First time here? Check out our [Docs](dotimplement.github.io/HealthChain/) page!
First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain/) page!

## Features
- [x] 🍱 Create sandbox servers and clients that comply with real EHRs API and data standards.
- [x] 🗃️ Generate synthetic FHIR resources or load your own data as free-text.
- [x] 💾 Save generated request and response data for each sandbox run.
- [x] 🎈 Streamlit dashboard to inspect generated data and responses.
- [x] 🧪 Experiment with LLMs in an end-to-end HL7-compliant pipeline from day 1.
- [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
- [x] 🏗️ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
- [x] 🧪 Test your pipelines in full healthcare-context aware [sandbox](https://dotimplement.github.io/HealthChain/reference/sandbox/sandbox/) environments
- [x] 🗃️ Generate [synthetic healthcare data](https://dotimplement.github.io/HealthChain/reference/utilities/data_generator/) for testing and development
- [x] 🚀 Deploy sandbox servers locally with [FastAPI](https://fastapi.tiangolo.com/)

## Why use HealthChain?
- **Scaling EHR integrations is a manual and time-consuming process** - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
- **Evaluating the behaviour of AI in complex systems is a difficult and labor-intensive task** - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
- **[Most healthcare data is unstructured](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372467/)** - HealthChain is optimised for real-time AI/NLP applications that deal with realistic healthcare data.
- **EHR integrations are manual and time-consuming** - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
- **It's difficult to track and evaluate multiple integration instances** - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
- [**Most healthcare data is unstructured**](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372467/) - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
- **Built by health tech developers, for health tech developers** - HealthChain is tech stack agnostic, modular, and easily extensible.

## Clinical Decision Support (CDS)
## Pipeline
Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily interface with parsers and connectors to integrate with EHRs.

### Building a pipeline

```python
from healthchain.io.containers import Document
from healthchain.pipeline import Pipeline
from healthchain.pipeline.components import TextPreProcessor, Model, TextPostProcessor

# Initialize the pipeline
nlp_pipeline = Pipeline[Document]()

# Add TextPreProcessor component
preprocessor = TextPreProcessor(tokenizer="spacy")
nlp_pipeline.add(preprocessor)

# Add Model component (assuming we have a pre-trained model)
model = Model(model_path="path/to/pretrained/model")
nlp_pipeline.add(model)

# Add TextPostProcessor component
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension"
    }
)
nlp_pipeline.add(postprocessor)

# Build the pipeline
nlp = nlp_pipeline.build()

# Use the pipeline
result = nlp(Document("Patient has a history of heart attack and high blood pressure."))

print(f"Entities: {result.entities}")
```
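
To make the add-then-build flow above more concrete, here is a minimal plain-Python sketch of the same idea: components are callables that take a document and return it, and building the pipeline composes them in order. This is not HealthChain's implementation. `SimpleDocument`, `keyword_model`, `make_postprocessor`, and `build` are hypothetical names used only for illustration, and the "model" is a toy keyword matcher.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document container: raw text plus entities."""
    text: str
    entities: List[str] = field(default_factory=list)


# A component is any callable that takes a document and returns a document.
Component = Callable[[SimpleDocument], SimpleDocument]


def lowercase(doc: SimpleDocument) -> SimpleDocument:
    """Toy preprocessor: normalise case so matching is case-insensitive."""
    doc.text = doc.text.lower()
    return doc


def keyword_model(doc: SimpleDocument) -> SimpleDocument:
    """Toy 'model': flags known phrases that appear in the text."""
    vocabulary = ["heart attack", "high blood pressure"]
    doc.entities = [term for term in vocabulary if term in doc.text]
    return doc


def make_postprocessor(lookup: Dict[str, str]) -> Component:
    """Return a component that maps entities to preferred terms via a lookup."""
    def postprocess(doc: SimpleDocument) -> SimpleDocument:
        # Entities without a lookup entry are kept unchanged.
        doc.entities = [lookup.get(entity, entity) for entity in doc.entities]
        return doc
    return postprocess


def build(components: List[Component]) -> Component:
    """Compose components into a single callable that runs them in order."""
    def run(doc: SimpleDocument) -> SimpleDocument:
        for component in components:
            doc = component(doc)
        return doc
    return run


nlp = build([
    lowercase,
    keyword_model,
    make_postprocessor({
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension",
    }),
])

result = nlp(SimpleDocument("Patient has a history of heart attack and high blood pressure."))
print(f"Entities: {result.entities}")
# Entities: ['myocardial infarction', 'hypertension']
```

Keeping every component to the same document-in, document-out signature is what lets stages be added, removed, or replaced without touching the others.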
### Using pre-built pipelines

```python
from healthchain.io.containers import Document
from healthchain.pipeline import MedicalCodingPipeline

# Load the pre-built MedicalCodingPipeline
pipeline = MedicalCodingPipeline.load("./path/to/model")

# Create a document to process
result = pipeline(Document("Patient has a history of myocardial infarction and hypertension."))

print(f"Entities: {result.entities}")
```

## Sandbox

Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.

### Clinical Decision Support (CDS)
[CDS Hooks](https://cds-hooks.org/) is an [HL7](https://cds-hooks.hl7.org) published specification for clinical decision support.

**When is this used?** CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened, when an order is selected.

**What information is sent**: the context of the event and FHIR resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.
**What information is sent**: the context of the event and [FHIR](https://hl7.org/fhir/) resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.

**What information is returned**: “cards” displaying text, actionable suggestions, or links to launch a [SMART](https://smarthealthit.org/) app from within the workflow.

**What you need to decide**: What data do I want my EHR client to send, and how will my service process this data.
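
As a rough illustration of the exchange described above, the sketch below builds a `patient-view` request and a card response as plain dictionaries. The field names (`hook`, `hookInstance`, `context`, `prefetch`, `cards`, `summary`, `indicator`, `source`) follow the CDS Hooks specification. The helper functions, IDs, and card text are hypothetical and are not part of HealthChain.

```python
import uuid

# Indicator values allowed for a card by the CDS Hooks specification.
VALID_INDICATORS = {"info", "warning", "critical"}


def make_request(patient_id: str, user_id: str) -> dict:
    """Build a minimal patient-view request as an EHR client might send it."""
    return {
        "hook": "patient-view",
        "hookInstance": str(uuid.uuid4()),  # unique ID for this hook invocation
        "context": {"userId": user_id, "patientId": patient_id},
        # Prefetch carries the FHIR resources the service asked for up front.
        "prefetch": {"patient": {"resourceType": "Patient", "id": patient_id}},
    }


def make_card(summary: str, detail: str, indicator: str = "info",
              source_label: str = "Example CDS service") -> dict:
    """Build one card, checking the constraints the specification places on it."""
    if indicator not in VALID_INDICATORS:
        raise ValueError(f"indicator must be one of {sorted(VALID_INDICATORS)}")
    if len(summary) >= 140:
        raise ValueError("summary must be under 140 characters")
    return {
        "summary": summary,
        "detail": detail,
        "indicator": indicator,
        "source": {"label": source_label},
    }


def handle(request: dict) -> dict:
    """Toy service logic: return a single informational card for the patient."""
    patient_id = request["context"]["patientId"]
    card = make_card(
        summary=f"Record opened for patient {patient_id}",
        detail="No decision support rules are configured in this sketch.",
    )
    return {"cards": [card]}


request = make_request(patient_id="example-patient-1", user_id="Practitioner/example")
response = handle(request)
print(response["cards"][0]["summary"])
```

The service always returns a `cards` list, even when there is only one card or none, which is why the response wraps the card rather than returning it directly.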


```python
import healthchain as hc

from healthchain.pipeline import Pipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import Card, CdsFhirData, CDSRequest
from healthchain.data_generator import DataGenerator

from healthchain.data_generator import CdsDataGenerator
from typing import List

# Decorate class with sandbox and pass in use case
@hc.sandbox
class myCDS(ClinicalDecisionSupport):
class MyCDS(ClinicalDecisionSupport):
def __init__(self) -> None:
self.data_generator = DataGenerator()
self.pipeline = Pipeline.load("./path/to/model")
self.data_generator = CdsDataGenerator()

# Sets up an instance of a mock EHR client of the specified workflow
@hc.ehr(workflow="patient-view")
def ehr_database_client(self) -> CdsFhirData:
self.data_generator.generate()
return self.data_generator.data
return self.data_generator.generate()

# Define your application logic here
@hc.api
def my_service(self, request: CdsRequest) -> List[Card]:
result = "Hello " + request["patient_name"]
return result

if __name__ == "__main__":
cds = myCDS()
cds.start_sandbox()
```

Then run:
```bash
healthchain run mycds.py
def my_service(self, data: CDSRequest) -> List[Card]:
result = self.pipeline(data)
return [
Card(
summary="Welcome to our Clinical Decision Support service.",
detail=result.summary,
indicator="info"
)
]
```
This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in `./output` by default.

## Clinical Documentation
### Clinical Documentation

The ClinicalDocumentation use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.
The `ClinicalDocumentation` use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.

**When is this used?** Triggered when a clinician opts in to a CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.

**What information is sent**: A [CDA (Clinical Document Architecture)](https://www.hl7.org/implement/standards/product_brief.cfm?product_id=7) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.

**What information is returned**: A CDA document which contains additional structured data extracted and returned by your CDI service.
**What information is sent**: A [CDA (Clinical Document Architecture)](https://www.hl7.org.uk/standards/hl7-standards/cda-clinical-document-architecture/) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.
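
To give a feel for what such a document looks like, here is a small sketch that reads section titles and free text out of a heavily simplified CDA-style XML string using only the standard library. The `urn:hl7-org:v3` namespace and the `ClinicalDocument > component > structuredBody > component > section` nesting are standard CDA. The sample content and the `section_texts` helper are hypothetical, and real documents carry far more structure (templates, coded entries, narrative markup).

```python
import xml.etree.ElementTree as ET
from typing import Dict

# CDA elements live in the HL7 v3 namespace.
CDA_NS = {"hl7": "urn:hl7-org:v3"}

# Hypothetical, heavily simplified CDA body for illustration only.
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Problem List</title>
          <text>Hypertension</text>
        </section>
      </component>
      <component>
        <section>
          <title>Progress Note</title>
          <text>Patient reports chest pain on exertion.</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def section_texts(cda_xml: str) -> Dict[str, str]:
    """Map each section title to its narrative text."""
    root = ET.fromstring(cda_xml)
    sections: Dict[str, str] = {}
    for section in root.iterfind(".//hl7:section", CDA_NS):
        title = section.findtext("hl7:title", default="", namespaces=CDA_NS)
        text_element = section.find("hl7:text", CDA_NS)
        # itertext() also collects text nested inside narrative markup.
        text = "".join(text_element.itertext()) if text_element is not None else ""
        sections[title.strip()] = text.strip()
    return sections


for title, text in section_texts(SAMPLE_CDA).items():
    print(f"{title}: {text}")
```

The free text extracted this way is the part a CDI service would typically pass to an NLP pipeline, while the existing coded entries give context about what is already documented.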

```python
import healthchain as hc

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, ProblemConcept, Quantity

@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
def __init__(self):
self.cda_path = "./resources/uclh_cda.xml"
self.pipeline = MedicalCodingPipeline.load("./path/to/model")

# Load an existing CDA file
@hc.ehr(workflow="sign-note-inpatient")
def load_data_in_client(self) -> CcdData:
with open(self.cda_path, "r") as file:
with open("/path/to/cda/data.xml", "r") as file:
xml_string = file.read()

return CcdData(cda_xml=xml_string)

# Define application logic
@hc.api
def my_service(self, ccd_data: CcdData) -> CcdData:
# Apply method from ccd_data.note and access existing entries from ccd.problems

new_problem = ProblemConcept(
code="38341003",
code_system="2.16.840.1.113883.6.96",
code_system_name="SNOMED CT",
display_name="Hypertension",
)
ccd_data.problems.append(new_problem)
return ccd_data
annotated_ccd = self.pipeline(ccd_data)
return annotated_ccd
```
### Running a sandbox

Ensure you run the following commands in your `mycds.py` file:

### Streamlit dashboard
Note this is currently not meant to be a frontend to the EHR client, so you will have to run it separately from the sandbox application.
```python
cds = MyCDS()
cds.run_sandbox()
```
This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in the `./output` directory.

Then run:
```bash
pip install streamlit
streamlit streamlit-demo/app.py
healthchain run mycds.py
```

By default, the server runs at `http://127.0.0.1:8000`, and you can interact with the exposed endpoints at `/docs`.
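
The `/docs` page is FastAPI's interactive view of the server's OpenAPI schema, which FastAPI serves at `/openapi.json` unless that default has been overridden. Assuming the sandbox keeps the default, a sketch like the one below could list the exposed endpoints from a script. The helper names and the sample schema paths are hypothetical and are not taken from HealthChain.

```python
import json
from typing import List, Optional
from urllib.error import URLError
from urllib.request import urlopen

# Keys in an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str = "http://127.0.0.1:8000",
                 timeout: float = 2.0) -> Optional[dict]:
    """Fetch the OpenAPI schema, or return None if the server is unreachable."""
    try:
        with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
            return json.load(response)
    except (URLError, OSError):
        return None


def extract_endpoints(schema: dict) -> List[str]:
    """List 'METHOD /path' strings for every operation in an OpenAPI schema."""
    return sorted(
        f"{method.upper()} {path}"
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS
    )


# Hypothetical schema fragment, used here so the sketch runs without a server.
sample_schema = {
    "paths": {
        "/example-services": {"get": {}},
        "/example-services/{id}": {"post": {}, "parameters": []},
    }
}
print(extract_endpoints(sample_schema))
```

With the sandbox running, `extract_endpoints(fetch_schema())` would list the real routes. `fetch_schema` returns `None` when nothing is listening, so check for that before passing the result on.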
## Road Map
- [x] 📝 Adding Clinical Documentation use case
- [ ] 🎛️ Version and test different EHR backend configurations
- [ ] 🤖 Integrations with popular LLM and NLP libraries
- [ ] ❓ Evaluation framework for pipelines and use cases
- [ ] 🎛️ Versioning and artifact management for pipelines sandbox EHR configurations
- [ ] 🤖 Integrations with other pipeline libraries such as spaCy, HuggingFace, LangChain etc.
- [ ] ❓ Testing and evaluation framework for pipelines and use cases
- [ ] 🧠 Multi-modal pipelines that have built-in NLP to utilize unstructured data
- [ ] ✨ Improvements to synthetic data generator methods
- [ ] 👾 Frontend demo for EHR client
- [ ] 👾 Frontend UI for EHR client and visualization features
- [ ] 🚀 Production deployment options

## Contribute
6 changes: 6 additions & 0 deletions docs/api/component.md
@@ -0,0 +1,6 @@
# Component

::: healthchain.pipeline.components.basecomponent
::: healthchain.pipeline.components.preprocessors
::: healthchain.pipeline.components.models
::: healthchain.pipeline.components.postprocessors
3 changes: 3 additions & 0 deletions docs/api/containers.md
@@ -0,0 +1,3 @@
# Containers

::: healthchain.io.containers
3 changes: 3 additions & 0 deletions docs/api/pipeline.md
@@ -0,0 +1,3 @@
# Pipeline

::: healthchain.pipeline.basepipeline
1 change: 1 addition & 0 deletions docs/community/contribution_guide.md
@@ -0,0 +1 @@
# Contribution Guide
1 change: 1 addition & 0 deletions docs/community/resources.md
@@ -0,0 +1 @@
# Resources
48 changes: 48 additions & 0 deletions docs/cookbook/cds_sandbox.md
@@ -0,0 +1,48 @@
# Build a CDS sandbox

A CDS sandbox which uses `gpt-4o` to summarise patient information from synthetically generated FHIR resources received from the `patient-view` CDS hook.

```python
import healthchain as hc

from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.data_generators import CdsDataGenerator
from healthchain.models import Card, CdsFhirData, CDSRequest

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

from typing import List

@hc.sandbox
class CdsSandbox(ClinicalDecisionSupport):
    def __init__(self):
        self.chain = self._init_llm_chain()
        self.data_generator = CdsDataGenerator()

    def _init_llm_chain(self):
        prompt = PromptTemplate.from_template(
            "Extract conditions from the FHIR resource below and summarize in one sentence using simple language \n'''{text}'''"
        )
        model = ChatOpenAI(model="gpt-4o")
        parser = StrOutputParser()

        chain = prompt | model | parser
        return chain

    @hc.ehr(workflow="patient-view")
    def load_data_in_client(self) -> CdsFhirData:
        data = self.data_generator.generate()
        return data

    @hc.api
    def my_service(self, request: CDSRequest) -> List[Card]:
        result = self.chain.invoke(str(request.prefetch))
        # Wrap the card in a list to match the declared List[Card] return type
        return [
            Card(
                summary="Patient summary",
                indicator="info",
                source={"label": "openai"},
                detail=result,
            )
        ]
```
7 changes: 6 additions & 1 deletion docs/cookbook/index.md
@@ -1 +1,6 @@
# Cookbook
# Examples

The best way to learn is by example! Here are some to get you started:

- [Build a CDS sandbox](./cds_sandbox.md): Build a clinical decision support (CDS) system that uses *patient-view* to greet the patient.
- [Build a Clinical Documentation sandbox](./notereader_sandbox.md): Build a NoteReader system which extracts problem, medication, and allergy concepts from free-text clinical notes.