MedicalCodingPipeline and SummarizationPipeline implementations #95

Draft
wants to merge 66 commits into
base: main
Choose a base branch
from


@jenniferjiangkells (Member) commented Oct 31, 2024

Description

Implement MedicalCodingPipeline and SummarizationPipeline

Related Issue

#55

Changes Made

I come, once again, bearing breaking changes.

  • 💥 Changes to the Document container class: its data is now organised into the sub-containers nlp, concepts, hl7, cds, and models. Each sub-container handles one kind of data, usually through getter and setter methods.

    • Replaced .add_huggingface_output() and the similar per-integration methods with a single .add_output(integration_name, task, output), which makes outputs easier to access and manage.
    • Added a models.get_generated_text() method.
  • Changes to CcdData: it now uses a ConceptLists dataclass to hold problem, medication, and allergy concepts, which gives a cleaner interface with the Document class.

  • Changes to the .load() method of BasePipeline: it now configures the pipeline by parsing the model and its source into a ModelConfig object. The model can be either a string (a model name or a path to a model) or a callable (e.g. a LangChain chain object).

  • Added ModelRouter, a helper that returns the appropriate integration component for a given ModelConfig.

  • Templates: Users can pass in a Jinja template for custom CDS cards (this will extend to CDAs too, but that's a matter for a different issue).

  • Added CdsCardCreator: this component takes either generated text extracted from model outputs in the pipeline or user-specified static content, and renders it into a CDS Card object using a Jinja template (a default template is used if none is provided).

  • Renamed integration components to be more descriptive: SpacyComponent -> SpacyNLP, HuggingFaceComponent -> HFTransformer, LangchainComponent -> LangChainLLM

    • kwargs can now also be passed through to the integration components.
  • Added a ._add_concepts_to_hc_doc() helper method to SpacyNLP, which takes the entities from the spaCy doc, converts each one into a Concept, and adds it to the .concepts attribute of the Document. For now this is hard-coded to always add new concepts as SNOMED problems, but it will be made configurable in future.

  • Removed the default spaCy tokenizer from TextPreprocessor: it was redundant, since SpacyNLP can be used instead. For better separation of concerns, this component now only does very simple text preprocessing: the default is .split(), but users can also pass in a tokenizer object (Callable) to use with the component.

  • And finally, added MedicalCodingPipeline and SummarizationPipeline implementation.

    • The pipeline internally coerces the task to either ner or summarization, but there is no strict validation yet.
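
The Document reorganisation and the single .add_output(integration_name, task, output) entry point can be sketched roughly as follows. This is a hypothetical, dependency-free illustration, not the project's actual classes: the field types, the get_output helper, and the assumed output shapes (a plain string, or a list of dicts with a "generated_text" key, as Hugging Face text-generation pipelines return) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelOutputs:
    """Hypothetical `models` sub-container: raw outputs keyed by integration and task."""

    outputs: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_output(self, integration_name: str, task: str, output: Any) -> None:
        # One inner dict per integration (e.g. "huggingface"), keyed by task name.
        self.outputs.setdefault(integration_name, {})[task] = output

    def get_output(self, integration_name: str, task: str) -> Optional[Any]:
        # Returns None when nothing was stored for this integration/task pair.
        return self.outputs.get(integration_name, {}).get(task)

    def get_generated_text(self, integration_name: str, task: str) -> List[str]:
        # Assumed output shapes: a plain string, or a list of dicts with a
        # "generated_text" key (as Hugging Face text-generation pipelines return).
        output = self.get_output(integration_name, task)
        if output is None:
            return []
        if isinstance(output, str):
            return [output]
        return [item["generated_text"] for item in output if "generated_text" in item]


@dataclass
class Document:
    """Hypothetical Document split into the sub-containers named in this PR."""

    text: str
    nlp: Dict[str, Any] = field(default_factory=dict)
    concepts: Dict[str, List[Any]] = field(default_factory=dict)
    hl7: Dict[str, Any] = field(default_factory=dict)
    cds: Dict[str, Any] = field(default_factory=dict)
    models: ModelOutputs = field(default_factory=ModelOutputs)


doc = Document(text="Patient reports chest pain on exertion.")
doc.models.add_output("langchain", "summarization", "Exertional chest pain.")
doc.models.add_output("huggingface", "text-generation", [{"generated_text": "Chest pain."}])
print(doc.models.get_generated_text("langchain", "summarization"))
print(doc.models.get_generated_text("huggingface", "text-generation"))
```

Keying outputs by integration and then by task means a new integration needs no new method on Document, which is the motivation given for replacing .add_huggingface_output() and friends.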
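
The flow described for .load() and ModelRouter (a model name, path, or callable is parsed into a ModelConfig, and the router picks the integration component and forwards kwargs to it) might look something like this. Everything here is a hypothetical sketch: parse_model_source, the ModelSource enum, the ModelConfig fields, and the Stub* classes stand in for the real SpacyNLP, HFTransformer, and LangChainLLM. Treating any existing filesystem path as a local model is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Type, Union


class ModelSource(Enum):
    SPACY = "spacy"
    HUGGINGFACE = "huggingface"
    LANGCHAIN = "langchain"


@dataclass
class ModelConfig:
    """Hypothetical config: which integration, which model, and extra kwargs."""

    source: ModelSource
    model: Union[str, Callable[..., Any]]
    path: Optional[Path] = None
    kwargs: Dict[str, Any] = field(default_factory=dict)


def parse_model_source(
    model: Union[str, Callable[..., Any]], source: str = "huggingface", **kwargs: Any
) -> ModelConfig:
    # A callable (e.g. a LangChain chain) is passed through unchanged.
    if callable(model):
        return ModelConfig(source=ModelSource.LANGCHAIN, model=model, kwargs=kwargs)
    # Assumption: a string that exists on disk is a local model path; otherwise a model name.
    candidate = Path(model)
    path = candidate if candidate.exists() else None
    # ModelSource(source) raises ValueError for an unknown source string.
    return ModelConfig(source=ModelSource(source), model=model, path=path, kwargs=kwargs)


class StubIntegration:
    """Placeholder for an integration component; records what it receives."""

    def __init__(self, model: Any, **kwargs: Any) -> None:
        self.model = model
        self.kwargs = kwargs


class StubSpacyNLP(StubIntegration):
    pass


class StubHFTransformer(StubIntegration):
    pass


class StubLangChainLLM(StubIntegration):
    pass


class ModelRouter:
    """Hypothetical router: maps a ModelConfig's source to an integration component."""

    def __init__(self) -> None:
        self._components: Dict[ModelSource, Type[StubIntegration]] = {
            ModelSource.SPACY: StubSpacyNLP,
            ModelSource.HUGGINGFACE: StubHFTransformer,
            ModelSource.LANGCHAIN: StubLangChainLLM,
        }

    def get_component(self, config: ModelConfig) -> StubIntegration:
        component_cls = self._components[config.source]
        # Prefer the local path when one was found; forward kwargs untouched.
        model = config.path if config.path is not None else config.model
        return component_cls(model, **config.kwargs)


config = parse_model_source("facebook/bart-large-cnn", source="huggingface", task="summarization")
component = ModelRouter().get_component(config)
print(type(component).__name__, component.kwargs)
```

Keeping the parsing (ModelConfig) separate from the dispatch (ModelRouter) lets the pipeline accept strings and callables uniformly while each integration only sees the model and kwargs it needs.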
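
The CdsCardCreator behaviour (generated text or static content rendered into a CDS card through a template) can be approximated as below. To keep the sketch dependency-free it uses the standard library's string.Template in place of Jinja, with each value JSON-encoded before substitution. The create_card name, its parameters, the precedence of static content over generated text, and the "example-pipeline" source label are assumptions. The card fields (summary, indicator, source.label, detail) follow the CDS Hooks card format, whose summary is meant to stay under 140 characters.

```python
import json
from string import Template
from typing import Any, Dict, Optional

# Stand-in for the default Jinja template: values are JSON-encoded before substitution.
DEFAULT_CARD_TEMPLATE = Template(
    '{"summary": $summary, "indicator": $indicator, '
    '"source": {"label": $source_label}, "detail": $detail}'
)


def create_card(
    generated_text: Optional[str] = None,
    static_content: Optional[str] = None,
    template: Template = DEFAULT_CARD_TEMPLATE,
    indicator: str = "info",
    source_label: str = "example-pipeline",
) -> Dict[str, Any]:
    # Assumption: static content, when given, takes precedence over model output.
    content = static_content if static_content is not None else generated_text
    if not content:
        raise ValueError("Need either generated_text or static_content")
    rendered = template.substitute(
        # CDS Hooks asks for a summary under 140 characters; the full text goes in detail.
        summary=json.dumps(content[:139]),
        indicator=json.dumps(indicator),
        source_label=json.dumps(source_label),
        detail=json.dumps(content),
    )
    return json.loads(rendered)


card = create_card(generated_text="Consider screening for diabetic retinopathy.")
print(card["summary"], card["indicator"], card["source"]["label"])
```

Passing a different Template (or, in the real component, a custom Jinja template) changes the card layout without touching the extraction logic.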
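
The ConceptLists container and the current hard-coded behaviour of ._add_concepts_to_hc_doc() (every entity becomes a SNOMED problem) can be illustrated with a small sketch. The Concept fields, the add_entities_as_problems function, and the Entity class (a minimal stand-in for a spaCy span, using only .text and .kb_id_) are hypothetical, chosen so the example runs without spaCy installed.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Concept:
    """Hypothetical coded concept."""

    display_name: str
    code: Optional[str] = None
    code_system: str = "SNOMED CT"


@dataclass
class ConceptLists:
    """Hypothetical holder mirroring the problems/medications/allergies split."""

    problems: List[Concept] = field(default_factory=list)
    medications: List[Concept] = field(default_factory=list)
    allergies: List[Concept] = field(default_factory=list)


@dataclass
class Entity:
    """Minimal stand-in for a spaCy entity span (only .text and .kb_id_ are used)."""

    text: str
    kb_id_: str = ""


def add_entities_as_problems(entities: Iterable[Entity], concepts: ConceptLists) -> ConceptLists:
    # Mirrors the current hard-coded behaviour: every entity becomes a SNOMED CT problem.
    for ent in entities:
        concepts.problems.append(Concept(display_name=ent.text, code=ent.kb_id_ or None))
    return concepts


# 38341003: SNOMED CT code for hypertensive disorder; "chest pain" is left uncoded.
concepts = add_entities_as_problems(
    [Entity("hypertension", "38341003"), Entity("chest pain")], ConceptLists()
)
print([(c.display_name, c.code) for c in concepts.problems])
```

Making this configurable later would amount to choosing the target list (problems, medications, or allergies) per entity instead of always appending to problems.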
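
The slimmed-down TextPreprocessor contract (default to .split(), or use any tokenizer callable the user supplies) reduces to a few lines. This SimplePreprocessor is a hypothetical sketch that operates on plain strings for brevity; the real component presumably works on a Document.

```python
import re
from typing import Callable, List, Optional


class SimplePreprocessor:
    """Hypothetical sketch of the slimmed-down TextPreprocessor behaviour."""

    def __init__(self, tokenizer: Optional[Callable[[str], List[str]]] = None) -> None:
        # No spaCy dependency: fall back to str.split() when no tokenizer is supplied.
        self.tokenizer = tokenizer if tokenizer is not None else str.split

    def __call__(self, text: str) -> List[str]:
        return self.tokenizer(text)


default_tokens = SimplePreprocessor()("Chest pain  on exertion")
regex_tokens = SimplePreprocessor(tokenizer=lambda t: re.findall(r"\w+", t))("BP 120/80, HR 72.")
print(default_tokens, regex_tokens)
```

Because the tokenizer is just a callable, anything heavier (including a spaCy tokenizer) stays opt-in rather than a hard dependency of the preprocessor.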
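
The task coercion in MedicalCodingPipeline and SummarizationPipeline can be sketched as a lookup that overrides whatever task the caller passed. The PIPELINE_TASKS mapping and coerce_task function are hypothetical, and emitting a warning on a conflicting task is an assumption of this sketch. In keeping with "no strict validation yet", it never raises and does not check that the model actually supports the task.

```python
import warnings
from typing import Any, Dict

# Hypothetical mapping from pipeline class name to the task it is coerced to.
PIPELINE_TASKS = {"MedicalCodingPipeline": "ner", "SummarizationPipeline": "summarization"}


def coerce_task(pipeline_name: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    expected = PIPELINE_TASKS[pipeline_name]
    requested = kwargs.get("task")
    if requested is not None and requested != expected:
        # No strict validation: warn and override rather than raise.
        warnings.warn(f"{pipeline_name} overrides task {requested!r} with {expected!r}")
    # Return a copy so the caller's kwargs are left untouched.
    return {**kwargs, "task": expected}


user_kwargs = {"max_length": 60}
print(coerce_task("SummarizationPipeline", user_kwargs))
```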

Testing

Added tests for:

  • CdsCardCreator: test_card_creator.py
  • ModelRouter: test_modelrouter.py
  • pipeline .load() method: test_pipeline_load.py
  • Pipeline implementations: test_medicalcoding.py, test_summarization.py
  • check that kwargs are properly propagated in integration components: test_integrations.py
  • check that TextPreprocessor initializes the tokenizer object: test_preprocessor.py
  • updated tests for Document methods: test_containers.py
