Medhelm Epic #3787

MiguelAFH · 2025-08-04T20:33:54Z

This PR adds a new scenario, NoteSummaryScenario, developed in partnership with Epic Systems. The scenario focuses on generating clinical note summaries for the Emergency Medicine specialty, reflecting real-world medical documentation needs.

To assess the quality of the model-generated summaries, we adopt the "LLM as a judge" evaluation framework with PDSQI-9, a rubric co-developed by UW Madison and Epic Systems. The rubric supports systematic evaluation without requiring a gold-standard response to compare against.
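
The sketch below illustrates one way judge ratings on a Likert-style rubric could be turned into a single score without a reference summary. It is not the PR's implementation: the attribute names, the 1-5 scale, and the helper `normalized_rubric_score` are assumptions made for illustration, and non-numeric ratings are simply skipped.

```python
from typing import Dict, Optional, Union

Rating = Union[int, float, str, None]


def normalized_rubric_score(
    ratings: Dict[str, Rating], low: float = 1.0, high: float = 5.0
) -> Optional[float]:
    """Average the numeric ratings after rescaling each from [low, high] to [0, 1].

    Non-numeric ratings (e.g. "NA") are ignored. Returns None if nothing is numeric.
    """
    scaled = []
    for value in ratings.values():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Clamp out-of-range ratings to the assumed scale before rescaling.
        clipped = min(max(float(value), low), high)
        scaled.append((clipped - low) / (high - low))
    return sum(scaled) / len(scaled) if scaled else None


# Hypothetical judge output for one summary (attribute names are illustrative).
example = {"accurate": 5, "thorough": 3, "organized": "NA"}
print(normalized_rubric_score(example))  # 0.75
```

Returning `None` when no rating is numeric lets the caller decide whether to drop the instance or report it as a failed annotation.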

yifanmai · 2025-08-07T20:57:09Z

setup.cfg

    openpyxl~=3.1
    python-docx~=1.1
    transformers~=4.45,<4.50
+    evaluation-instruments @ git+https://github.com/epic-open-source/evaluation-instruments.git@1c4637e84fe4dc54f6695e438f3baca6b2cd4573


PyPI packages cannot depend on packages outside PyPI. You should instead provide instructions to users to manually install this package, either by printing the installation command in an error message, or by documenting it in ReadTheDocs.

yifanmai · 2025-08-07T21:58:14Z

src/helm/benchmark/annotation/note_summary_annotator.py

+from helm.benchmark.annotation.model_as_judge import AnnotatorModelInfo, LLMAsJuryAnnotator
+from helm.clients.auto_client import AutoClient
+
+from evaluation_instruments import prep


Display an error if this is not installed:

    from helm.common.optional_dependencies import OptionalDependencyNotInstalled

    try:
        from evaluation_instruments import prep
        import evaluation_instruments.instruments.pdsqi_9.pdsqi_prompt as pdsqi
    except ModuleNotFoundError as e:
        # Provide manual instructions for installing evaluation-instruments from GitHub
        # because PyPI does not allow installing dependencies directly from GitHub.
        raise OptionalDependencyNotInstalled(
            f"Optional dependency {e.name} is not installed. "
            "Please run `pip install git+https://github.com/epic-open-source/evaluation-instruments.git@1c4637e84fe4dc54f6695e438f3baca6b2cd4573` to install it."  # noqa: E501
        ) from e

yifanmai · 2025-08-07T22:01:54Z

src/helm/benchmark/run_specs/medhelm_run_specs.py

+def get_note_summary_spec(config_path: Optional[str] = None) -> RunSpec:
+    if config_path is None:
+        package = "helm.benchmark.scenarios"
+        config_path = str(pkg_resources.files(package).joinpath("note_summary_scenario.yaml"))


You need to add *.yaml to the manifest, or this file will not actually get included in the package.

helm/MANIFEST.in

Line 3 in 89001e7

recursive-include src/helm/benchmark/ *.json
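
One way to confirm that the YAML file actually ships with the installed package is to look it up through `importlib.resources`, similar to the lookup in the run spec. The sketch below is a minimal check, not part of the PR; the helper name `packaged_resource_exists` is hypothetical, and the `MANIFEST.in` line in the comment is an assumed fix that mirrors the existing `*.json` rule.

```python
from importlib import resources


def packaged_resource_exists(package: str, filename: str) -> bool:
    """Return True if `filename` is shipped inside the installed `package`."""
    try:
        return resources.files(package).joinpath(filename).is_file()
    except ModuleNotFoundError:
        # The package itself is not installed.
        return False


# Assumed MANIFEST.in addition, mirroring the existing *.json rule:
#   recursive-include src/helm/benchmark/ *.yaml
#
# After reinstalling the built package, this call should return True:
#   packaged_resource_exists("helm.benchmark.scenarios", "note_summary_scenario.yaml")
```

Running the check against a wheel or sdist install, rather than an editable install, is what exercises the manifest, since an editable install reads files straight from the source tree.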

yifanmai · 2025-08-07T22:08:22Z

src/helm/benchmark/scenarios/note_summary_scenario.py

+
+        return instances
+
+    def read_file(self, file_path: str) -> List[str]:


Delete unused method.

yifanmai · 2025-08-07T22:10:11Z

src/helm/benchmark/metrics/llm_jury_metrics.py

+
+
+@dataclass
+class Rubric:


I did not look at the rubric logic too closely, but let me know if there's anything you want me to check.

yifanmai · 2025-08-22T22:24:34Z

Ping - last activity was two weeks ago.

MiguelAFH added 14 commits July 21, 2025 18:07

progress

edc2b9b

Update

ac4884e

Update

08f1b4e

Added checks for jury response

7229ad7

Update

22f526d

Merge branch 'main' into medhelm-epic

a25d578

Add evaluation-instruments module

ae6ad94

Merge branch 'main' into medhelm-epic

7b57073

Use PDSQI-9 code

662b0ee

Updated error support

c9e8498

Added support for config file

7ab8718

Updated summary scenario

07ccbb5

Added handling of non-numeric rubric values

94a3fe7

Merge branch 'main' into medhelm-epic

b7cf00b

MiguelAFH requested a review from yifanmai August 4, 2025 20:33

MiguelAFH self-assigned this Aug 4, 2025

MiguelAFH added the MedHELM label Aug 4, 2025

MiguelAFH added 9 commits August 4, 2025 20:36

Unbranched discharmge_scenario.py

d5e47e6

Update score weighing

c0f7c35

Fix lint

bf8bda0

Fix lint

c9d3bf9

Fix lint

49e0b51

Fix lint

5f39a8e

Fix lint

679d4dc

Updated NoteSummary description

fab6100

Fix lint

29607b7

yifanmai requested changes Aug 7, 2025
