@MiguelAFH
Collaborator

This PR adds a new scenario, NoteSummaryScenario, developed in partnership with Epic Systems. The scenario focuses on generating clinical note summaries for the Emergency Medicine specialty, reflecting real-world medical documentation needs.

To assess the quality of the model-generated summaries, we adopt the "LLM as a judge" evaluation framework based on PDSQI-9, a rubric co-developed by UW Madison and Epic Systems for systematic evaluation that doesn't require a gold standard response to compare against.

@MiguelAFH MiguelAFH requested a review from yifanmai August 4, 2025 20:33
@MiguelAFH MiguelAFH self-assigned this Aug 4, 2025
openpyxl~=3.1
python-docx~=1.1
transformers~=4.45,<4.50
evaluation-instruments @ git+https://github.com/epic-open-source/evaluation-instruments.git@1c4637e84fe4dc54f6695e438f3baca6b2cd4573

@yifanmai yifanmai Aug 7, 2025

Packages published to PyPI cannot depend on packages hosted outside PyPI. Instead, give users instructions for installing this package manually, either by printing the installation command in an error message or by documenting it on ReadTheDocs.

from helm.benchmark.annotation.model_as_judge import AnnotatorModelInfo, LLMAsJuryAnnotator
from helm.clients.auto_client import AutoClient

from evaluation_instruments import prep

Display an error if this is not installed:

from helm.common.optional_dependencies import OptionalDependencyNotInstalled

try:
    from evaluation_instruments import prep
    import evaluation_instruments.instruments.pdsqi_9.pdsqi_prompt as pdsqi
except ModuleNotFoundError as e:
    # Provide manual instructions for installing evaluation-instruments from GitHub
    # because PyPI does not allow installing dependencies directly from GitHub.
    raise OptionalDependencyNotInstalled(
        f"Optional dependency {e.name} is not installed. "
        "Please run `pip install 'evaluation-instruments @ git+https://github.com/epic-open-source/evaluation-instruments.git@1c4637e84fe4dc54f6695e438f3baca6b2cd4573'` to install it."  # noqa: E501
    ) from e
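
The same guard can be tried without HELM installed. The following is a minimal, self-contained sketch: `OptionalDependencyNotInstalled` is redefined locally as a stand-in for the class in `helm.common.optional_dependencies`, and `import_optional`, its `install_command` parameter, and the package names in the demo are hypothetical, chosen only for illustration.

```python
import importlib
from types import ModuleType


class OptionalDependencyNotInstalled(Exception):
    """Local stand-in for helm.common.optional_dependencies.OptionalDependencyNotInstalled."""


def import_optional(module_name: str, install_command: str) -> ModuleType:
    """Import a module, or fail with a message that tells the user how to install it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        # e.name is the name of the module that could not be found.
        raise OptionalDependencyNotInstalled(
            f"Optional dependency {e.name} is not installed. "
            f"Please run `{install_command}` to install it."
        ) from e


# A module that is present imports normally (json is in the standard library).
json_module = import_optional("json", "pip install example-package")

# A missing module produces the actionable error instead of a bare ModuleNotFoundError.
try:
    import_optional("module_that_does_not_exist_xyz", "pip install example-package")
except OptionalDependencyNotInstalled as err:
    message = str(err)
```

Chaining with `from e` keeps the original `ModuleNotFoundError` in the traceback, so the user sees both the install hint and the import that triggered it.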

def get_note_summary_spec(config_path: Optional[str] = None) -> RunSpec:
    if config_path is None:
        package = "helm.benchmark.scenarios"
        config_path = str(pkg_resources.files(package).joinpath("note_summary_scenario.yaml"))

You need to add *.yaml to the manifest, or this file will not actually get included in the package.

recursive-include src/helm/benchmark/ *.json
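
One way to confirm the fix is to check, against an installed build, that the data file is reachable through `importlib.resources`. The sketch below is only an illustration: `is_packaged` is a hypothetical helper, not part of HELM, and it is demonstrated on the standard-library `json` package so that it runs anywhere.

```python
from importlib import resources


def is_packaged(package: str, filename: str) -> bool:
    """Return True if `filename` ships as a file inside the installed `package`."""
    return resources.files(package).joinpath(filename).is_file()


# json/__init__.py exists in every CPython install; the YAML file does not live there.
present = is_packaged("json", "__init__.py")
absent = is_packaged("json", "note_summary_scenario.yaml")
print(present, absent)
```

Assuming the manifest gains a `*.yaml` rule mirroring the existing `*.json` one, `is_packaged("helm.benchmark.scenarios", "note_summary_scenario.yaml")` should return `True` when run against a freshly built and installed package rather than a source checkout.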


return instances

def read_file(self, file_path: str) -> List[str]:

Delete unused method.



@dataclass
class Rubric:

I did not look at the rubric logic too closely, but let me know if there's anything you want me to check.

@yifanmai

Ping - last activity was two weeks ago.
