@sronaghi sronaghi commented Oct 22, 2025

This PR provides code for testing a new inference-time approach that combines general-domain and clinical-domain LMs on some private MedHELM tasks.

I want to test my method on CLEAR, PatientInstruct, and NoteExtract.

Running the models requires downloading the following models locally and updating the model paths at the top of the proxy_tuning_client.py file. I can also provide a script to download them onto Carina. Here are the models and where to download them:

Below are the model configurations and the number of A100 40GB GPUs each one uses:

  • llama-70b-chat_none_none_1.0_logits_20 - 2 GPUs.
  • mellama-70b-chat_none_none_1.0_logprobs_20 - 2 GPUs.
  • mellama-13b-chat_none_none_1.0_logprobs_20 - 1 GPU.
  • qwen3-30b_none_none_1.0_logprobs_20 - 1 GPU.
  • qwen3-30b_mellama-13b-chat_llama-13b-base_1.0_logprobs_20 - 1 GPU.
  • llama-70b-chat_mellama-13b-chat_llama-13b-base_1.0_logits_20 - 3 GPUs.
  • qwen3-30b_mellama-13b-chat_none_1.0_logprobs_20 - 1 GPU.
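If the configuration names encode general-model_expert_anti-expert_weight_output-mode_top-k (my reading of the naming pattern, not confirmed by the PR), the combination step is presumably proxy tuning: shifting the general model's next-token logits by the difference between a small tuned expert and its untuned base. A minimal sketch of that arithmetic, assuming same-vocabulary logit vectors; this is not the PR's actual implementation:

```python
import numpy as np

def proxy_tuned_logits(general, expert, anti_expert, weight=1.0):
    """Combine next-token logits as general + weight * (expert - anti_expert).

    All three arguments are same-shaped logit vectors over a shared vocabulary.
    `weight` is assumed to correspond to the 1.0 field in the configuration
    names above; that mapping is a guess.
    """
    general = np.asarray(general, dtype=float)
    expert = np.asarray(expert, dtype=float)
    anti_expert = np.asarray(anti_expert, dtype=float)
    return general + weight * (expert - anti_expert)

# When the expert equals its base, the shift cancels and the general
# model's logits pass through unchanged.
g = np.array([1.0, 2.0, 3.0])
assert np.allclose(proxy_tuned_logits(g, g, g), g)
```

The top-k field (20) would then limit this combination to the general model's 20 highest-scoring tokens, which is consistent with the logits/logprobs modes in the names, but again inferred rather than confirmed.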

I have added each model configuration to the model_metadata.yaml, model_deployments.yaml, and tokenizer_config.yaml files in both prod_env and src/helm/config. run_entries_medhelm_private_proxy_tuning.conf contains the model run entries for each task. I can also split this into separate conf files by the number of GPUs needed.
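To help reviewers picture the change, here is a hedged illustration of what one model_deployments.yaml entry might look like. The field names follow HELM's deployment schema, but the client class path, tokenizer name, and sequence length below are assumptions, not values copied from the PR:

```yaml
# Hypothetical entry -- adjust the client class and limits to the PR's actual code.
- name: local/qwen3-30b_mellama-13b-chat_llama-13b-base_1.0_logprobs_20
  model_name: local/qwen3-30b_mellama-13b-chat_llama-13b-base_1.0_logprobs_20
  tokenizer_name: qwen/qwen3-30b  # assumed tokenizer
  max_sequence_length: 32768      # assumed limit
  client_spec:
    class_name: "helm.clients.proxy_tuning_client.ProxyTuningClient"  # assumed class path
```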

Each model takes ~7-22 hours per task. I run the models with the -n 1 flag because my code doesn't support multi-threading.
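Concretely, a run would look something like this (the suite name is a placeholder; --conf-paths and -n are standard helm-run flags):

```shell
# -n 1 keeps helm-run single-threaded, since the proxy-tuning client
# does not support multi-threading; budget ~7-22 hours per model/task.
helm-run \
  --conf-paths run_entries_medhelm_private_proxy_tuning.conf \
  --suite proxy-tuning-test \
  -n 1
```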

I ended up using basic_summarization_metrics because I couldn't configure the metrics I needed in my helm_env while keeping compatibility with my code. If there are conda environment issues, I can share my env file and the modified run_specs.

@@ -0,0 +1,192 @@
# MedHELM RunSpecs for the private benchmarks from Stanford.
@yifanmai what are your thoughts on adding this file?

@sronaghi sronaghi left a comment


I've made edits based on @MiguelAFH's comments.

@MiguelAFH MiguelAFH requested a review from yifanmai October 23, 2025 20:07

@yifanmai yifanmai left a comment


In general:

  • The files need more documentation, which can be placed in a module-level docstring in proxy_tuning_client.py, in the comments in model_metadata.yaml and model_deployments.yaml, and in a comment at the top of run_entries_medhelm_private_proxy_tuning.conf.
  • If this is experimental code, rather than intended for general use, your documentation should clearly say so.
  • Please run the linter:
pip install black==24.3.0 mypy==1.16.0 flake8==5.0.4
./pre-commit.sh

I did not look at your model code too closely; let me know if there are any specific things you would like me to look at.

@sronaghi

@yifanmai @MiguelAFH @aunell @suhana13 @HennyJie I ran the formatting check and added documentation. Please let me know what else to do for this PR!

@MiguelAFH MiguelAFH self-requested a review October 31, 2025 17:45