UME HF Integration with ONNX #177


Open · wants to merge 15 commits into main

Conversation

@karinazad (Collaborator) commented on Aug 1, 2025

Description

Add UME to HuggingFace (HF) with ONNX export.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

def __init__(
    self,
    model_name: Literal[
        "ume-mini-base-12M", "ume-small-base-90M", "ume-medium-base-480M", "ume-large-base-740M"
karinazad (Collaborator, Author) commented:

avoid lobster imports and redefine this literal here

karinazad changed the title from "[Draft] UME HF Integration with ONNX" to "UME HF Integration with ONNX" on Aug 1, 2025

# Run inference
with torch.no_grad():
    output = model(input_ids.unsqueeze(1), attention_mask.unsqueeze(1))
karinazad (Collaborator, Author) commented:

This unsqueeze here is pretty ugly - I need to fix it in the tokenizer or the ONNX export directly.
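As a rough sketch of the fix being discussed, the extra dimension could be added once in a small helper around the tokenizer output instead of at every call site; the helper name and dict keys below are assumptions, not the actual tokenizer or export API:

import torch

def prepare_onnx_inputs(encoded: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Hypothetical helper: add the singleton dimension the exported model expects
    # in one place, so callers no longer need to unsqueeze themselves.
    return {
        "input_ids": encoded["input_ids"].unsqueeze(1),
        "attention_mask": encoded["attention_mask"].unsqueeze(1),
    }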

)

# Example amino acid sequences (same sequence duplicated to make a batch of two)
sequences = ["MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA"] * 2
Collaborator commented:

TODO (future PR): modify the example to show how to dynamically detect the modality, then tokenize and embed appropriately.
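A minimal sketch of what that dynamic detection could look like, assuming a simple character-set heuristic; the function name and modality labels are illustrative, not the library's API:

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
NUCLEOTIDES = set("ACGTU")

def detect_modality(sequence: str) -> str:
    # Guess the modality of a raw string from its alphabet; anything that is
    # neither pure nucleotide nor pure amino acid is treated as SMILES.
    chars = set(sequence.upper())
    if chars and chars <= NUCLEOTIDES:
        return "nucleotide"
    if chars and chars <= AMINO_ACIDS:
        return "amino_acid"
    return "smiles"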

# TODO: currently, these will work for internal users
# Support for external users will be added soon

UME_CHECKPOINT_DICT_S3_BUCKET = "prescient-lobster"
UME_CHECKPOINT_DICT_S3_KEY = "ume/checkpoints.json"
UME_CHECKPOINT_DICT_S3_URI = f"s3://{UME_CHECKPOINT_DICT_S3_BUCKET}/{UME_CHECKPOINT_DICT_S3_KEY}"

UME_MODEL_VERSIONS = [
Collaborator commented:

Use an Enum here, as Allen suggested?
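A minimal sketch of that suggestion, reusing the four model names from this PR; the class name is an assumption:

from enum import Enum

class UMEModelVersion(str, Enum):
    # Hypothetical Enum replacing the UME_MODEL_VERSIONS list.
    MINI = "ume-mini-base-12M"
    SMALL = "ume-small-base-90M"
    MEDIUM = "ume-medium-base-480M"
    LARGE = "ume-large-base-740M"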

@@ -0,0 +1,29 @@
# UME HuggingFace Integration
Collaborator commented:

nice, can you link to this in the developer docs README?


def __init__(
    self,
    model_name: Literal[
Collaborator commented:

can we import this so it only has to be defined in one place?
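One possible shape for that, sketched under the assumption of a shared constants module inside the HF integration package (the module path and alias name are assumptions):

# constants.py in the HF integration package (hypothetical path)
from typing import Literal

UMEModelName = Literal[
    "ume-mini-base-12M",
    "ume-small-base-90M",
    "ume-medium-base-480M",
    "ume-large-base-740M",
]

# then, in the modules that need it:
# from .constants import UMEModelName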

def export_ume_models_to_onnx():
    for model_version in UME_MODEL_VERSIONS:
        model = UME.from_pretrained(model_version)
        model.export_onnx(HF_UME_MODEL_DIRPATH / f"{model_version}.onnx", modality=Modality.SMILES)
Collaborator commented:

Will this only handle SMILES-compatible models?
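If the goal is to cover every modality, the loop could be extended along these lines; the Modality members other than SMILES and the per-modality file suffix are assumptions, not confirmed by this PR:

def export_ume_models_to_onnx_all_modalities():
    # Hypothetical variant: export one ONNX file per (model version, modality) pair
    # instead of SMILES only.
    for model_version in UME_MODEL_VERSIONS:
        model = UME.from_pretrained(model_version)
        for modality in (Modality.SMILES, Modality.AMINO_ACID, Modality.NUCLEOTIDE):
            model.export_onnx(
                HF_UME_MODEL_DIRPATH / f"{model_version}-{modality.value}.onnx",
                modality=modality,
            )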

| UME-medium | 480M | 24 | 1280 | 20 | High performance applications |
| UME-large | 740M | 24 | 1600 | 25 | Best performance |

All model sizes are optimized for GPU hardware efficiency following established best practices. Currently, all variants use the same model identifier. The default loaded model is UME-mini.
Collaborator commented:

should we change this default to UME-medium?

### Intra-Entity Modalities
Different representations of the **same biological entity**:
- Protein sequence → SMILES representation (chemical view of peptide)
- DNA sequence → Amino acid sequence (central dogma)
Collaborator commented:

We'd need transcription & translation (and the reverse) to claim the central dogma here; could say "towards central dogma" instead.

- **Proteins**: AMPLIFY (360.7M), PeptideAtlas (4.2M)
- **Small molecules**: ZINC (588.7M), M³-20M (20.8M)
- **Nucleotides**: CaLM (8.6M)
- **Structures**: PINDER (267K), PDBBind, ATOMICA, GEOM (1.17M)
Collaborator commented:

how many samples in PDBBind and ATOMICA?


### Capabilities & Limitations
**Q: Can UME generate sequences?**
- No - encoder-only model for representation learning
Collaborator commented:

Technically yes, via Gibbs sampling - primarily for infilling/inpainting and conditional generation.
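To illustrate the point, a toy sketch of Gibbs-style infilling with a masked encoder; `model`, `tokenizer`, and the `.logits` attribute are generic stand-ins here, not UME's actual interface:

import torch

def gibbs_infill(model, tokenizer, input_ids: torch.Tensor, masked_positions: list[int], n_iters: int = 10) -> torch.Tensor:
    # Repeatedly re-mask and resample the chosen positions from the masked-LM head.
    ids = input_ids.clone()
    for _ in range(n_iters):
        for pos in masked_positions:
            ids[0, pos] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(ids).logits  # (batch, seq_len, vocab_size)
            probs = torch.softmax(logits[0, pos], dim=-1)
            ids[0, pos] = int(torch.multinomial(probs, num_samples=1).item())
    return ids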

from transformers import PreTrainedTokenizer

# HuggingFace repository configuration
HF_UME_REPO_ID = "karina-zadorozhny/ume"
Collaborator commented:

import from constants?
