Add test coverage for UME #169

Open
wants to merge 27 commits into
base: main

etherealsunshine
Contributor

@etherealsunshine etherealsunshine commented Jul 30, 2025

Description

Improves test coverage for the UME model class. Addresses #80.

What's added

  • test_malformed_smiles_behavior - Documents how UME handles invalid SMILES strings
  • test_malformed_protein_behavior - Tests UME's robustness with invalid protein sequences

Features

  • Edge case documentation: Tests 11 types of malformed SMILES (unclosed parentheses, invalid characters, empty strings, etc.)
  • Protein sequence validation: Tests 10 types of invalid protein inputs (invalid amino acids, special characters, mixed case, etc.)
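
To make the input categories above concrete, here is a minimal sketch of how a few of them can be recognized in plain Python. The helper names (`has_balanced_parentheses`, `invalid_residues`) and the sample strings are hypothetical illustrations. They are not taken from the tests in this PR and do not describe how UME or its tokenizer handles these inputs.

```python
# Illustrative only: these helpers are hypothetical and are not part of UME
# or of the tests added in this PR.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def has_balanced_parentheses(smiles: str) -> bool:
    """Return True if every '(' in the string has a matching ')'."""
    depth = 0
    for char in smiles:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


def invalid_residues(sequence: str) -> list[str]:
    """Return characters that are not uppercase standard amino-acid letters."""
    return [char for char in sequence if char not in STANDARD_AMINO_ACIDS]


# Sample inputs in the spirit of the categories listed above
# (assumed for illustration, not copied from the test file).
malformed_smiles = ["CC(C", "CC)C(", ""]
malformed_proteins = ["MKT@LV", "mktlv", "MKTXZ"]

for smiles in malformed_smiles:
    print(repr(smiles), "balanced:", has_balanced_parentheses(smiles))

for sequence in malformed_proteins:
    print(repr(sequence), "invalid:", invalid_residues(sequence))
```

Note that the empty string passes the parentheses check, so an empty-input case needs its own explicit assertion. A parentheses count is also only one of many ways a SMILES string can be malformed.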

Usage

# Run the new robustness tests
pytest tests/lobster/model/test__ume.py::TestUME::test_malformed_smiles_behavior -v
pytest tests/lobster/model/test__ume.py::TestUME::test_malformed_protein_behavior -v

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

Testing

  • Tests pass locally
  • Added new tests for new functionality
  • Updated existing tests if needed

Checklist

  • Code follows style guidelines
  • Self-review completed
  • Documentation updated if needed
  • No breaking changes (or clearly documented)

@etherealsunshine
Contributor Author

Hi @ncfrey, I've added some tests addressing #80, taking into account that some of the issues were already addressed in existing PRs. I also added RDKit to the list of dependencies, since it appears to have been left out. Let me know if this looks good!

@karinazad
Collaborator

karinazad commented Jul 30, 2025

I'm actually working on making Nathan's detect_modality the default option in the UME tokenizer when no modality is specified, so these malformed inputs will be a good stress test for that as well.

@etherealsunshine
Contributor Author

Let me know if these changes look good!

@etherealsunshine etherealsunshine changed the title from "Add test coverage for UME," to "Add test coverage for UME" Aug 4, 2025
@etherealsunshine etherealsunshine requested a review from ncfrey August 4, 2025 17:37
3 participants