Skip to content

added support for tabular as well as new rdkit functions #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 25, 2025

Conversation

acquayefrank
Copy link
Contributor

Describe your changes

Added functionality for converting from inchi to formula and from smiles to formula. This is based on this issue #161. I also added support to treat tabular data as TSV files based on this issue #160

Checklist before requesting a review

  • I have performed a self-review of my code

Note:

I bumped the version to 0.5.0 instead of 0.4.2 because recent changes, like adding inchi_to_formula and smiles_to_formula, introduced new features rather than bug fixes.

Per [Semantic Versioning](https://semver.org/):

  • Patch version (e.g., 0.4.2) is for bug fixes and backwards-compatible changes.
  • Minor version (e.g., 0.5.0) is for adding new, backwards-compatible features.

Since this PR includes a new feature, increasing the minor version is appropriate.

@acquayefrank acquayefrank requested a review from Copilot June 9, 2025 12:49
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces functionality for converting from InChI and SMILES to molecular formulas using RDKit and adds support for handling tabular files (TSV) in addition to CSV and XLSX.

  • Added new conversion methods (smiles_to_formula and inchi_to_formula) in the RDKit converter.
  • Updated tests to include the new conversion methods.
  • Modified file format handling in DataFrame, Spectra, and app modules, and updated dependencies/version information.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/test_rdkit.py Added tests for the new smiles_to_formula and inchi_to_formula methods.
pyproject.toml Updated version and dependencies, and switched to Ruff linting.
MSMetaEnhancer/libs/utils/Errors.py Renamed UnknownSpectraFormat to UnknownFileFormat.
MSMetaEnhancer/libs/data/Spectra.py Updated exception usage to UnknownFileFormat.
MSMetaEnhancer/libs/data/DataFrame.py Added support for 'tabular' file format alongside tsv.
MSMetaEnhancer/libs/converters/compute/RDKit.py Added new RDKit functions for formula conversion from SMILES/InChI.
MSMetaEnhancer/app.py Updated data loading file format list and exception usage.
CHANGELOG.md Documented new features and dependency updates.
.pre-commit-config.yaml Added new pre-commit hooks and configured Python 3.12 environment.
Comments suppressed due to low confidence (3)

MSMetaEnhancer/libs/converters/compute/RDKit.py:80

  • Consider adding unit tests for invalid SMILES input to verify that the conversion correctly returns an empty formula when mol is None.
mol = MolFromSmiles(smiles)

MSMetaEnhancer/libs/converters/compute/RDKit.py:95

  • Consider adding unit tests for invalid InChI input to ensure consistent behavior when the molecule conversion fails, returning an empty formula.
mol = MolFromInchi(inchi)

MSMetaEnhancer/libs/data/DataFrame.py:15

  • Update the docstring to explicitly mention that 'tabular' is accepted as an alias for 'tsv' to ensure clarity for future maintainers.
Supported formats: csv, tsv/tabular, xlsx

Copy link
Member

@hechth hechth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few comments :)

matchms = ">=0.28.2"
matchms = ">=0.30.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not work with older versions of matchms? I'd be cautious of the matchms 0.30 pin as this means forcing numpy 2 - or does it not work with 0.28.2 anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as far as tests are concerned, they work with 0.30.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they also work on 0.28.2?

@hechth hechth merged commit 5eac8c9 into RECETOX:main Jun 25, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add RDKit conversion functions Add tabular as alias for tsv to supported datatypes for loading and writing.
2 participants