
@likhit2804

Description
The current QM9 processing loads the dataset with sanitize=False. This lets "ghost bonds" (geometric artifacts with bond lengths > 1.87 Å) into the molecular graphs and corrupts the SMILES strings generated from them.
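The following minimal sketch in plain Python (no RDKit) makes the failure mode concrete. It flags bonds whose 3D length exceeds the 1.87 Å figure quoted above. The function name find_ghost_bonds, the list-of-tuples input format, and the toy coordinates are hypothetical illustrations, not the actual QM9 processing code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_ghost_bonds(
    positions: Sequence[Coord],
    bonds: Sequence[Tuple[int, int]],
    threshold: float = 1.87,
) -> List[Tuple[int, int, float]]:
    """Return (i, j, length) for every bond longer than `threshold` angstroms.

    Hypothetical helper: `positions` holds one (x, y, z) tuple per atom and
    `bonds` holds atom-index pairs, e.g. as read from an SDF with sanitize=False.
    """
    ghosts = []
    for i, j in bonds:
        # Euclidean distance between the two bonded atoms, in angstroms.
        length = math.dist(positions[i], positions[j])
        if length > threshold:
            ghosts.append((i, j, round(length, 3)))
    return ghosts


# Toy molecule: atoms 0-1 form a normal C-C bond (1.54 angstroms),
# while the 0-2 "bond" is a 2.10 angstrom artifact.
positions = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0), (0.0, 2.10, 0.0)]
bonds = [(0, 1), (0, 2)]
print(find_ghost_bonds(positions, bonds))  # [(0, 2, 2.1)]
```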

The Fix

  1. Added a geometric bond filter that skips bonds longer than 1.85 Å.
  2. Prioritized reading GDB-SMILES from the molecule properties over regenerating them from the bond topology.
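The two steps above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the code in this PR: the helper names filter_bonds and pick_smiles, the property key "gdb_smiles", and the topology_smiles fallback callable are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
MAX_BOND_LENGTH = 1.85  # angstroms; threshold taken from the PR description


def filter_bonds(
    positions: Sequence[Coord],
    bonds: Sequence[Tuple[int, int]],
    max_len: float = MAX_BOND_LENGTH,
) -> List[Tuple[int, int]]:
    """Step 1: keep only bonds whose geometric length is at most `max_len`."""
    return [
        (i, j)
        for i, j in bonds
        if math.dist(positions[i], positions[j]) <= max_len
    ]


def pick_smiles(
    props: Dict[str, str],
    topology_smiles: Callable[[], str],
    key: str = "gdb_smiles",  # hypothetical property name
) -> str:
    """Step 2: prefer the stored GDB SMILES; regenerate from topology only if missing."""
    stored: Optional[str] = props.get(key)
    if stored:
        return stored
    # Fallback is a callable so topology-based generation only runs when needed.
    return topology_smiles()


# Same toy molecule as before: the 2.10 angstrom (0, 2) bond is dropped.
positions = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0), (0.0, 2.10, 0.0)]
bonds = [(0, 1), (0, 2)]
print(filter_bonds(positions, bonds))  # [(0, 1)]
print(pick_smiles({"gdb_smiles": "CC"}, lambda: "C.CC"))  # CC
```

Passing the fallback as a callable keeps the stored SMILES authoritative: the topology-derived string, which is the one exposed to ghost-bond corruption, is only computed when no stored value exists.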

Fixes #10560

Added a geometric filter that skips bonds exceeding a distance threshold, and prioritized property-based SMILES retrieval.

Linked issue: Incorrect data in QM9 dataset
