Modern biology constantly mixes identifiers from different years, databases, and genome builds. The result is a familiar set of problems: IDs disappear, symbols change, references disagree, and “the same gene” isn’t always represented the same way across datasets.
IDTrack is built for that reality. It provides a time-aware, audit-friendly way to translate and harmonize biological identifiers across Ensembl releases and across external namespaces (HGNC, UniProt, RefSeq, Entrez, …), while keeping ambiguity explicit instead of silently forcing a single answer.
Key features:

- Time-aware mapping: treat Ensembl releases as a “time axis” and travel forward/backward through identifier history.
- Assembly-aware mapping: harmonize identifiers across genome builds (e.g. GRCh37 ↔ GRCh38) and respect external databases that are assembly-scoped.
- Snapshot boundary for reproducibility: build a release-bounded graph snapshot so results are stable and repeatable.
- Explicit external database opt-in: choose which external namespaces participate via a small, editable YAML contract.
- Transparency over coercion: every conversion is reported as 1→0 (no match), 1→1 (clean), or 1→n (ambiguous) rather than being forced to a single answer.
- Scale-ready workflows: caching and snapshot reuse make repeated conversions and multi-dataset harmonization practical.

Who it is for:

- Wet-lab researchers who need a reliable, step-by-step path from “my gene list is old” to “my analysis is reproducible”.
- Bioinformaticians who want release-pinned, auditable conversions in notebooks, pipelines, and integration workflows.
- Atlas builders / integrators who need to harmonize gene identifiers across many cohorts (different Ensembl releases, symbols, and external IDs), keep an explicit audit trail of which identifiers mapped, failed, or were ambiguous, and ship a release-pinned, reproducible feature space for downstream integration and publication.

Typical use cases:

- Dataset harmonization before integration (single-cell, bulk, atlas-scale collections).
- Legacy data rescue (old Ensembl releases, mixed symbols/IDs, retired identifiers).
- Publication-grade reproducibility (pin a snapshot boundary + share the exact external configuration).
- Cross-database interoperability when collaborators use different identifier conventions.
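
To make the 1→0 / 1→1 / 1→n classification from the feature list concrete, here is a minimal, library-independent sketch in plain Python. It does not use the IDTrack API: `TOY_MAPPING`, `classify`, and `harmonize` are hypothetical names, and the identifiers are invented toy data. It only illustrates the idea of keeping every outcome, including failures and ambiguous hits, in an audit trail instead of silently picking one answer.

```python
from collections import Counter

# Hypothetical lookup table: source ID -> candidate target IDs.
# Toy data for illustration only; not produced by IDTrack.
TOY_MAPPING = {
    "OLD_GENE_A": ["NEW_GENE_A"],
    "OLD_GENE_B": ["NEW_GENE_B1", "NEW_GENE_B2"],
    "OLD_GENE_C": [],
}


def classify(source_id, mapping):
    """Return an audit record for one identifier without forcing a single answer."""
    targets = list(mapping.get(source_id, []))
    if not targets:
        label = "no_match"   # 1-to-0: nothing to map to
    elif len(targets) == 1:
        label = "clean"      # 1-to-1: exactly one target
    else:
        label = "ambiguous"  # 1-to-n: keep all candidates visible
    return {"source": source_id, "targets": targets, "class": label}


def harmonize(ids, mapping):
    """Classify every identifier and keep the full audit trail."""
    return [classify(source_id, mapping) for source_id in ids]


# "OLD_GENE_D" is absent from the table, so it is also reported as no_match.
report = harmonize(["OLD_GENE_A", "OLD_GENE_B", "OLD_GENE_C", "OLD_GENE_D"], TOY_MAPPING)
summary = Counter(record["class"] for record in report)

for record in report:
    print(record["source"], record["class"], record["targets"])
print(dict(summary))
```

A report of this shape can be saved next to the results, so that which identifiers were dropped or left ambiguous stays visible to collaborators and reviewers.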
The documentation includes a full tutorial suite designed to be the primary learning resource:
- Documentation: Documentation
- Tutorials: start from the “Tutorials” section in the docs (Part 0 → Part 7).
