TF-MInDi is a Python package for analyzing transcription factor binding patterns from deep learning model attribution scores. It identifies and clusters sequence motifs from contribution scores, maps them to DNA-binding domains, and provides comprehensive visualization tools for regulatory genomics analysis.
- Seqlet Extraction: Identifies important sequence regions from contribution scores using recursive seqlet calling
- Motif Similarity Analysis: Compares extracted seqlets to known motif databases using TomTom
- Clustering & Dimensionality Reduction: Groups similar seqlets using Leiden clustering and t-SNE visualization
- DNA-Binding Domain Annotation: Maps seqlet clusters to transcription factor families
- Pattern Generation: Creates consensus motifs from clustered seqlets with alignment
- Comprehensive Visualization: Region-level contribution plots, t-SNE embeddings, motif logos, and heatmaps
- Scalable Processing: Memory-efficient chunked processing for large datasets
import tfmindi as tm
# Extract seqlets from contribution scores
seqlets_df, seqlet_matrices = tm.pp.extract_seqlets(
    contrib=contrib_scores,  # (n_examples, 4, length)
    oh=one_hot_sequences,  # (n_examples, 4, length)
    threshold=0.05
)
# Calculate motif similarity
motif_collection = tm.load_motif_collection(
    tm.fetch_motif_collection()
)
similarity_matrix = tm.pp.calculate_motif_similarity(
    seqlet_matrices,
    motif_collection,
    chunk_size=10000
)
# Create AnnData object for analysis
adata = tm.pp.create_seqlet_adata(
    similarity_matrix,
    seqlets_df,
    seqlet_matrices=seqlet_matrices,
    oh_sequences=one_hot_sequences,
    contrib_scores=contrib_scores,
    motif_collection=motif_collection
)
# Cluster seqlets and annotate with DNA-binding domains
tm.tl.cluster_seqlets(adata, resolution=3.0)
# Generate consensus logos for each cluster
patterns = tm.tl.create_patterns(adata)
# Visualize results
tm.pl.tsne(adata, color_by="cluster_dbd")
tm.pl.region_contributions(adata, example_idx=0)
tm.pl.dbd_heatmap(adata)
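The chunk_size argument passed to tm.pp.calculate_motif_similarity above suggests that the comparison is carried out in batches rather than on all seqlets at once. The following is a minimal, library-independent sketch of that general idea, not TF-MInDi's actual implementation. The helper names iter_chunks and row_maxima_chunked are hypothetical and exist only for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def row_maxima_chunked(rows: Sequence[Sequence[float]], chunk_size: int) -> List[float]:
    """Hypothetical per-row reduction that only touches one chunk at a time."""
    results: List[float] = []
    for chunk in iter_chunks(rows, chunk_size):
        # Only the rows of the current chunk are processed in this step;
        # a real pipeline would run its expensive comparison here.
        results.extend(max(row) for row in chunk)
    return results


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.7]]
    print(row_maxima_chunked(scores, chunk_size=2))
```

A smaller chunk size lowers the amount of intermediate data held at any one time, at the cost of more iterations. The value that works best depends on the available memory and the size of the motif collection.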
You need to have Python 3.10 or newer installed on your system.
pip install tfmindi
TF-MInDi follows a scanpy-inspired workflow:
- Preprocessing (tm.pp): Extract seqlets, calculate motif similarities, and create an AnnData object
- Tools (tm.tl): Cluster seqlets and create consensus patterns
- Plotting (tm.pl): Visualize results
- Contribution scores: Attribution values from deep learning models (e.g., DeepSHAP, Integrated Gradients)
- One-hot sequences: Corresponding genomic sequences in one-hot encoding
- Motif database: Known transcription factor motifs
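As an illustration of the one-hot input, the sketch below encodes equal-length DNA strings into the (n_examples, 4, length) layout used for contrib and oh in the example above. It uses plain nested lists so that it runs without dependencies; in practice the result would typically be converted to a NumPy array. The A, C, G, T channel order and the all-zero encoding of ambiguous bases such as N are assumptions, not something this README specifies, so check them against the model that produced your contribution scores. The function name one_hot_encode is hypothetical.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order; verify against your model


def one_hot_encode(sequences: List[str]) -> List[List[List[float]]]:
    """Encode equal-length DNA strings as nested lists shaped (n_examples, 4, length)."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    encoded = []
    for seq in sequences:
        # One row per nucleotide channel, one column per position.
        channels = [[0.0] * length for _ in ALPHABET]
        for pos, base in enumerate(seq.upper()):
            channel = ALPHABET.find(base)
            if channel >= 0:  # bases outside ACGT (e.g. N) stay all-zero
                channels[channel][pos] = 1.0
        encoded.append(channels)
    return encoded


if __name__ == "__main__":
    oh = one_hot_encode(["ACGT", "NNAC"])
    print(len(oh), len(oh[0]), len(oh[0][0]))
```

The contribution scores are expected to share this shape, so that every position and nucleotide channel in the one-hot array has a matching attribution value.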
For detailed tutorials and examples, please refer to the documentation, in particular the API documentation.
See the changelog.
If you found a bug, please use the issue tracker.
t.b.a