The Japanese version of this README is available here.
TSUMUGI (Trait-driven Surveillance for Mutation-based Gene module Identification) is a web tool that leverages knockout (KO) mouse phenotype data from the International Mouse Phenotyping Consortium (IMPC) to extract and visualize gene modules based on phenotypic similarity.
The tool is publicly available online for anyone to use:
https://larc-tsukuba.github.io/tsumugi/
The name TSUMUGI derives from the Japanese concept of "weaving together gene groups that form phenotypes."
TSUMUGI supports three types of input:
When you input a phenotype of interest, TSUMUGI searches for gene groups with similar overall phenotype profiles among genes whose KO mice exhibit that phenotype.
Phenotype names are based on the Mammalian Phenotype Ontology (MPO).
List of currently searchable phenotypes in TSUMUGI:
Phenotype List
When you specify a single gene, TSUMUGI searches for other gene groups whose KO mice have similar phenotype profiles to that gene's KO mice.
Gene names follow gene symbols registered in MGI.
List of currently searchable gene names in TSUMUGI:
Gene List
The Gene List input accepts multiple genes.
Enter one gene symbol per line.
Note
Gene List differs from single Gene input in that it extracts phenotypically similar genes among the genes within the list.
Caution
If no phenotypically similar genes are found, the alert "No similar phenotypes were found among the entered genes." is displayed and processing stops.
If more than 200 phenotypically similar genes are found, the alert "Too many genes submitted. Please limit the number to 200 or fewer." is displayed and processing stops to prevent browser overload.
You can download raw data of phenotypic similarity between gene pairs (in Gzip-compressed CSV format or Parquet format).
Contents include:
- Paired gene names (Gene1, Gene2)
- Phenotypic similarity between pairs (Jaccard Similarity)
- Number of shared phenotypes between pairs (Number of shared phenotype)
- List of shared phenotypes between pairs (List of shared phenotype)
Caution
File size is approximately 50-100MB. Download may take some time.
We recommend using Parquet format when working with Polars or Pandas.
You can load the data as follows:
# Install Polars and PyArrow using conda
conda create -y -n env-tsumugi polars pyarrow
conda activate env-tsumugi
# Load Parquet file using Polars
import polars as pl
df_tsumugi = pl.read_parquet("TSUMUGI_{version}_raw_data.parquet")
# Install Pandas and PyArrow using conda
conda create -y -n env-tsumugi pandas pyarrow
conda activate env-tsumugi
# Load Parquet file using Pandas
import pandas as pd
df_tsumugi = pd.read_parquet("TSUMUGI_{version}_raw_data.parquet")
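If neither Polars nor Pandas is available, the Gzip-compressed CSV can also be read with the Python standard library alone. The following is a minimal sketch, assuming the CSV has a header row. `read_pairs` is a hypothetical helper name and the file name in the usage comment is a placeholder. Because the file is large, loading every row into a list may use a lot of memory.

```python
import csv
import gzip


def read_pairs(path):
    """Read a Gzip-compressed CSV into a list of dicts keyed by the header row."""
    # "rt" opens the compressed file in text mode so csv can consume it directly.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Usage (placeholder file name; substitute the file you downloaded):
# rows = read_pairs("raw_data.csv.gz")
# print(rows[0])
```

All values are returned as strings, so numeric columns need to be converted (for example with `float`) before filtering.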
After you submit the input, the page switches to the network view and the network is drawn automatically.
Important
Only gene pairs with 2 or more shared abnormal phenotypes AND a phenotypic similarity of 0.2 or higher are visualized.
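The two criteria can be illustrated with a short Python sketch. The gene names and values are made up for illustration, and `is_visualized` is a hypothetical helper, not part of TSUMUGI itself.

```python
MIN_SHARED_PHENOTYPES = 2  # "2 or more shared abnormal phenotypes"
MIN_SIMILARITY = 0.2       # "phenotypic similarity of 0.2 or higher"


def is_visualized(num_shared, similarity):
    """Return True when a gene pair meets both visualization criteria."""
    return num_shared >= MIN_SHARED_PHENOTYPES and similarity >= MIN_SIMILARITY


# Hypothetical gene pairs: (gene1, gene2, shared phenotype count, Jaccard similarity)
pairs = [
    ("GeneA", "GeneB", 3, 0.50),  # passes both criteria
    ("GeneA", "GeneC", 1, 0.25),  # too few shared phenotypes
    ("GeneB", "GeneC", 4, 0.10),  # similarity too low
]

visible = [(g1, g2) for g1, g2, n, s in pairs if is_visualized(n, s)]
print(visible)  # [('GeneA', 'GeneB')]
```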
Each node represents one gene.
Clicking a node displays a list of abnormal phenotypes observed in that gene's KO mice.
You can freely adjust node positions by dragging.
Clicking an edge shows details of shared phenotypes.
The left control panel allows you to adjust network display.
The Phenotypes similarity slider sets the threshold for gene pairs displayed in the network, based on the phenotypic similarity (Jaccard coefficient) of each edge.
The minimum and maximum similarity values are mapped to a 1-10 scale, allowing filtering in 10 levels.
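The exact rescaling used by the slider is not described here. As a rough illustration only, the sketch below assumes a linear min-max mapping onto the range 1-10. `scale_to_1_10` is a hypothetical helper and the input values are made up.

```python
def scale_to_1_10(value, lowest, highest):
    """Linearly map value from [lowest, highest] onto [1, 10] (assumed mapping)."""
    if highest == lowest:
        # Degenerate range: every value maps to the bottom of the scale.
        return 1.0
    return 1.0 + 9.0 * (value - lowest) / (highest - lowest)


# Hypothetical network whose edge similarities range from 0.25 to 0.75
print(scale_to_1_10(0.25, 0.25, 0.75))  # 1.0
print(scale_to_1_10(0.5, 0.25, 0.75))   # 5.5
print(scale_to_1_10(0.75, 0.25, 0.75))  # 10.0
```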
Note
For details on phenotypic similarity, please see:
Calculation Method for Phenotypically Similar Gene Groups
The Phenotype severity slider adjusts which nodes are displayed based on phenotype severity (effect size) in KO mice.
Higher effect sizes indicate stronger phenotypic impact.
The effect size range is also scaled to 1-10, allowing filtering in 10 levels.
Note
The Phenotype severity slider is not available when the IMPC phenotype evaluation is binary (present/absent), e.g., abnormal embryo development (a list of binary phenotypes is available here), or when the input is a gene name.
You can specify the genotype of KO mice exhibiting phenotypes:
- Homo: Phenotypes seen in homozygous mice
- Hetero: Phenotypes seen in heterozygous mice
- Hemi: Phenotypes seen in hemizygous mice
You can extract sex-specific phenotypes:
- Female: Female-specific phenotypes
- Male: Male-specific phenotypes
You can specify life stages when phenotypes appear:
- Embryo: Phenotypes appearing during the embryonic stage
- Early: Phenotypes appearing at 0-16 weeks of age
- Interval: Phenotypes appearing at 17-48 weeks of age
- Late: Phenotypes appearing at 49+ weeks of age
You can highlight genes related to human diseases.
Associations between KO mice and human diseases are taken from public data in the IMPC Disease Models Portal.
You can search for gene names included in the network.
You can adjust the following elements:
- Network layout (layout)
- Font size (Font size)
- Edge thickness (Edge width)
- Distance between nodes (Node repulsion; Cose layout only)
You can export current network images and data in PNG, CSV and GraphML formats.
CSV includes connected component (module) IDs and lists of phenotypes shown by each gene's KO mice.
GraphML is a format compatible with the desktop version of Cytoscape, allowing you to import the network into Cytoscape for further analysis.
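To make the notion of a connected component (module) concrete, here is a minimal sketch that assigns module IDs to genes from a list of edges. How TSUMUGI itself numbers modules is not described here, so the ID scheme, the `connected_components` helper, and the gene names are assumptions for illustration.

```python
from collections import defaultdict


def connected_components(edges):
    """Map each gene to a module ID, where a module is a connected component."""
    neighbors = defaultdict(set)
    for gene1, gene2 in edges:
        neighbors[gene1].add(gene2)
        neighbors[gene2].add(gene1)

    module_of = {}
    next_id = 1
    for start in sorted(neighbors):
        if start in module_of:
            continue
        # Depth-first traversal from an unvisited gene labels its whole component.
        stack = [start]
        module_of[start] = next_id
        while stack:
            gene = stack.pop()
            for other in neighbors[gene]:
                if other not in module_of:
                    module_of[other] = next_id
                    stack.append(other)
        next_id += 1
    return module_of


# Hypothetical edges: GeneA-GeneB-GeneC form one module, GeneD-GeneE another.
edges = [("GeneA", "GeneB"), ("GeneB", "GeneC"), ("GeneD", "GeneE")]
modules = connected_components(edges)
print(modules)
```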
The IMPC dataset used is statistical-results-ALL.csv.gz from Release-23.0.
Information about columns included in the dataset: Data fields
Extract gene-phenotype pairs where the KO mouse phenotype P-value (p_value, female_ko_effect_p_value, or male_ko_effect_p_value) is 0.0001 or below.
- Genotype-specific phenotypes are annotated with homo, hetero, or hemi
- Sex-specific phenotypes are annotated with female or male
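The P-value filter can be sketched as follows. This assumes that a gene-phenotype pair is kept when any one of the three P-values is at or below the threshold, and that missing values are represented as `None`. The `gene` and `phenotype` keys and the records themselves are made up for illustration; only the three P-value column names come from the dataset description above.

```python
P_VALUE_THRESHOLD = 0.0001
P_VALUE_COLUMNS = ("p_value", "female_ko_effect_p_value", "male_ko_effect_p_value")


def is_significant(record):
    """Return True if any available P-value is at or below the threshold."""
    for column in P_VALUE_COLUMNS:
        value = record.get(column)
        if value is not None and value <= P_VALUE_THRESHOLD:
            return True
    return False


# Hypothetical records; real rows contain many more columns.
records = [
    {"gene": "GeneA", "phenotype": "abnormal heart morphology",
     "p_value": 0.00001, "female_ko_effect_p_value": None, "male_ko_effect_p_value": None},
    {"gene": "GeneB", "phenotype": "abnormal kidney morphology",
     "p_value": 0.01, "female_ko_effect_p_value": 0.00005, "male_ko_effect_p_value": 0.3},
    {"gene": "GeneC", "phenotype": "abnormal lung morphology",
     "p_value": 0.05, "female_ko_effect_p_value": None, "male_ko_effect_p_value": None},
]

significant = [r["gene"] for r in records if is_significant(r)]
print(significant)  # ['GeneA', 'GeneB']
```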
The Jaccard coefficient is used as the phenotypic similarity metric.
It expresses the proportion of shared phenotypes as a value between 0 and 1.
Jaccard(A, B) = |A ∩ B| / |A ∪ B|
For example, suppose gene A and gene B KO mice have the following abnormal phenotypes:
A: {abnormal embryo development, abnormal heart morphology, abnormal kidney morphology}
B: {abnormal embryo development, abnormal heart morphology, abnormal lung morphology}
In this case, there are 2 shared phenotypes and 4 total unique phenotypes, so the Jaccard coefficient is calculated as follows:
Jaccard(A, B) = 2 / 4 = 0.5
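The same calculation can be reproduced with Python sets. This is a minimal sketch of the formula, not TSUMUGI's actual implementation. The `jaccard` helper is hypothetical, and returning 0.0 for two empty sets is an assumption made here to avoid division by zero.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        # Assumption: two empty phenotype sets are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


gene_a = {"abnormal embryo development", "abnormal heart morphology", "abnormal kidney morphology"}
gene_b = {"abnormal embryo development", "abnormal heart morphology", "abnormal lung morphology"}

print(jaccard(gene_a, gene_b))  # 0.5
```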
For questions or requests, please feel free to contact us:
- Google Form: Contact Form
- For GitHub account holders: GitHub Issue