+ "description": "This Python script performs a comprehensive spatial transcriptomics analysis using the `omicverse` library along with `scanpy`, `pandas`, `sklearn`, and `matplotlib`. It integrates spatial data, gene expression, and intercellular communication analysis to identify key cell-cell interactions and biological flows. Here's a breakdown of the functionality and structure:\n\n**Overall Workflow:**\n\n1. **Data Loading and Preprocessing:**\n * Loads Visium spatial data.\n * Performs basic quality control (QC) filtering on genes.\n * Selects spatially variable genes (SVGs).\n2. **Intercellular Communication Analysis:**\n * Loads a ligand-receptor database (CellChat).\n * Filters the database based on genes present in the dataset.\n * Performs spatial communication analysis using COMMOT with the CellChat database.\n * Analyzes communication direction for a specified pathway (e.g., FGF).\n * Visualizes cell communication patterns.\n3. **Integration of Annotation Data:**\n * Loads a ground truth annotation file.\n * Adds annotation data to the AnnData object.\n * Visualizes spatial data with annotations.\n4. **Biological Flow Analysis:**\n * Constructs gene expression modules (GEMs) using non-negative matrix factorization (NMF).\n * Extracts top genes for a specific GEM.\n * Constructs cellular flows from communication data.\n * Determines informative variables in the flow data.\n * Performs KMeans clustering on spatial coordinates.\n * Learns intercellular flows, validates them, and filters low-confidence edges.\n * Constructs the intercellular flow network.\n5. 
**Visualization and Output:**\n * Visualizes the GEM expression by cell type.\n * Visualizes the intercellular flow network.\n * Saves intermediate and final results to files.\n\n**Line-by-Line Explanation:**\n\n* **Lines 1-3:** Imports the required libraries: `omicverse` (as `ov`) and `scanpy` (as `sc`).\n* **Line 5:** Sets the plotting style using `omicverse`.\n* **Line 7:** Loads Visium spatial data into an `AnnData` object named `adata` using `scanpy.read_visium`.\n* **Line 8:** Ensures variable names (gene names) are unique in the `adata` object using `adata.var_names_make_unique()`.\n* **Line 10:** Calculates QC metrics for each cell in place in the `adata` object using `scanpy.pp.calculate_qc_metrics`.\n* **Line 11:** Filters the genes (variables) in `adata`, retaining only those with total counts greater than 100.\n* **Line 12:** Performs spatial variable gene selection using `omicverse.space.svg`, saving the result to `adata`.\n* **Line 13:** Displays the `adata` object.\n* **Line 15:** Writes the `adata` object to a compressed `.h5ad` file.\n* **Lines 19-21:** Loads a human ligand-receptor database using `omicverse.externel.commot.pp.ligand_receptor_database` and prints its shape.\n* **Lines 23-26:** Filters the ligand-receptor database to keep only interactions involving genes present in `adata` and prints the shape of the filtered dataframe.\n* **Lines 28-35:** Performs spatial communication analysis using `omicverse.externel.commot.tl.spatial_communication`, storing the result in `adata`.\n* **Lines 36-37:** Imports the `pandas` library (as `pd`) and the `os` library.\n* **Line 38:** Loads a ground truth annotation file into a pandas `DataFrame`, setting the index and column names.\n* **Line 39:** Assigns a column name 'Ground_Truth' to the annotation DataFrame.\n* **Line 40:** Adds the ground truth annotations to the `adata.obs` DataFrame, matching on cell IDs.\n* **Line 41:** Defines a list of colors to be used for plotting.\n* **Line 43:** Generates 
a spatial plot of the data colored by ground truth annotations using `scanpy.pl.spatial`.\n* **Line 45:** Creates a dictionary mapping ground truth categories to their corresponding colors.\n* **Line 47:** Prints the head of the ligand-receptor information stored within the `adata` object.\n* **Line 49:** Imports the `matplotlib.pyplot` library as `plt`.\n* **Lines 50-52:** Sets parameters for the spatial communication analysis (scale, neighborhood size, target pathway).\n* **Line 53:** Performs communication direction analysis for the specified pathway.\n* **Lines 54-62:** Visualizes cell communication for the specified pathway.\n* **Line 63:** Sets the title of the communication visualization plot.\n* **Line 67:** Writes the updated `adata` object to a compressed `.h5ad` file.\n* **Line 69:** Reads the h5ad file back into the `adata` object.\n* **Line 70:** Displays the `adata` object.\n* **Line 72:** Creates a new layer named 'normalized' in the AnnData object by copying the data from `adata.X`.\n* **Lines 74-79:** Constructs gene expression modules using NMF.\n* **Line 80:** Sets the target gene expression module for further analysis.\n* **Lines 81-87:** Extracts the top genes from the selected GEM module using `omicverse.externel.flowsig.ul.get_top_gem_genes`.\n* **Line 88:** Displays the top genes for the selected GEM module.\n* **Line 90:** Defines a commot output key, which is the commot-cellchat output.\n* **Lines 91-98:** Constructs cellular flows from commot output.\n* **Lines 99-108:** Determines informative variables in the flow data.\n* **Line 109:** Imports the `KMeans` class from sklearn.\n* **Line 111:** Performs KMeans clustering on spatial coordinates.\n* **Line 112:** Adds the spatial KMeans clustering labels to the adata.obs.\n* **Lines 115-121:** Learns intercellular flows using `ov.externel.flowsig.tl.learn_intercellular_flows`.\n* **Lines 123-128:** Applies biological flow validation using `ov.externel.flowsig.tl.apply_biological_flow`.\n* 
**Line 129:** Sets the threshold to filter low-confidence edges in the network.\n* **Lines 131-136:** Filters low-confidence edges using `ov.externel.flowsig.tl.filter_low_confidence_edges`.\n* **Line 137:** Writes the `adata` object to a compressed h5ad file.\n* **Line 141:** Constructs the intercellular flow network from the adata object.\n* **Line 144:** Sets the flowsig expression key.\n* **Line 145:** Retrieves the expression data associated with the flow key.\n* **Line 146:** Creates a new AnnData object from the expression data.\n* **Line 147:** Assigns the observations from adata to adata\\_subset.\n* **Line 148:** Renames the variables using a GEM naming convention.\n* **Line 151:** Imports the matplotlib plotting library.\n* **Lines 152-153:** Creates a dotplot of GEM expression by ground truth, with specified parameters.\n* **Line 154:** Creates a dictionary mapping ground truth categories to colors for the dotplot.\n* **Line 156:** Plots the flowsig network.\n\n**Key Libraries:**\n\n* **`omicverse`:** A library for multi-omics data analysis, including spatial omics, with functionality for spatial gene selection, intercellular communication analysis, and biological flow analysis.\n* **`scanpy`:** A popular library for single-cell RNA-seq analysis, used here for loading, preprocessing, and visualizing spatial transcriptomics data.\n* **`pandas`:** Used for data manipulation, primarily loading the ground truth annotation file.\n* **`sklearn`:** Used for KMeans clustering.\n* **`matplotlib`:** Used for general purpose plotting.\n\n**In summary,** this script performs a detailed analysis of spatial transcriptomics data, combining gene expression, spatial information, intercellular communication, and biological flow to identify meaningful patterns and relationships within the tissue. It utilizes several libraries for this purpose, demonstrating a common approach to these kinds of analyses.",