Commit b349f3d

added agent
1 parent eda698d commit b349f3d

108 files changed: 4821 additions and 1 deletion.

OV_Agent

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
{
"description": "This Python script performs a spatial transcriptomics analysis using the `omicverse` and `scanpy` libraries. It takes Visium spatial data, performs dimensionality reduction and clustering using several methods, and evaluates the results against known ground truth annotations. Here's a breakdown of its functionality and structure:\n\n**1. Setup and Data Loading:**\n\n* **Imports:** Imports necessary libraries: `omicverse` for spatial analysis, `scanpy` for handling spatial data, `pandas` for data manipulation, `os` for file path handling, `matplotlib` for plotting and `sklearn` for clustering metrics.\n* **Set Plotting Parameters:** Uses `ov.plot_set()` to configure plotting defaults from `omicverse`.\n* **Read Visium Data:** Reads spatial transcriptomics data from an H5 file using `sc.read_visium()`.\n* **Make Variable Names Unique:** Ensures gene names are unique within the AnnData object.\n* **Quality Control:** Calculates QC metrics (e.g., number of reads per cell) using `sc.pp.calculate_qc_metrics()`.\n* **Gene Filtering:** Filters out genes with low total counts (<= 100).\n* **Spatial Variable Gene Selection:** Selects spatially variable genes using `ov.space.svg()`, which is a key first step to reduce noise and focus on the important spatial variation of the data.\n* **Save and Reload AnnData:** Writes the processed AnnData object to an H5AD file and then reloads it. This step might be for saving progress or for clean data reload at next run of the script.\n* **Load Ground Truth Annotations:** Reads ground truth labels from a tab-separated file using `pd.read_csv()`. The ground truth labels are assumed to be available for each spot in the analysis.\n* **Add Ground Truth to AnnData:** Adds the ground truth labels as an observation (`.obs`) column in the `AnnData` object, matching labels with the spot names.\n* **Spatial Plot of Ground Truth:** Displays the spatial layout of the spots colored by the ground truth annotation.\n\n**2. 
GraphST Clustering:**\n\n* **Parameter Setup:** Creates a dictionary `methods_kwargs` to store parameters for different methods and sets up parameters for the `GraphST` method, which is one of the spatial dimensionality reduction methods implemented in `omicverse`.\n* **GraphST Dimensionality Reduction and Clustering:** Performs dimensionality reduction using GraphST via `ov.space.clusters()`.\n* **Mclust Clustering on GraphST Representation:** Performs Gaussian mixture model clustering (`mclust`) on the reduced representation from GraphST.\n* **Label Refinement:** Refines cluster labels by smoothing them based on spatial proximity using `ov.utils.refine_label()`.\n* **Categorical Conversion:** Converts refined cluster labels to a categorical data type.\n* **Cluster Merging:** Merges clusters based on a tree built from the similarities between the initial clusters. `ov.space.merge_cluster()` implements this step, which helps refine and interpret the clustering results.\n* **Spatial Plot of Clusters:** Displays spatial layouts of mclust clusters and merged mclust clusters.\n* **Mclust_R Clustering on GraphST Representation:** The same clustering pipeline as for `mclust` (above) is repeated here with `mclust_R`, which calls the `mclust` R package for clustering.\n\n**3. BINARY Clustering:**\n\n* **Parameter Setup:** Updates `methods_kwargs` with parameters for the `BINARY` method. 
BINARY is another spatial dimensionality reduction method available in `omicverse`.\n* **BINARY Dimensionality Reduction and Clustering:** Performs dimensionality reduction using BINARY `ov.space.clusters()`.\n* **Mclust_R Clustering on BINARY Representation:** Performs mclust using the `mclust` R package on the representation from the BINARY method.\n* **Label Refinement & Categorical Conversion:** Refines and converts mclust labels.\n* **Cluster Merging:** Merges clusters.\n* **Spatial Plot of Clusters:** Displays spatial plots of the clusters.\n* **Mclust Clustering on BINARY Representation:** Performs mclust using the Python implementation on the representation from the BINARY method.\n* **Label Refinement & Categorical Conversion:** Refines and converts mclust labels.\n* **Cluster Merging:** Merges clusters.\n* **Spatial Plot of Clusters:** Displays spatial plots of the clusters.\n\n**4. STAGATE Clustering:**\n\n* **Parameter Setup:** Updates `methods_kwargs` with parameters for the `STAGATE` method. `STAGATE` is another spatial dimensionality reduction method.\n* **STAGATE Dimensionality Reduction and Clustering:** Performs dimensionality reduction using STAGATE `ov.space.clusters()`.\n* **Mclust_R Clustering on STAGATE Representation:** Performs mclust clustering using the R package on the STAGATE representation.\n* **Label Refinement & Categorical Conversion:** Refines and converts mclust labels.\n* **Cluster Merging:** Merges clusters.\n* **Spatial Plot of Clusters:** Displays spatial plots of the clusters.\n* **Gene Visualization:** Visualizes the expression of the gene with highest PI value by `omicverse` and also the user specified `MBP` gene in raw and STAGATE transformed spaces.\n\n**5. CAST Clustering:**\n\n* **Parameter Setup:** Updates `methods_kwargs` with parameters for the `CAST` method. 
`CAST` is another spatial dimensionality reduction method.\n* **CAST Dimensionality Reduction and Clustering:** Performs dimensionality reduction using CAST via `ov.space.clusters()`.\n* **Mclust Clustering on CAST Representation:** Performs mclust using the Python implementation on the CAST representation.\n* **Label Refinement & Categorical Conversion:** Refines and converts mclust labels.\n* **Cluster Merging:** Merges clusters.\n* **Spatial Plot of Clusters:** Displays spatial plots of the clusters.\n\n**6. Evaluation:**\n\n* **Calculate Adjusted Rand Index (ARI):** Calculates the Adjusted Rand Index (ARI) to compare each clustering method's result to the ground truth annotation. This metric evaluates the consistency of cluster assignments relative to the ground truth, taking into account the number of clusters and the number of samples in each cluster.\n* **Print ARI Results:** Prints the ARI results for each clustering method.\n\n**Key functionalities:**\n\n* **Spatial Analysis:** Utilizes `omicverse` to perform spatial-aware dimensionality reduction and clustering.\n* **Dimensionality Reduction:** Leverages methods like GraphST, BINARY, STAGATE and CAST to reduce high-dimensional gene expression data into a lower-dimensional representation while retaining important spatial information.\n* **Clustering:** Employs Gaussian mixture models via `mclust` and `mclust_R` for clustering.\n* **Cluster Refinement:** Smooths cluster assignments based on spatial proximity.\n* **Cluster Merging:** Refines cluster granularity based on hierarchical relationships.\n* **Visualization:** Uses `scanpy` for spatial plotting and `matplotlib` for combining plot figures.\n* **Evaluation:** Computes the Adjusted Rand Index to evaluate the quality of clustering results with respect to known ground truth annotations.\n\n**In Summary:**\n\nThis script provides a comprehensive workflow for spatial transcriptomics analysis. 
It performs data loading, quality control, dimensionality reduction, and multiple clustering approaches using both Python and R packages, and finally compares the clustering results with the ground truth labels using the ARI metric. It uses `omicverse` for spatial analysis methods and `scanpy` for spatial data handling and visualization. The script is well-organized and modular, employing loops for repeated tasks and using dictionaries to manage method parameters.",
"file": "t_cluster_space_annotated.py"
}
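The evaluation step in the description above scores each clustering against the ground truth with the Adjusted Rand Index. As a rough illustration of what that metric computes, here is a minimal stdlib-only sketch that works from pair counts. It is not the annotated script's code (the description says the script imports `sklearn` for clustering metrics); the function name `adjusted_rand_index` and the toy labels in the usage note are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pair counts (illustrative sketch, stdlib only)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts: how many spots share each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of spot pairs that fall together in each cell, row, and column.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (for example one cluster on both sides):
        # treated here as perfect agreement.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

For instance, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to `1.0`: the score depends only on which spots are grouped together, not on the label names, which is why it suits comparing cluster IDs with annotated layers.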
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
{
"description": "This Python script performs a comprehensive spatial transcriptomics analysis using the `omicverse` library along with `scanpy`, `pandas`, `sklearn`, and `matplotlib`. It integrates spatial data, gene expression, and intercellular communication analysis to identify key cell-cell interactions and biological flows. Here's a breakdown of the functionality and structure:\n\n**Overall Workflow:**\n\n1. **Data Loading and Preprocessing:**\n * Loads Visium spatial data.\n * Performs basic quality control (QC) filtering on genes.\n * Selects spatially variable genes (SVGs).\n2. **Intercellular Communication Analysis:**\n * Loads a ligand-receptor database (CellChat).\n * Filters the database based on genes present in the dataset.\n * Performs spatial communication analysis using CellChat.\n * Analyzes communication direction for a specified pathway (e.g., FGF).\n * Visualizes cell communication patterns.\n3. **Integration of Annotation Data:**\n * Loads a ground truth annotation file.\n * Adds annotation data to the AnnData object.\n * Visualizes spatial data with annotations.\n4. **Biological Flow Analysis:**\n * Constructs gene expression modules (GEMs) using non-negative matrix factorization (NMF).\n * Extracts top genes for a specific GEM.\n * Constructs cellular flows from communication data.\n * Determines informative variables in the flow data.\n * Performs KMeans clustering on spatial coordinates.\n * Learns intercellular flows, validates them, and filters low-confidence edges.\n * Constructs the intercellular flow network.\n5. 
**Visualization and Output:**\n * Visualizes the GEM expression by cell type.\n * Visualizes the intercellular flow network.\n * Saves intermediate and final results to files.\n\n**Line-by-Line Explanation:**\n\n* **Lines 1-3:** Imports the required libraries: `omicverse` (as `ov`) and `scanpy` (as `sc`).\n* **Line 5:** Sets the plotting style using `omicverse`.\n* **Line 7:** Loads Visium spatial data into an `AnnData` object named `adata` using `scanpy.read_visium`.\n* **Line 8:** Ensures variable names (gene names) are unique in the `adata` object using `adata.var_names_make_unique()`.\n* **Line 10:** Calculates QC metrics for each cell in place in the `adata` object using `scanpy.pp.calculate_qc_metrics`.\n* **Line 11:** Filters the genes (variables) in `adata`, retaining only those with total counts greater than 100.\n* **Line 12:** Performs spatial variable gene selection using `omicverse.space.svg`, saving the result to `adata`.\n* **Line 13:** Displays the `adata` object.\n* **Line 15:** Writes the `adata` object to a compressed `.h5ad` file.\n* **Lines 19-21:** Loads a human ligand-receptor database using `omicverse.externel.commot.pp.ligand_receptor_database` and prints its shape.\n* **Lines 23-26:** Filters the ligand-receptor database to keep only interactions involving genes present in `adata` and prints the shape of the filtered dataframe.\n* **Lines 28-35:** Performs spatial communication analysis using `omicverse.externel.commot.tl.spatial_communication`, storing the result in `adata`.\n* **Lines 36-37:** Imports the `pandas` library (as `pd`) and the `os` library.\n* **Line 38:** Loads a ground truth annotation file into a pandas `DataFrame`, setting the index and column names.\n* **Line 39:** Assigns a column name 'Ground_Truth' to the annotation DataFrame.\n* **Line 40:** Adds the ground truth annotations to the `adata.obs` DataFrame, matching on cell IDs.\n* **Line 41:** Defines a list of colors to be used for plotting.\n* **Line 43:** Generates 
a spatial plot of the data colored by ground truth annotations using `scanpy.pl.spatial`.\n* **Line 45:** Creates a dictionary mapping ground truth categories to their corresponding colors.\n* **Line 47:** Prints the head of the ligand-receptor information stored within the `adata` object.\n* **Line 49:** Imports the `matplotlib.pyplot` library as `plt`.\n* **Lines 50-52:** Sets parameters for the spatial communication analysis (scale, neighborhood size, target pathway).\n* **Line 53:** Performs communication direction analysis for the specified pathway.\n* **Lines 54-62:** Visualizes cell communication for the specified pathway.\n* **Line 63:** Sets the title of the communication visualization plot.\n* **Line 67:** Writes the updated `adata` object to a compressed `.h5ad` file.\n* **Line 69:** Reads the h5ad file back into the `adata` object.\n* **Line 70:** Displays the `adata` object.\n* **Line 72:** Creates a new layer named 'normalized' in the AnnData object by copying the data from `adata.X`.\n* **Lines 74-79:** Constructs gene expression modules using NMF.\n* **Line 80:** Sets the target gene expression module for further analysis.\n* **Lines 81-87:** Extracts the top genes from the selected GEM module using `omicverse.externel.flowsig.ul.get_top_gem_genes`.\n* **Line 88:** Displays the top genes for the selected GEM module.\n* **Line 90:** Defines a commot output key, which is the commot-cellchat output.\n* **Lines 91-98:** Constructs cellular flows from commot output.\n* **Lines 99-108:** Determines informative variables in the flow data.\n* **Line 109:** Imports the `KMeans` class from sklearn.\n* **Line 111:** Performs KMeans clustering on spatial coordinates.\n* **Line 112:** Adds the spatial KMeans clustering labels to the adata.obs.\n* **Lines 115-121:** Learns intercellular flows using `ov.externel.flowsig.tl.learn_intercellular_flows`.\n* **Lines 123-128:** Applies biological flow validation using `ov.externel.flowsig.tl.apply_biological_flow`.\n* 
**Line 129:** Sets the threshold to filter low-confidence edges in the network.\n* **Lines 131-136:** Filters low-confidence edges using `ov.externel.flowsig.tl.filter_low_confidence_edges`.\n* **Line 137:** Writes the `adata` object to a compressed h5ad file.\n* **Line 141:** Constructs the intercellular flow network from the `adata` object.\n* **Line 144:** Sets the flowsig expression key.\n* **Line 145:** Retrieves the expression data associated with the flow key.\n* **Line 146:** Creates a new AnnData object from the expression data.\n* **Line 147:** Assigns the observations from `adata` to `adata_subset`.\n* **Line 148:** Renames the variables using a GEM naming convention.\n* **Line 151:** Imports the matplotlib plotting library.\n* **Lines 152-153:** Creates a dotplot of GEM expression by ground truth, with specified parameters.\n* **Line 154:** Creates a dictionary mapping ground truth categories to colors for the dotplot.\n* **Line 156:** Plots the flowsig network.\n\n**Key Libraries:**\n\n* **`omicverse`:** A library for multi-omics data analysis, including spatial omics, with functionality for spatial gene selection, intercellular communication analysis, and biological flow analysis.\n* **`scanpy`:** A popular library for single-cell RNA-seq analysis, used here for loading, preprocessing, and visualizing spatial transcriptomics data.\n* **`pandas`:** Used for data manipulation, primarily loading the ground truth annotation file.\n* **`sklearn`:** Used for KMeans clustering.\n* **`matplotlib`:** Used for general-purpose plotting.\n\n**In summary,** this script performs a detailed analysis of spatial transcriptomics data, combining gene expression, spatial information, intercellular communication, and biological flow to identify meaningful patterns and relationships within the tissue. It utilizes several libraries for this purpose, demonstrating a common approach to these kinds of analyses.",
"file": "t_commot_flowsig_annotated.py"
}
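The description above mentions filtering the ligand-receptor database down to interactions whose genes are present in `adata`. The sketch below shows one way such a filter could work on plain Python data. It is not the COMMOT or `omicverse` implementation: the function name `filter_lr_pairs`, the `(ligand, receptor, pathway)` tuple layout, the `_` separator for multi-subunit complexes, and the rule that every subunit must be present are all assumptions made for illustration.

```python
def filter_lr_pairs(lr_pairs, dataset_genes, sep="_"):
    """Keep ligand-receptor pairs whose subunits are all measured (illustrative sketch).

    lr_pairs      : iterable of (ligand, receptor, pathway) tuples (assumed layout)
    dataset_genes : iterable of gene names present in the dataset
    sep           : assumed separator joining the subunits of a complex
    """
    genes = set(dataset_genes)
    kept = []
    for ligand, receptor, pathway in lr_pairs:
        # Split complexes such as "TGFBR1_TGFBR2" into their individual genes.
        subunits = ligand.split(sep) + receptor.split(sep)
        # Assumption: a pair is usable only if every subunit is in the dataset.
        if all(gene in genes for gene in subunits):
            kept.append((ligand, receptor, pathway))
    return kept


# Toy example with hypothetical database rows and a hypothetical gene list.
toy_database = [
    ("FGF1", "FGFR1", "FGF"),
    ("TGFB1", "TGFBR1_TGFBR2", "TGFb"),
    ("WNT3", "FZD1", "WNT"),
]
toy_genes = ["FGF1", "FGFR1", "TGFB1", "TGFBR1"]
filtered = filter_lr_pairs(toy_database, toy_genes)
```

With these toy inputs only the FGF pair survives, because `TGFBR2`, `WNT3`, and `FZD1` are missing from the gene list. A looser rule (keeping a pair when any subunit is present) would be a one-word change from `all` to `any`.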
