In its default setting, Tangram maps single cells to spatial data by comparing the expression of shared genes via cosine similarity. The simplicity of the model makes it easy to add further terms, e.g., to incorporate prior knowledge.
We refined Tangram by (1) optimizing gene set selection, (2) employing regularization techniques to balance consistency and certainty, (3) incorporating spatial information using, e.g., neighborhood-based indicators, and (4) testing strategies for improved cell subset selection.
Evaluations on real and simulated mouse datasets demonstrated that this approach improves both gene expression prediction and cell and cell type mapping.
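For intuition, the cosine-similarity comparison can be sketched in a few lines of NumPy. This is only an illustration under assumptions, not the package's implementation: the matrices `S`, `G`, and `M` are random toy data, and the helper `cosine_similarity` is a hypothetical name introduced here.

```python
import numpy as np

def cosine_similarity(a, b, axis):
    """Cosine similarity between matching slices of a and b along `axis`."""
    num = (a * b).sum(axis=axis)
    den = np.linalg.norm(a, axis=axis) * np.linalg.norm(b, axis=axis)
    return num / np.maximum(den, 1e-12)  # guard against division by zero

rng = np.random.default_rng(0)
n_cells, n_spots, n_genes = 6, 4, 5
S = rng.random((n_cells, n_genes))   # toy single-cell expression (cells x genes)
G = rng.random((n_spots, n_genes))   # toy spatial expression (spots x genes)

# Hypothetical mapping matrix: each row is a probability distribution over spots.
logits = rng.normal(size=(n_cells, n_spots))
M = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

G_pred = M.T @ S                     # predicted spatial expression (spots x genes)
gene_score = cosine_similarity(G_pred, G, axis=0).mean()  # per gene, across spots
spot_score = cosine_similarity(G_pred, G, axis=1).mean()  # per spot, across genes
```

A mapping that reproduces the measured spatial expression well yields scores close to 1 for both terms.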
Set up conda environment using the environment.yml file
conda env create -f environment.yml
conda activate tangramx-env
Install the package
pip install .
To start using Tangram with our refinements, gene selection, or our benchmarking framework, import the code in your Jupyter notebooks and/or scripts:
import refined_tangram as tg
import gene_selection
import cell_selection
import benchmarking
The refinements build on Tangram’s code, keeping its usage unchanged while providing additional options through new hyperparameters and functions.
Load your spatial data and your single cell data, and pre-process them using Tangram's function tg.pp_adatas.
To select a specific gene set, you can use the following functions beforehand:
- gene_selection.ctg(adata_sc) for cell type-specific genes,
- gene_selection.hvg(adata_sc) for highly variable genes of the single-cell dataset,
- gene_selection.spapros(adata_sc) for the probe set selected via Spapros, or
- gene_selection.svg(adata_sp) for spatially variable genes computed via SpatialDE2.
import scanpy as sc

adata_sp = sc.read_h5ad(<path>)
adata_sc = sc.read_h5ad(<path>)
genes = gene_selection.ctg(adata_sc)
tg.pp_adatas(adata_sc, adata_sp, genes=genes)
Once the datasets are pre-processed we can map the single cells onto the spots via Tangram's function tg.map_cells_to_space.
Several regularization strategies are available via the hyperparameters lambda_r, lambda_l1, and lambda_l2.
Spatial information can be integrated in the form of spatial weight matrices that capture the locality for each spot. We added three extensions to the loss function based on that:
- Spatially weighted gene expression comparison with the hyperparameter lambda_neighborhood_g1,
- Preservation of local spatial indicators with the hyperparameters lambda_getis_ord for the local Getis-Ord $G^*$ statistic, lambda_geary for the local Geary's $C$ statistic, and lambda_moran for the local Moran's $I$ statistic, and
- Enforcement of cell type islands with the hyperparameter lambda_ct_islands.
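To illustrate what a spatial weight matrix and a local spatial indicator look like, here is a minimal NumPy sketch of the local Moran's $I$ statistic on a toy grid. It is not the code used inside the loss function; the helper names `knn_weights` and `local_morans_i`, the k-nearest-neighbour weighting, and the row normalisation are assumptions chosen for the example.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-normalised k-nearest-neighbour weight matrix (spots x spots)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a spot is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]       # indices of the k closest spots
    W = np.zeros_like(d)
    W[np.arange(len(coords))[:, None], idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def local_morans_i(x, W):
    """Local Moran's I per spot: I_i = z_i * sum_j w_ij z_j / m2."""
    z = x - x.mean()
    m2 = (z ** 2).mean()                     # second moment of the centred values
    return z * (W @ z) / m2

# Toy example: a 4 x 4 grid of spots with a smooth expression gradient.
coords = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
W = knn_weights(coords, k=4)
x = coords[:, 0]                             # expression increases along one axis
I_local = local_morans_i(x, W)
```

A smooth gradient gives positive values, since each spot resembles its neighbours; a term that preserves such indicators penalises mappings whose predicted expression loses this local structure.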
adata_map = tg.map_cells_to_space(adata_sc, adata_sp, lambda_r = 2.95e-09, lambda_l2 = 1e-18,
lambda_neighborhood_g1 = 0.99, lambda_getis_ord = 0.71,
lambda_ct_islands = 0.17)
The returned adata_map is a cell-by-spot structure where adata_map.X[i, j] gives the probability that cell i is located in spot j.
These probabilities can be used to derive cell type mapping probabilities with our function tg.cell_type_mapping(adata_map).
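One plausible way to turn cell-by-spot probabilities into cell type proportions per spot is to sum the rows belonging to each cell type and normalise per spot. The sketch below shows this idea on a toy matrix; it is an assumption for illustration and may differ from what tg.cell_type_mapping does internally. The helper name `aggregate_by_cell_type` is hypothetical.

```python
import numpy as np

def aggregate_by_cell_type(M, cell_types):
    """Sum mapping probabilities per cell type and normalise each spot to 1."""
    cell_types = np.asarray(cell_types)
    types, inv = np.unique(cell_types, return_inverse=True)
    one_hot = np.zeros((len(cell_types), len(types)))
    one_hot[np.arange(len(cell_types)), inv] = 1.0
    ct_by_spot = one_hot.T @ M                        # cell types x spots
    ct_by_spot = ct_by_spot / ct_by_spot.sum(axis=0, keepdims=True)
    return types, ct_by_spot

# Toy mapping: three cells, two spots.
M = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
types, ct_probs = aggregate_by_cell_type(M, ["A", "A", "B"])
```

Each column of `ct_probs` then describes the estimated cell type composition of one spot.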
Depending on the specific task and dataset, mapping only a subset of the cells may be beneficial.
tg.map_cells_to_space offers the hyperparameter mode="constrained" that allows the model to learn an optimal cell subset during training.
To enable cell sampling adapted from CytoSPACE, we added the function cell_selection.cell_sampling(adata_sc, adata_sp), which returns a modified adata_sc that can be used with the default mode="cells".
Since real dataset pairs lack ground truth for cell mapping, we generated low-resolution datasets using data from spatial technologies with single-cell resolution.
This process involves aggregating nearby cells into pseudo-spots based on a spatial grid with the function benchmarking.generate_adata_st, followed by assigning each cell to its nearest spot with the function benchmarking.cells2spots and finally generating the true mapping object with benchmarking.true_mapping.
import numpy as np

xgrid, ygrid = np.meshgrid(np.linspace(0, 1, 25),
                           np.linspace(0, 1, 25))
gen_adata_sp = benchmarking.generate_adata_st(adata_sc, xgrid, ygrid, cell_cover=0.8, min_cell_count=3)
benchmarking.cells2spots(adata_sc, gen_adata_sp)
true_adata_map = benchmarking.true_mapping(adata_sc, gen_adata_sp)
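The aggregation idea behind the pseudo-spots can be sketched with plain NumPy. This is a simplified illustration, not the implementation of the benchmarking functions: the helper `aggregate_to_grid` is hypothetical, every cell is assigned to its nearest grid point, and the cell_cover parameter is not modelled.

```python
import numpy as np

def aggregate_to_grid(coords, X, xgrid, ygrid, min_cell_count=1):
    """Assign cells to their nearest grid point and sum expression per spot."""
    spots = np.column_stack([xgrid.ravel(), ygrid.ravel()])
    d = np.linalg.norm(coords[:, None, :] - spots[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)                         # spot index for each cell
    counts = np.bincount(nearest, minlength=len(spots))
    spot_X = np.zeros((len(spots), X.shape[1]))
    np.add.at(spot_X, nearest, X)                      # accumulate cells per spot
    keep = np.flatnonzero(counts >= min_cell_count)    # drop sparsely covered spots
    return spots[keep], spot_X[keep], nearest, keep

# Toy data: three cells with two genes on a 2 x 2 grid.
coords = np.array([[0.1, 0.1], [0.0, 0.2], [0.9, 0.9]])
X = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 5.0]])
xgrid, ygrid = np.meshgrid(np.linspace(0, 1, 2), np.linspace(0, 1, 2))
spot_coords, spot_X, nearest, keep = aggregate_to_grid(coords, X, xgrid, ygrid)
```

Because `nearest` records which spot each cell was merged into, it provides the ground truth that a predicted mapping can be compared against.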
To evaluate the correctness, consistency, agreement, and certainty of gene expression prediction, cell mapping, and cell type mapping across multiple runs, you can store the resulting mapping objects in a nested dictionary adata_maps_pred.
Each model should have a unique label as the first key, containing another dictionary where the run number serves as the key.
metrics = benchmarking.eval_metrics(adata_maps_pred, adata_sc, adata_st, true_adata_map)
The measurements have the same nested dictionary structure.
To get a mean value for each model and metric, you can run benchmarking.mean_metrics(metrics).
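The nested layout and the averaging step can be pictured with a small, self-contained sketch. It assumes that each run stores a dictionary from metric name to a single number, which may not match the exact output of benchmarking.eval_metrics; the helper `mean_over_runs`, the model labels, and all values are made up for illustration.

```python
import numpy as np

def mean_over_runs(metrics):
    """Average every metric over the runs of each model."""
    out = {}
    for model, runs in metrics.items():
        names = next(iter(runs.values())).keys()       # metric names of the first run
        out[model] = {
            name: float(np.mean([run[name] for run in runs.values()]))
            for name in names
        }
    return out

# Hypothetical measurements: model label -> run number -> metric -> value.
toy_metrics = {
    "baseline": {0: {"cell_map_certainty": 0.50, "gene_expr_correctness": 0.75},
                 1: {"cell_map_certainty": 0.75, "gene_expr_correctness": 0.25}},
    "refined":  {0: {"cell_map_certainty": 1.00, "gene_expr_correctness": 0.50},
                 1: {"cell_map_certainty": 0.50, "gene_expr_correctness": 0.50}},
}
toy_means = mean_over_runs(toy_metrics)
```

The result keeps the model label as the outer key and drops the run level, which makes models easy to compare side by side.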
We extended Tangram's framework by enabling hyperparameter tuning using Ray.
It can be installed via pip install ray.
You can use Optuna’s search algorithm to optimize for the correctness, consistency, and/or certainty of the gene expression prediction and/or cell mapping.
from ray import tune

metric = ["cell_map_consistency", "cell_map_agreement", "cell_map_certainty",
          "gene_expr_consistency", "gene_expr_correctness"]
config = {
"learning_rate" : tune.loguniform(0.001, 1),
"lambda_g1": tune.uniform(0, 1.0),
"lambda_r": tune.loguniform(1e-20, 1e-3),
"lambda_l2": tune.loguniform(1e-20, 1e-3),
"lambda_neighborhood_g1": tune.uniform(0, 1.0),
"lambda_ct_islands": tune.uniform(0, 1.0),
"lambda_getis_ord": tune.uniform(0, 1.0),
}
tuner = tg.map_cells_to_space_hyperparameter_tuning(adata_sc, adata_sp, metric, config)
tuner.get_results()

