Maxspin (maximization of spatial information) is an information theoretic approach to quantifying the degree of spatial organization in spatial transcriptomics (or other spatial omics) data.
Our paper describing and benchmarking this method is out now in Cell Reports Methods:
Jones, D.C., Danaher, P., Kim, Y., Beechem, J.M., Gottardo, R. and Newell, E.W. (2023) An information theoretic approach to detecting spatially varying genes. Cell Reports Methods.
The python package can be installed with:
pip install maxspin
This package operates on AnnData objects from the anndata package.
The method requires a spatial neighborhood graph. A simple and effective way to build one is Delaunay triangulation, for example using squidpy.
import squidpy as sq
sq.gr.spatial_neighbors(adata, delaunay=True, coord_type="generic")
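To give a feel for what a Delaunay neighborhood graph contains, here is a minimal sketch using scipy directly. It is not how squidpy implements `spatial_neighbors`; the helper name `delaunay_edges` and the four toy coordinates are made up for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_edges(coords):
    """Return the set of undirected edges (i, j), i < j, of the Delaunay triangulation."""
    tri = Delaunay(coords)
    edges = set()
    # Each simplex is a triangle in 2D; every pair of its vertices is an edge.
    for simplex in tri.simplices:
        for i in range(len(simplex)):
            for j in range(i + 1, len(simplex)):
                a, b = int(simplex[i]), int(simplex[j])
                edges.add((min(a, b), max(a, b)))
    return edges


# Toy cell coordinates (hypothetical), one row per cell.
coords = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
edges = delaunay_edges(coords)
print(sorted(edges))
```

Two cells are neighbors whenever they share a triangle edge, so each cell is connected to a handful of nearby cells regardless of the local cell density.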
Spatial information can then be measured using the spatial_information function.
from maxspin import spatial_information
spatial_information(adata, prior=None)
This adds a spatial_information column to the var metadata.
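Since the scores end up in a regular pandas column, ranking genes is a one-liner. The sketch below uses a small hand-made DataFrame as a stand-in for `adata.var`; the gene names, the values, and the helper `top_spatial_genes` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for adata.var after spatial_information(adata) has run.
var = pd.DataFrame(
    {"spatial_information": [12.5, 0.3, 48.1, 5.0]},
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)


def top_spatial_genes(var, n=2):
    """Return the names of the n genes with the highest spatial information."""
    ranked = var["spatial_information"].sort_values(ascending=False)
    return list(ranked.index[:n])


top_genes = top_spatial_genes(var, n=2)
print(top_genes)
```

With a real dataset the same function could be called as `top_spatial_genes(adata.var, n=50)`.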
Similarly, pairwise spatial information can be computed with pairwise_spatial_information. This function tests every pair of genes, which is impractical for large numbers of genes, so it's a good idea to subset the AnnData object before calling it.
from maxspin import pairwise_spatial_information
pairwise_spatial_information(adata, prior=None)
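A quick back-of-the-envelope calculation shows why subsetting matters. The sketch below counts unordered gene pairs (an assumption; if ordered pairs are evaluated the count roughly doubles) and picks the indices of the top-scoring genes from a plain list. Both helper names are made up for illustration.

```python
from math import comb


def n_gene_pairs(n_genes):
    """Number of unordered gene pairs among n_genes genes."""
    return comb(n_genes, 2)


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


print(n_gene_pairs(20000))  # full transcriptome
print(n_gene_pairs(50))     # small subset
print(top_k_indices([0.1, 3.0, 2.0, 0.5], k=2))
```

One reasonable strategy, though not the only one, is to run `spatial_information` first and restrict the pairwise analysis to the highest-scoring genes.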
For a more detailed example, check out the tutorial.
The method computes a score in [0, 1] for every cell/spot, like a correlation but typically much smaller, and sums these scores to arrive at a spatial information score that is in [0, ncells]. It's possible to normalize for the number of cells by dividing the score by ncells, but by default a pattern involving more cells is considered more spatially coherent, hence the sum.
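As a small illustration of the sum versus the per-cell average, here is a sketch with made-up per-cell scores. The package does not necessarily expose per-cell values in this form; the numbers only show the arithmetic.

```python
import numpy as np

# Made-up per-cell scores for one gene, each in [0, 1].
per_cell = np.array([0.02, 0.00, 0.05, 0.01])

# Default behavior described above: the gene's score is the sum over cells.
total = per_cell.sum()

# Optional normalization: divide by the number of cells.
per_cell_average = total / per_cell.size

print(total, per_cell_average)
```

With an AnnData object, the equivalent normalization would presumably be `adata.var["spatial_information"] / adata.n_obs`.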
There are different ways spatial information can be computed. By default, no normalization is done and spatial information is computed on absolute counts. Uncertainty is incorporated using a Gamma-Poisson model.
If prior=None is used, the method makes no attempt to account for estimation uncertainty and computes spatial information directly on whatever is in adata.X.
The recommended way to run spatial_information is on a normalized estimate of expression that comes with an uncertainty estimate. There are two recommended ways of doing this: SCVI and Vanity.
SCVI is a convenient and versatile probabilistic model of sequencing experiments, whose posterior we can sample to get normalized point estimates with uncertainty.
Using Maxspin with SCVI looks something like this:
import scvi
import numpy as np
from maxspin import spatial_information
scvi.model.SCVI.setup_anndata(adata)
model = scvi.model.SCVI(adata, n_latent=20)
# Fit the model before sampling from its posterior.
model.train()
# Sample log-expression values from the posterior.
posterior_samples = np.log(model.get_normalized_expression(return_numpy=True, return_mean=False, n_samples=20, library_size="latent"))
adata_scvi = adata.copy()
adata_scvi.X = np.mean(posterior_samples, axis=0)
adata_scvi.layers["std"] = np.std(posterior_samples, axis=0)
spatial_information(adata_scvi, prior="gaussian")
The tutorial has a more in-depth example of using SCVI.
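To make the array shapes in the SCVI example concrete, here is a sketch that replaces the model with synthetic positive values. It assumes the posterior samples are stacked as (n_samples, n_cells, n_genes), so that reducing over axis 0 yields one mean and one standard deviation per cell and gene.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_cells, n_genes = 20, 5, 3

# Synthetic positive values standing in for posterior samples of
# normalized expression (not real model output).
samples = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_cells, n_genes))

# Work on the log scale, as in the example above.
log_samples = np.log(samples)

# Collapse the sample axis: point estimate and uncertainty per cell and gene.
post_mean = log_samples.mean(axis=0)
post_std = log_samples.std(axis=0)

print(post_mean.shape, post_std.shape)
```

The mean goes into `X` and the standard deviation into the `"std"` layer, which is what the `prior="gaussian"` setting in the example above relies on.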
I developed the normalization method vanity in part as a convenient way to normalize spatial transcriptomics data while providing uncertainty estimates. The preferred way of running vanity + maxspin is then:
from maxspin import spatial_information
from vanity import normalize_vanity
normalize_vanity(adata)
spatial_information(adata, prior="gaussian")
Compared to SCVI, this model shrinks low-expression genes more aggressively, which might cause it to miss very subtle patterns, but makes it less likely to detect spurious ones.