- Data file: a `.csv` file with genes as rows and cells/spots as columns

| | Cell1 | Cell2 | Cell3 | ... | CellN |
|---|---|---|---|---|---|
| Gene1 | 1 | 2 | 1 | ... | 0 |
| Gene2 | 4 | 1 | 0 | ... | 4 |
| ... | ... | ... | ... | ... | ... |
| GeneN | 0 | 0 | 2 | ... | 0 |

- Meta file: a `.csv` file with cell/spot ID and cell type/domain annotation columns
  - The column containing the cell ID should be named `Cell`
  - The column containing the labels should be named `Cell_type`

| | Cell | Cell_type |
|---|---|---|
| Cell1 | Cell1 | T cell |
| Cell2 | Cell2 | B cell |
| ... | ... | ... |
| CellN | CellN | Monocyte |

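
Before handing the two files to scCube, it can help to confirm that they agree with each other. The sketch below is not part of scCube: `check_inputs` and the inline CSV strings are hypothetical, and it uses only the standard library so it runs without any dependencies. It checks that the meta file has the `Cell` and `Cell_type` columns and that every cell in the data file header is annotated.

```python
import csv
import io

# Tiny stand-ins for the two CSV files (hypothetical contents).
DATA_CSV = """,Cell1,Cell2,Cell3
Gene1,1,2,1
Gene2,4,1,0
"""
META_CSV = """,Cell,Cell_type
Cell1,Cell1,T cell
Cell2,Cell2,B cell
Cell3,Cell3,Monocyte
"""

def check_inputs(data_text, meta_text):
    """Return a list of problems; an empty list means the two files agree."""
    data_rows = list(csv.reader(io.StringIO(data_text)))
    meta_rows = list(csv.DictReader(io.StringIO(meta_text)))
    problems = []
    if not meta_rows or "Cell" not in meta_rows[0] or "Cell_type" not in meta_rows[0]:
        problems.append("meta file needs 'Cell' and 'Cell_type' columns")
        return problems
    data_cells = data_rows[0][1:]  # header row without the empty index cell
    meta_cells = [row["Cell"] for row in meta_rows]
    missing = sorted(set(data_cells) - set(meta_cells))
    if missing:
        problems.append(f"cells without annotation: {missing}")
    extra = sorted(set(meta_cells) - set(data_cells))
    if extra:
        problems.append(f"annotated cells absent from data: {extra}")
    return problems

print(check_inputs(DATA_CSV, META_CSV))
```

With real files, the same function can be called on the text read from disk.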
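import numpy as np                # np.pi is used in the custom-pattern examples
import matplotlib.pyplot as plt   # plt.show is used in the plotting examples
import scCube
from scCube import scCube
from scCube.visualization import *
from scCube.utils import *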
model = scCube()
sc_adata = model.pre_process(
sc_data=sc_data,
sc_meta=sc_meta,
is_normalized=False
)
Parameters
sc_data: DataFrame
DataFrame of input data
sc_meta: DataFrame
DataFrame of input meta
is_normalized: bool, default: False
Whether the input data is already normalized. If `is_normalized=False`, scCube normalizes the input data first.
generate_sc_meta, generate_sc_data = model.train_vae_and_generate_cell(
sc_adata=sc_adata,
celltype_key='Cell_type',
cell_key='Cell',
target_num=None,
batch_size=512,
epoch_num=10000,
lr=0.0001,
hidden_size=128,
save_model=True,
save_path=save_path,
project_name=model_name,
used_device='cuda:0'
)
Parameters
sc_adata: AnnData
AnnData of pre-processed data
celltype_key: str
The column name of cell labels in meta
cell_key: str
The column name of cell IDs in meta
target_num: Optional[dict], default: None
Target number of cells to generate. If `target_num=None`, cells are generated in proportion to the cell types in the input data.
batch_size: int, default: 512
Batch size of training
epoch_num: int, default: 10000
Epoch number of training
lr: float, default: 0.0001
Learning rate of training
hidden_size: int, default: 128
Hidden size of VAE model
save_model: bool, default: True
Whether to save the trained VAE model
save_path: str
The save path
project_name: str
The name of trained VAE model
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
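
To generate a different total number of cells while keeping the input composition, a `target_num` dictionary can be built by hand. The sketch below assumes `target_num` maps each cell type label to a cell count (the parameter is only documented as `Optional[dict]`); the helper `proportional_target_num` is hypothetical and not part of scCube.

```python
from collections import Counter

def proportional_target_num(cell_types, total):
    """Split `total` cells across cell types in proportion to their input frequency."""
    counts = Counter(cell_types)
    n = sum(counts.values())
    raw = {ct: total * c / n for ct, c in counts.items()}
    target = {ct: int(v) for ct, v in raw.items()}  # round down first
    remainder = total - sum(target.values())
    # Give the leftover cells to the types with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda ct: raw[ct] - target[ct], reverse=True)
    for ct in by_fraction[:remainder]:
        target[ct] += 1
    return target

labels = ["T cell"] * 5 + ["B cell"] * 3 + ["Monocyte"] * 2
target_num = proportional_target_num(labels, 100)
print(target_num)
```

The resulting dictionary could then be passed as `target_num=target_num`, if the assumed key/value layout matches what scCube expects.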
generate_sc_meta, generate_sc_data = model.load_vae_and_generate_cell(
sc_adata=sc_adata,
celltype_key='Cell_type',
cell_key='Cell',
target_num=None,
hidden_size=128,
load_path=load_path,
used_device='cuda:0'
)
Parameters
sc_adata: AnnData
AnnData of pre-processed data
celltype_key: str
The column name of cell labels in meta
cell_key: str
The column name of cell IDs in meta
target_num: Optional[dict], default: None
Target number of cells to generate. If `target_num=None`, cells are generated in proportion to the cell types in the input data.
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
generate_sc_data, generate_sc_meta = model.generate_pattern_random(
generate_sc_data=generate_sc_data,
generate_sc_meta=generate_sc_meta,
celltype_key='Cell_type',
set_seed=False,
seed=12345,
spatial_cell_type=None,
spatial_dim=2,
spatial_size=30,
delta=25,
lamda=0.75,
is_split=True,
split_coord='point_z',
slice_num=5,)
Parameters
generate_sc_data: DataFrame
DataFrame of generated data
generate_sc_meta: DataFrame
DataFrame of generated meta
celltype_key: str
The column name of cell labels in meta
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_cell_type: Optional[list], default: None
The cell types to assign spatial patterns to. If `spatial_cell_type=None`, all cell types are assigned spatial patterns
spatial_dim: int, default: 2
The spatial dimensionality, `2` or `3`
spatial_size: int, default: 30
The scope of the simulated spatial patterns; larger values take more running time
delta: float, default: 25
Larger values tend to form spatial patterns with greater connectivity
lamda: float, default: 0.75
Larger values tend to form clearer spatial patterns
is_split: bool, default: True
Whether to split the generated 3D spatial patterns into a series of 2D spatial patterns; only works when `spatial_dim=3`
split_coord: str, default: point_z
The name of the coordinate axis to split along; only works when `spatial_dim=3` and `is_split=True`
slice_num: int, default: 5
The target number of 2D slices; only works when `spatial_dim=3` and `is_split=True`
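
As an illustration of what `is_split`, `split_coord`, and `slice_num` describe, the sketch below cuts a set of z coordinates into equal-width slices. It is only one plausible way to do the split; how scCube actually assigns cells to slices is not documented here, and `assign_slices` is a hypothetical helper.

```python
def assign_slices(z_values, slice_num):
    """Assign each z coordinate to one of `slice_num` equal-width slices (0-based)."""
    z_min, z_max = min(z_values), max(z_values)
    width = (z_max - z_min) / slice_num
    if width == 0:
        return [0 for _ in z_values]  # all cells lie in one plane
    slices = []
    for z in z_values:
        idx = int((z - z_min) / width)
        slices.append(min(idx, slice_num - 1))  # z_max belongs to the last slice
    return slices

point_z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(assign_slices(point_z, slice_num=5))
```

Each slice can then be treated as an independent 2D pattern by keeping only the x and y coordinates of its cells.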
generate_sc_data_sub, generate_sc_meta_sub = model.generate_subtype_pattern_random(
generate_sc_data=generate_sc_data,
generate_sc_meta=generate_sc_meta,
celltype_key='Cell_type',
select_cell_type='',
subtype_key='',
set_seed=False,
seed=12345,
spatial_dim=2,
subtype_delta=25,)
Parameters
generate_sc_data: DataFrame
DataFrame of generated data
generate_sc_meta: DataFrame
DataFrame of generated meta
celltype_key: str
The column name of cell labels in meta
select_cell_type: str
The cell type for which to generate subtype spatial patterns
subtype_key: str
The column name of cell sub-labels in meta
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_dim: int, default: 2
The spatial dimensionality, `2` or `3`
subtype_delta: int, default: 25
Larger values tend to form spatial patterns with greater connectivity

generate spot-based SRT data with reference-free strategy by combining simulated gene expression profiles and spatial patterns
st_data, st_meta, st_index = model.generate_spot_data_random(
generate_sc_data=generate_sc_data,
generate_sc_meta=generate_sc_meta,
platform='ST',
gene_type='whole',
min_cell=10,
n_gene=None,
n_cell=10,)
Parameters
generate_sc_data: DataFrame
DataFrame of generated data
generate_sc_meta: DataFrame
DataFrame of generated meta
platform: str, default: ST
Spot arrangement: `ST` -- square neighborhood structure; `Visium` -- hexagonal neighborhood structure; `Slide` -- random neighborhood structure
gene_type: str, default: whole
The type of genes to generate: `whole` -- all genes; `hvg` -- the highly variable genes; `marker` -- the marker genes of each cell type; `random` -- randomly selected genes
min_cell: int, default: 10
Filter out genes expressed in fewer than `min_cell` cells before selecting genes; only works when `gene_type` is `'random'`, `'hvg'`, or `'marker'`
n_gene: Optional[int], default: None
The number of genes to select; only works when `gene_type` is `'random'`, `'hvg'`, or `'marker'`
n_cell: int, default: 10
The average number of cells per spot, only works when `is_spot=True`
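
Conceptually, spot-based data pools the expression of all cells that fall into the same spot. The sketch below shows that idea for a square (`ST`-like) grid. It is an illustration under stated assumptions, not scCube's implementation: `aggregate_to_spots`, the `spot_size` argument, and the spot naming are hypothetical (scCube exposes `n_cell` rather than a spot size).

```python
from collections import defaultdict

def aggregate_to_spots(cells, spot_size):
    """Sum gene counts of cells sharing a square spot; also return a cell-to-spot index."""
    spot_counts = defaultdict(lambda: defaultdict(int))
    cell_to_spot = {}
    for cell in cells:
        col = int(cell["point_x"] // spot_size)
        row = int(cell["point_y"] // spot_size)
        spot = f"spot_{col}_{row}"
        cell_to_spot[cell["Cell"]] = spot
        for gene, count in cell["counts"].items():
            spot_counts[spot][gene] += count
    return {s: dict(g) for s, g in spot_counts.items()}, cell_to_spot

cells = [
    {"Cell": "C1", "point_x": 0.5, "point_y": 0.5, "counts": {"Gene1": 1, "Gene2": 4}},
    {"Cell": "C2", "point_x": 1.5, "point_y": 0.2, "counts": {"Gene1": 2}},
    {"Cell": "C3", "point_x": 2.5, "point_y": 2.5, "counts": {"Gene2": 3}},
]
spot_data, spot_index = aggregate_to_spots(cells, spot_size=2)
print(spot_data)
print(spot_index)
```

The two return values play roles loosely analogous to `st_data` and `st_index` above.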
generate image-based SRT data with reference-free strategy by combining simulated gene expression profiles and spatial patterns
st_data, st_meta, st_index = model.generate_spot_data_random(
generate_sc_data=generate_sc_data,
generate_sc_meta=generate_sc_meta,
gene_type='whole',
min_cell=10,
n_gene=None,)
Parameters
generate_sc_data: DataFrame
DataFrame of generated data
generate_sc_meta: DataFrame
DataFrame of generated meta
gene_type: str, default: whole
The type of genes to generate: `whole` -- all genes; `hvg` -- the highly variable genes; `marker` -- the marker genes of each cell type; `random` -- randomly selected genes
min_cell: int, default: 10
Filter out genes expressed in fewer than `min_cell` cells before selecting genes; only works when `gene_type` is `'random'`, `'hvg'`, or `'marker'`
n_gene: Optional[int], default: None
The number of genes to select; only works when `gene_type` is `'random'`, `'hvg'`, or `'marker'`
generate_sc_data, generate_sc_meta = model.generate_pattern_custom_mixing(
sc_adata=sc_adata,
generate_cell_num=5000,
celltype_key='Cell_type',
cell_key='Cell',
set_seed=False,
seed=12345,
spatial_size=30,
select_celltype=None,
prop_list=None,
hidden_size=128,
load_path='',
used_device='cuda:0',)
Parameters
sc_adata: AnnData
AnnData of reference data
generate_cell_num: int
The number of cells to generate
celltype_key: str
The column name of cell labels in `sc_adata.obs`
cell_key: str
The column name of cell IDs in `sc_adata.obs`
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_size: int, default: 30
The scope for simulated spatial patterns
select_celltype: Optional[list], default: None
The cell types selected for simulation. If `select_celltype=None`, all cell types are selected
prop_list: Optional[list], default: None
The proportion of each selected cell type
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
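
The interplay of `select_celltype`, `prop_list`, `set_seed`, and `seed` can be pictured as weighted sampling of cell type labels. The following sketch is an assumption about the idea, not scCube's code; `sample_cell_types` is a hypothetical helper built on the standard library.

```python
import random

def sample_cell_types(select_celltype, prop_list, generate_cell_num, seed=None):
    """Draw `generate_cell_num` cell type labels with probabilities `prop_list`."""
    if len(select_celltype) != len(prop_list):
        raise ValueError("select_celltype and prop_list must have the same length")
    if abs(sum(prop_list) - 1.0) > 1e-8:
        raise ValueError("prop_list must sum to 1")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(select_celltype, weights=prop_list, k=generate_cell_num)

labels = sample_cell_types(["T cell", "B cell"], [0.7, 0.3], 100, seed=12345)
print(labels[:5])
```

Calling the function twice with the same seed returns the same labels, which mirrors the purpose of `set_seed=True`.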
generate customized spatial patterns for cell types with reference-free strategy (clustered patterns)
generate_sc_data, generate_sc_meta = model.generate_pattern_custom_cluster(
sc_adata=sc_adata,
generate_cell_num=5000,
celltype_key='Cell_type',
cell_key='Cell',
set_seed=False,
seed=12345,
spatial_size=30,
select_celltype=None,
shape_list=['Circle', 'Oval'],
cluster_celltype_list=[],
cluster_purity_list=[],
infiltration_celltype_list=[[]],
infiltration_prop_list=[[]],
background_celltype=[],
background_prop=None,
center_x_list=[20, 10],
center_y_list=[20, 10],
a_list=[15, 20],
b_list=[10, 15],
theta_list=[np.pi / 4, np.pi / 4],
scale_value_list=[4.8, 4.8],
twist_value_list=[0.5, 0.5],
hidden_size=128,
load_path='',
used_device='cuda:0')
Parameters
sc_adata: AnnData
AnnData of reference data
generate_cell_num: int
The number of cells to generate
celltype_key: str
The column name of cell labels in `sc_adata.obs`
cell_key: str
The column name of cell IDs in `sc_adata.obs`
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_size: int, default: 30
The scope for simulated spatial patterns
select_celltype: Optional[list], default: None
The cell types selected for simulation. If `select_celltype=None`, all cell types are selected
shape_list: list
The shapes for simulation: `Circle`, `Oval`, or `Irregular`
cluster_celltype_list: list
The cell types selected for the clustered shapes; its length must equal that of `shape_list`
cluster_purity_list: list
The purity of each clustered shape
infiltration_celltype_list: list
The infiltrating cell types in each clustered shape
infiltration_prop_list: list
The proportion of each infiltrating cell type in each clustered shape
background_celltype: list
The cell types considered as background
background_prop: Optional[list], default: None
The proportions of the background cell types. If `background_prop=None`, all background cell types have equal proportions
center_x_list: list
The X-axis position of the center of each clustered shape
center_y_list: list
The Y-axis position of the center of each clustered shape
a_list: list
The major axis of each clustered shape
b_list: list
The minor axis of each clustered shape
theta_list: list
The direction of each clustered shape
scale_value_list: list
The scale factor used to control the shape of each cluster; only works when the shape is `Irregular`
twist_value_list: list
The twist degree used to control the shape of each cluster; only works when the shape is `Irregular`
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
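
The geometric parameters (`center_x_list`, `center_y_list`, `a_list`, `b_list`, `theta_list`) describe a rotated ellipse for each `Oval` shape. The sketch below tests whether a point lies inside such an ellipse. It assumes that `a` and `b` are semi-axis lengths and that `theta` is a counter-clockwise rotation in radians; both are assumptions, and `in_ellipse` is a hypothetical helper rather than a scCube function.

```python
import math

def in_ellipse(x, y, center_x, center_y, a, b, theta):
    """Return True if (x, y) lies inside the ellipse rotated by `theta` radians."""
    dx, dy = x - center_x, y - center_y
    # Rotate the offset into the ellipse's own axes.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

theta = math.pi / 4
# 14 units from the center along the major axis: inside (14 < a = 15).
along_major = (20 + 14 * math.cos(theta), 20 + 14 * math.sin(theta))
# 14 units from the center along the minor axis: outside (14 > b = 10).
along_minor = (20 - 14 * math.sin(theta), 20 + 14 * math.cos(theta))
print(in_ellipse(*along_major, 20, 20, 15, 10, theta))
print(in_ellipse(*along_minor, 20, 20, 15, 10, theta))
```

A `Circle` is the special case `a == b`, for which `theta` has no effect.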
generate customized spatial patterns for cell types with reference-free strategy (cell rings patterns)
generate_sc_data, generate_sc_meta = model.generate_pattern_custom_ring(
sc_adata=sc_adata,
generate_cell_num=5000,
celltype_key='Cell_type',
cell_key='Cell',
set_seed=False,
seed=12345,
spatial_size=30,
select_celltype=None,
shape_list=['Circle', 'Oval'],
ring_celltype_list=[],
ring_purity_list=[],
infiltration_celltype_list=[[]],
infiltration_prop_list=[[]],
background_celltype=[],
background_prop=None,
center_x_list=[20, 10],
center_y_list=[20, 10],
a_list=[15, 20],
b_list=[10, 15],
theta_list=[np.pi / 4, np.pi / 4],
ring_width_list=[[2, 3], [2]],
hidden_size=128,
load_path='',
used_device='cuda:0')
Parameters
sc_adata: AnnData
AnnData of reference data
generate_cell_num: int
The number of cells to generate
celltype_key: str
The column name of cell labels in `sc_adata.obs`
cell_key: str
The column name of cell IDs in `sc_adata.obs`
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_size: int, default: 30
The scope for simulated spatial patterns
select_celltype: Optional[list], default: None
The cell types selected for simulation. If `select_celltype=None`, all cell types are selected
shape_list: list
The shapes for simulation: `Circle`, `Oval`, or `Irregular`
ring_celltype_list: list
The cell types selected for the cell ring shapes; its length must equal that of `shape_list`
ring_purity_list: list
The purity of each cell ring shape
infiltration_celltype_list: list
The infiltrating cell types in each cell ring shape
infiltration_prop_list: list
The proportion of each infiltrating cell type in each cell ring shape
background_celltype: list
The cell types considered as background
background_prop: Optional[list], default: None
The proportions of the background cell types. If `background_prop=None`, all background cell types have equal proportions
center_x_list: list
The X-axis position of the center of each cell ring shape
center_y_list: list
The Y-axis position of the center of each cell ring shape
a_list: list
The major axis of each cell ring shape
b_list: list
The minor axis of each cell ring shape
theta_list: list
The direction of each cell ring shape
ring_width_list: list
The width of each cell ring shape
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
generate_sc_data, generate_sc_meta = model.generate_pattern_custom_stripes(
sc_adata=sc_adata,
generate_cell_num=5000,
celltype_key='Cell_type',
cell_key='Cell',
set_seed=False,
seed=12345,
spatial_size=30,
select_celltype=None,
y1_list=[None, None],
y2_list=[None, None],
stripe_celltype_list=[],
stripe_width_list=[2, 3],
stripe_purity_list=[],
infiltration_celltype_list=[[]],
infiltration_prop_list=[[]],
background_celltype=[],
background_prop=None,
hidden_size=128,
load_path='',
used_device='cuda:0')
Parameters
sc_adata: AnnData
AnnData of reference data
generate_cell_num: int
The number of cells to generate
celltype_key: str
The column name of cell labels in `sc_adata.obs`
cell_key: str
The column name of cell IDs in `sc_adata.obs`
set_seed: bool, default: False
Whether to set seed for reproducible simulation
seed: int, default: 12345
The seed number
spatial_size: int, default: 30
The scope for simulated spatial patterns
select_celltype: Optional[list], default: None
The cell types selected for simulation. If `select_celltype=None`, all cell types are selected
y1_list: list
The endpoints of the simulated stripes. If `y1_list=[None]`, scCube chooses the endpoints randomly within the scope of the spatial patterns
y2_list: list
The endpoints of the simulated stripes. If `y2_list=[None]`, scCube chooses the endpoints randomly within the scope of the spatial patterns
stripe_celltype_list: list
The cell types selected for the stripe shapes; its length must equal that of `y1_list` or `y2_list`
stripe_width_list: list
The width of each stripe shape; its length must equal that of `stripe_celltype_list`
stripe_purity_list: list
The purity of each stripe shape
infiltration_celltype_list: list
The infiltrating cell types in each stripe shape
infiltration_prop_list: list
The proportion of each infiltrating cell type in each stripe shape
background_celltype: list
The cell types considered as background
background_prop: Optional[list], default: None
The proportions of the background cell types. If `background_prop=None`, all background cell types have equal proportions
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
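
One way to picture a stripe is as a band around a straight line. The sketch below assumes, purely for illustration, that `y1` and `y2` are the heights at which the stripe's center line meets the left edge (`x = 0`) and the right edge (`x = spatial_size`) of the pattern, and that `stripe_width` is the full width of the band. The source does not confirm this geometry, and `in_stripe` is a hypothetical helper.

```python
import math

def in_stripe(x, y, y1, y2, spatial_size, stripe_width):
    """Return True if (x, y) lies within half a stripe width of the center line."""
    # Center line runs from (0, y1) to (spatial_size, y2).
    dx, dy = spatial_size, y2 - y1
    # Perpendicular distance from the point to that line.
    dist = abs(dy * x - dx * (y - y1)) / math.hypot(dx, dy)
    return dist <= stripe_width / 2

# Diagonal stripe y = x across a 30 x 30 pattern.
print(in_stripe(10, 10, 0, 30, 30, stripe_width=2))
print(in_stripe(10, 12, 0, 30, 30, stripe_width=2))
print(in_stripe(10, 12, 0, 30, 30, stripe_width=3))
```

Under these assumptions, widening the stripe from 2 to 3 is enough to capture the point at (10, 12), which lies about 1.41 units from the center line.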
generate_sc_data, generate_sc_meta = model.generate_pattern_custom_complex(
sc_adata=sc_adata,
spa_pattern_base=spa_pattern_base,
spa_pattern_add=spa_pattern_add,
celltype_key='Cell_type',
cell_key='Cell',
background_celltype=[],
hidden_size=128,
load_path='',
used_device='cuda:0')
Parameters
sc_adata: AnnData
AnnData of reference data
spa_pattern_base: DataFrame
The base spatial patterns (for example, mixing cell populations, cell clusters, or cell rings)
spa_pattern_add: DataFrame
The added spatial patterns overlaid on the base spatial patterns (for example, stripes)
celltype_key: str
The column name of cell labels in `sc_adata.obs`
cell_key: str
The column name of cell IDs in `sc_adata.obs`
background_celltype: list
The cell types considered as background
background_prop: Optional[list], default: None
The proportions of the background cell types; the background cell types must be the same in the base and added spatial patterns
hidden_size: int, default: 128
Hidden size of VAE model
load_path: str
The load path
used_device: str, default: cuda:0
Device name, `cpu` or `cuda`
generate_sc_data, generate_sc_meta = model.generate_pattern_reference(
sc_adata=sc_adata,
generate_sc_data=generate_sc_data,
generate_sc_meta=generate_sc_meta,
celltype_key='Cell_type',
spatial_key=['x', 'y'],
cost_metric='sqeuclidean'
)
Parameters
sc_adata: AnnData
AnnData of reference data
generate_sc_data: DataFrame
DataFrame of generated data
generate_sc_meta: DataFrame
DataFrame of generated meta
celltype_key: str
The column name of cell labels in meta
spatial_key: list
The column names of the spatial coordinates in meta
cost_metric: str, default: sqeuclidean
The distance metric used as the cost between `generate_sc_data` and the real data, `sqeuclidean` by default. For NumPy inputs, the metrics accepted by `scipy.spatial.distance.cdist` can also be used: 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'wminkowski', 'yule'.
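
To make the default `sqeuclidean` metric concrete, the sketch below computes a squared Euclidean cost matrix between two small sets of expression vectors in plain Python. It only illustrates the metric itself; how scCube uses the resulting costs to place generated cells is not described here, and `sqeuclidean_cost` is a hypothetical helper.

```python
def sqeuclidean_cost(rows_a, rows_b):
    """Cost matrix whose entry [i][j] is the squared Euclidean distance between rows."""
    return [
        [sum((p - q) ** 2 for p, q in zip(a, b)) for b in rows_b]
        for a in rows_a
    ]

generated = [(0, 0), (1, 1)]   # toy expression vectors of generated cells
reference = [(0, 0), (3, 4)]   # toy expression vectors of reference cells
print(sqeuclidean_cost(generated, reference))
```

Each row corresponds to a generated cell and each column to a reference cell, so a small entry means the two profiles are similar.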
p = plot_spatial_pattern_scatter(
obj=generate_sc_meta,
figwidth=8,
figheight=8,
dim=2,
x="point_x",
y="point_y",
z=None,
label=None,
palette=None,
colormap='rainbow',
size=10,
alpha=1,
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of generated meta
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
dim: int, default: 2
Spatial dimensionality
x: str, default: point_x
The name of the column containing the x coordinate
y: str, default: point_y
The name of the column containing the y coordinate
z: Optional[str], default: None
The name of the column containing the z coordinate; only used when `dim=3`
label: Optional[str], default: None
The name of the column containing cell type information. If `label=None`, coordinates are plotted without cell type information
palette: Optional[list], default: None
List of colors to use. If `palette=None`, the scatter plot uses colors from the colormap
colormap: str, default: rainbow
The name of cmap
size: float, default: 10
The size of point
alpha: float, default: 1
The transparency of point
p = plot_spatial_pattern_density(
obj=generate_sc_meta,
figwidth=8,
figheight=8,
x="point_x",
y="point_y",
label="Cell_type",
show_celltype=None,
colormap='Blues',
fill=True,
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of generated meta
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
x: str, default: point_x
The name of the column containing the x coordinate
y: str, default: point_y
The name of the column containing the y coordinate
label: str, default: Cell_type
The name of the column containing cell type information. If `label=None`, coordinates are plotted without cell type information
show_celltype: Optional[str], default: None
The cell type to plot separately. If `show_celltype=None`, all cell types are plotted together
colormap: str, default: Blues
The name of cmap
fill: bool, default: True
If `fill=True`, fill in the area between bivariate contours
p = plot_spot_scatterpie(
obj=prop,
figwidth=8,
figheight=8,
x="spot_x",
y="spot_y",
palette=None,
colormap='rainbow',
res=50,
direction="+",
start=0.0,
size=100,
edgecolor="none",
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of cell type proportion per spot
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
x: str, default: spot_x
The name of the column containing the x coordinate
y: str, default: spot_y
The name of the column containing the y coordinate
palette: Optional[dict], default: None
Dict mapping each cell type to a color. If `palette=None`, the scatterpie plot uses colors from the colormap
colormap: str, default: rainbow
The name of cmap
res: int, default: 50
Number of points around the circle
direction: str, default: +
`+` for counter-clockwise, or `-` for clockwise
start: float, default: 0.0
Starting position in radians
size: float, default: 100
The size of point
edgecolor: str, default: none
The edge color of point
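
The `res`, `direction`, and `start` parameters describe how each spot's pie is drawn. The sketch below shows one way such wedges could be computed from a spot's cell type proportions. It is an illustration only: `wedge_angles` and `wedge_outline` are hypothetical helpers, and the way `res` is shared between wedges is an assumption.

```python
import math

def wedge_angles(proportions, start=0.0, direction="+"):
    """Return (begin, end) angles in radians for each pie wedge of one spot."""
    total = sum(proportions)
    sign = 1.0 if direction == "+" else -1.0  # "+" is counter-clockwise
    angles, current = [], start
    for p in proportions:
        sweep = sign * 2.0 * math.pi * p / total
        angles.append((current, current + sweep))
        current += sweep
    return angles

def wedge_outline(x, y, begin, end, radius=1.0, res=50):
    """Polygon vertices for one wedge centered on the spot at (x, y)."""
    # Assumption: each wedge gets a share of `res` proportional to its sweep.
    n = max(2, round(res * abs(end - begin) / (2.0 * math.pi)))
    arc = [begin + (end - begin) * i / (n - 1) for i in range(n)]
    return [(x, y)] + [(x + radius * math.cos(a), y + radius * math.sin(a)) for a in arc]

angles = wedge_angles([0.5, 0.25, 0.25])
outline = wedge_outline(0.0, 0.0, *angles[0])
print(len(outline))
```

Each outline could be passed to a polygon-drawing routine, with the wedge color taken from `palette` or the colormap.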
p = plot_spot_prop(
obj=prop,
figwidth=8,
figheight=8,
x="spot_x",
y="spot_y",
colormap='viridis',
show_celltype= "",
size=100,
alpha=1,
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of cell type proportion per spot
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
x: str, default: spot_x
The name of the column containing the x coordinate
y: str, default: spot_y
The name of the column containing the y coordinate
colormap: str, default: viridis
The name of cmap
show_celltype: Union[list, str]
The cell type selected to plot
size: float, default: 100
The size of point
alpha: float, default: 1
The transparency of point
p = plot_gene_scatter(
data=generate_sc_data,
obj=generate_sc_meta_new,
figwidth=8,
figheight=8,
dim=2,
label='Cell',
normalize=True,
x="point_x",
y="point_y",
z="point_z",
colormap='viridis',
show_gene="",
size=10,
alpha=1,
)
plt.show(p)
Parameters
data: DataFrame
DataFrame of generated data
obj: DataFrame
DataFrame of generated meta
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
dim: int, default: 2
Spatial dimensionality
label: str, default: Cell
The name of the column containing the cell/spot name
normalize: bool, default: True
If `normalize=True`, expression values are normalized to [0, 1]
x: str, default: point_x
The name of the column containing the x coordinate
y: str, default: point_y
The name of the column containing the y coordinate
z: Optional[str], default: None
The name of the column containing the z coordinate; only used when `dim=3`
colormap: str, default: viridis
The name of cmap
show_gene: str
The gene selected to plot
size: float, default: 10
The size of point
alpha: float, default: 1
The transparency of point
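
The `normalize=True` option rescales a gene's expression values to [0, 1] before coloring. A common way to do this is min-max scaling, sketched below. Whether scCube uses exactly this formula is an assumption, and `min_max_normalize` is a hypothetical helper.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant gene: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([0, 2, 4]))
```

Scaling each gene separately makes color scales comparable across genes with very different expression ranges.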
p = plot_gene_scatter(
obj=st_index,
figwidth=8,
figheight=8,
label='spot',
n_bin=20
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of cell-spot index
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
label: str, default: spot
The name of column containing spot name
n_bins: int, default: 20
The number of equal-width bins in the range
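
The histogram summarizes how many cells fall into each spot. The sketch below reproduces that summary without plotting: it counts cells per spot from a list of spot labels (such as the spot column of the cell-spot index) and bins the counts into equal-width bins. `cells_per_spot_histogram` is a hypothetical helper, and the binning convention (last bin closed on the right) is an assumption.

```python
from collections import Counter

def cells_per_spot_histogram(spot_labels, n_bins=20):
    """Count cells per spot, then count spots in each of `n_bins` equal-width bins."""
    per_spot = list(Counter(spot_labels).values())
    lo, hi = min(per_spot), max(per_spot)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # every spot holds the same number of cells
    bins = [0] * n_bins
    for count in per_spot:
        idx = min(int((count - lo) / width), n_bins - 1)  # max value joins last bin
        bins[idx] += 1
    return bins

spots = ["s1", "s1", "s1", "s2", "s3", "s3", "s4", "s4", "s4"]
print(cells_per_spot_histogram(spots, n_bins=2))
```

The bin counts always sum to the number of distinct spots.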
p = plot_slice_scatter(
obj=generate_sc_meta,
figwidth=8,
figheight=8,
x="point_x",
y="point_y",
label='Cell_type',
palette=None,
colormap='rainbow',
size=10,
alpha=1
)
plt.show(p)
Parameters
obj: DataFrame
DataFrame of generated meta
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
x: str, default: point_x
The name of the column containing the x coordinate
y: str, default: point_y
The name of the column containing the y coordinate
label: str, default: Cell_type
The name of the column containing cell type information
palette: Optional[list], default: None
List of colors to use. If `palette=None`, the scatter plot uses colors from the colormap
colormap: str, default: rainbow
The name of cmap
size: float, default: 10
The size of point
alpha: float, default: 1
The transparency of point
p = plot_slice_gene_scatter(
data=generate_sc_data,
obj=generate_sc_meta,
figwidth=8,
figheight=8,
x="point_x",
y="point_y",
label='Cell_type',
normalize=True,
show_gene="",
colormap='viridis',
size=10,
alpha=1
)
plt.show(p)
Parameters
data: DataFrame
DataFrame of generated data
obj: DataFrame
DataFrame of generated meta
figwidth: float, default: 8
Figure width
figheight: float, default: 8
Figure height
x: str, default: point_x
The name of the column containing the x coordinate
y: str, default: point_y
The name of the column containing the y coordinate
label: str, default: Cell_type
The name of the column containing cell type information
normalize: bool, default: True
If `normalize=True`, expression values are normalized to [0, 1]
show_gene: str
The gene selected to plot
colormap: str, default: viridis
The name of cmap
size: float, default: 10
The size of point
alpha: float, default: 1
The transparency of point