sc-manifold-alignment

Code for recreating results from "A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments." Given multiple single-cell RNA-seq datasets with some shared genes, sstGPLVM fits a joint latent space that can be used for downstream analysis.

Data Processing

processing contains Jupyter notebooks for converting the original data to the hdf5 files used as input to the scripts. Simulated data can be generated in the analysis notebook provided. Original single-cell data is available at:

Pancreas data: https://github.com/MarioniLab/MNN2017. First follow the pre-alignment processing steps in the provided R files.
Gilad data: https://github.com/jdblischak/singlecell-qtl
seqFISH+ data: https://github.com/CaiGroup/seqFISH-PLUS
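
As a rough illustration of the conversion step, the sketch below writes a cells-by-genes count matrix and a cells-by-metadata matrix to a single hdf5 file with h5py and reads them back. The dataset names `y_train` and `z_init`, the file name, and the helper functions are assumptions chosen for the example; the notebooks in processing define the actual layout.

```python
import os
import tempfile

import h5py
import numpy as np


def write_inputs(path, counts, metadata):
    """Store a cells-by-genes matrix and a cells-by-metadata matrix in one file."""
    counts = np.asarray(counts)
    metadata = np.asarray(metadata)
    if counts.shape[0] != metadata.shape[0]:
        raise ValueError("counts and metadata must have the same number of cells")
    with h5py.File(path, "w") as f:
        # Dataset names are placeholders, not necessarily those used by the repository.
        f.create_dataset("y_train", data=counts, compression="gzip")
        f.create_dataset("z_init", data=metadata)


def read_inputs(path):
    """Load both matrices back into memory as numpy arrays."""
    with h5py.File(path, "r") as f:
        return f["y_train"][:], f["z_init"][:]


# Toy data: 20 cells, 5 genes, 1 metadata column (for example a batch label).
rng = np.random.RandomState(0)
counts = rng.poisson(2.0, size=(20, 5))
metadata = rng.randint(0, 2, size=(20, 1)).astype(float)

path = os.path.join(tempfile.mkdtemp(), "toy_inputs.h5")
write_inputs(path, counts, metadata)
y_train, z_init = read_inputs(path)
```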

Fitting

alignment-scripts contains Python scripts for fitting the model to data. It also contains a Python script for calculating the average Wasserstein-based distance between a fit and the true latent space.
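
The exact metric is defined by the script in alignment-scripts. As an illustration of the general idea only, the sketch below averages the one-dimensional Wasserstein-1 distance over latent dimensions, assuming the fitted and true latent coordinates are arrays of the same shape (cells by dimensions). The function names are hypothetical, and the sketch does not account for rotation or rescaling between the two spaces.

```python
import numpy as np


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For empirical distributions with the same number of points, this equals
    the mean absolute difference between the sorted samples.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(u - v)))


def average_wasserstein(fit, truth):
    """Average the per-dimension distance over all latent dimensions."""
    fit = np.asarray(fit, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if fit.shape != truth.shape:
        raise ValueError("fit and truth must have the same shape")
    per_dim = [wasserstein_1d(fit[:, q], truth[:, q]) for q in range(fit.shape[1])]
    return float(np.mean(per_dim))


# Toy check: a latent space shifted by one unit in every dimension.
rng = np.random.RandomState(0)
truth = rng.normal(size=(100, 3))
shifted = truth + 1.0
distance_same = average_wasserstein(truth, truth)
distance_shifted = average_wasserstein(shifted, truth)
```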

Requirements

sstGPLVM is implemented in Python 2.7 with:

numpy 1.14.5
pandas 0.23.3
h5py 2.8.0
tensorflow 1.6.0
edward 1.3.5
scikit-learn 0.19.2

Running

Input:

A numpy array or sparse csr/csc matrix of scRNA counts (or other types of data), with N cells (samples) as rows and p genes (features) as columns. This is passed directly to the code as y_train.
A numpy array of relevant metadata, with N cells as rows and m metadata features as columns (loaded to z_init). The metadata can also be structured with values missing for some cells, which can then be imputed (see alignment-seqfish for an example).
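
The two inputs described above can be sketched as follows, assuming scipy is available for the sparse case. The helper `check_inputs` is hypothetical and only verifies the shapes and sparse format stated in this section; it does not cover the missing-metadata case.

```python
import numpy as np
from scipy import sparse


def check_inputs(y_train, z_init):
    """Return (N, p, m) after checking that the two inputs are consistent."""
    if sparse.issparse(y_train):
        # Only the csr and csc formats are mentioned in this README.
        if y_train.format not in ("csr", "csc"):
            raise ValueError("sparse input must be csr or csc")
    else:
        y_train = np.asarray(y_train)
    z_init = np.asarray(z_init)
    if y_train.ndim != 2 or z_init.ndim != 2:
        raise ValueError("both inputs must be two-dimensional")
    if y_train.shape[0] != z_init.shape[0]:
        raise ValueError("y_train and z_init must have one row per cell")
    n_cells, n_genes = y_train.shape
    return n_cells, n_genes, z_init.shape[1]


# Toy data: 6 cells, 4 genes, 1 metadata feature (a batch label).
rng = np.random.RandomState(0)
dense_counts = rng.poisson(0.3, size=(6, 4))
y_sparse = sparse.csr_matrix(dense_counts)
z_init = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])

dims_dense = check_inputs(dense_counts, z_init)
dims_sparse = check_inputs(y_sparse, z_init)
```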

Options: The following parameters can be set in the script to control inference:

Degrees of freedom (--df) - default: 4
Use t-Distribution error model (otherwise normal error) (--T) - default: True
Initial Number of Dimensions (--Q) - default: 3
Kernel Function
- Matern 1/2, 3/2, 5/2 (--m12, --m32, --m52) - default: False
- Periodic (--per_bool) - default: False
Number of Inducing Points (--m) - default: 30
Batch size (--M) - default: 250
Max iterations (--iterations) - default: 5000
Save frequency (--save_freq) - default: 250
Sparse data type (is CSC or CSR) (--sparse) - default: False
PCA Initialization (otherwise random initialization) (--pca_init) - default: True
Output directory (--out) - default: ./test
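
To make the option list concrete, the sketch below mirrors the flags and defaults listed above in a standalone argparse parser. It is not the parser used by the scripts in alignment-scripts: the argument types, the `str2bool` helper, and the way boolean values are passed on the command line are all assumptions.

```python
import argparse


def str2bool(value):
    """Parse common spellings of true/false (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if value.lower() in ("true", "t", "1", "yes"):
        return True
    if value.lower() in ("false", "f", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Build a parser whose flags and defaults follow the option list above."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    parser.add_argument("--df", type=float, default=4.0, help="degrees of freedom")
    parser.add_argument("--T", type=str2bool, default=True, help="t-distribution error model")
    parser.add_argument("--Q", type=int, default=3, help="initial number of dimensions")
    # Kernel choices: Matern 1/2, 3/2, 5/2 and periodic.
    for flag in ("--m12", "--m32", "--m52", "--per_bool"):
        parser.add_argument(flag, type=str2bool, default=False)
    parser.add_argument("--m", type=int, default=30, help="number of inducing points")
    parser.add_argument("--M", type=int, default=250, help="batch size")
    parser.add_argument("--iterations", type=int, default=5000, help="max iterations")
    parser.add_argument("--save_freq", type=int, default=250, help="save frequency")
    parser.add_argument("--sparse", type=str2bool, default=False, help="input is CSC or CSR")
    parser.add_argument("--pca_init", type=str2bool, default=True, help="PCA initialization")
    parser.add_argument("--out", type=str, default="./test", help="output directory")
    return parser


# Defaults, then an example that selects a Matern 5/2 kernel with 5 dimensions.
defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["--m52", "True", "--Q", "5", "--m", "50"])
```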

Output: hdf5 file with

Latent mapping posterior (mean and variance)
Gene-specific noise
Kernel hyperparameters (variance, lengthscale)
Inducing points in latent and high-dimensional space
The final metadata (Z) variables
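
Because the dataset names inside the output file are not listed here, a practical first step is to inspect the file. The sketch below walks any hdf5 file with h5py and reports each dataset and its shape. The helper name and the toy file contents (`x_mean`, `x_var`) are placeholders for illustration, not the names written by the fitting scripts.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize_hdf5(path):
    """Return {dataset_name: shape} for every dataset in the file."""
    summary = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            summary[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Toy stand-in for an output file, with placeholder dataset names.
toy_path = os.path.join(tempfile.mkdtemp(), "toy_output.h5")
with h5py.File(toy_path, "w") as f:
    f.create_dataset("x_mean", data=np.zeros((10, 3)))
    f.create_dataset("x_var", data=np.ones((10, 3)))

summary = summarize_hdf5(toy_path)
```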

Analysis

analysis-nbs contains Jupyter notebooks and the required output files for recreating figures from the paper.
