This is a simple demo of jointly modeling single-cell ATAC-seq and RNA-seq datasets with several methods: Signac (an extension of the Seurat package), MultiVI from the scVI framework, and the GLUE framework. Please note that it is intended only to demonstrate a working code pipeline.
The sample dataset was downloaded from the NCBI GEO database (GSE151302). After decompression, the files are stored in a working folder. For now, the demo code processes only one sample (GSM4572187).
process/
├── GSM4572187_Control1_filtered_peak_bc_matrix.h5
├── GSM4572187_Control1_fragments.tsv.gz
├── GSM4572187_Control1_fragments.tsv.gz.tbi
├── GSM4572188_Control2_filtered_peak_bc_matrix.h5
├── GSM4572188_Control2_fragments.tsv.gz
├── GSM4572188_Control2_fragments.tsv.gz.tbi
├── GSM4572189_Control3_filtered_peak_bc_matrix.h5
├── GSM4572189_Control3_fragments.tsv.gz
├── GSM4572189_Control3_fragments.tsv.gz.tbi
├── GSM4572190_Control4_filtered_peak_bc_matrix.h5
├── GSM4572190_Control4_fragments.tsv.gz
├── GSM4572190_Control4_fragments.tsv.gz.tbi
├── GSM4572191_Control5_filtered_peak_bc_matrix.h5
├── GSM4572191_Control5_fragments.tsv.gz
├── GSM4572191_Control5_fragments.tsv.gz.tbi
├── GSM4572192_Control1_filtered_feature_bc_matrix.h5
├── GSM4572193_Control2_filtered_feature_bc_matrix.h5
├── GSM4572194_Control3_filtered_feature_bc_matrix.h5
├── GSM4572195_Control4_filtered_feature_bc_matrix.h5
└── GSM4572196_Control5_filtered_feature_bc_matrix.h5
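The listing above holds two matrix files per control sample. As a minimal sketch, the files could be paired by sample name as shown below. It assumes that `filtered_peak_bc_matrix` files are the ATAC-seq peak matrices and `filtered_feature_bc_matrix` files are the RNA-seq matrices; the function name `pair_by_sample` and the `atac`/`rna` keys are hypothetical and not part of the demo code.

```python
import re
from collections import defaultdict

# Matrix file names copied from the directory listing above.
FILES = [
    "GSM4572187_Control1_filtered_peak_bc_matrix.h5",
    "GSM4572188_Control2_filtered_peak_bc_matrix.h5",
    "GSM4572189_Control3_filtered_peak_bc_matrix.h5",
    "GSM4572190_Control4_filtered_peak_bc_matrix.h5",
    "GSM4572191_Control5_filtered_peak_bc_matrix.h5",
    "GSM4572192_Control1_filtered_feature_bc_matrix.h5",
    "GSM4572193_Control2_filtered_feature_bc_matrix.h5",
    "GSM4572194_Control3_filtered_feature_bc_matrix.h5",
    "GSM4572195_Control4_filtered_feature_bc_matrix.h5",
    "GSM4572196_Control5_filtered_feature_bc_matrix.h5",
]

# Captures the GEO accession, the sample name, and the matrix kind.
PATTERN = re.compile(
    r"^(GSM\d+)_(Control\d+)_filtered_(peak|feature)_bc_matrix\.h5$"
)


def pair_by_sample(names):
    """Group matrix files by sample name (hypothetical helper)."""
    pairs = defaultdict(dict)
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip fragments files and anything unrecognized
        _, sample, kind = match.groups()
        # Assumption: "peak" matrices are ATAC-seq, "feature" matrices are RNA-seq.
        modality = "atac" if kind == "peak" else "rna"
        pairs[sample][modality] = name
    return dict(pairs)


pairs = pair_by_sample(FILES)
print(pairs["Control1"])
```

Under that assumption, the ATAC-seq sample GSM4572187 pairs with the RNA-seq sample GSM4572192, since both are labeled Control1.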
signac_and_dataFormat_manipulation_demo.R
This R script takes in the peak count matrices in h5 format, infers gene activities via Signac and Seurat, and returns 10x MTX format files with proper genome annotations for downstream analysis. Please note that the quality control (QC) steps are omitted for simplicity, since the peaks are already filtered. More information on QC can be found in the official Seurat website and tutorials.
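To illustrate the output format only, here is a minimal Python sketch of a 10x-style MTX directory written with the standard library. It is not the R script's code. The gzip-compressed file names (`matrix.mtx.gz`, `features.tsv.gz`, `barcodes.tsv.gz`) follow the Cell Ranger v3 convention and are an assumption about what the script emits; the helper `write_10x_mtx` and the toy peaks and barcodes are hypothetical.

```python
import gzip
import os
import tempfile


def write_10x_mtx(out_dir, features, barcodes, entries):
    """Write a features-by-barcodes sparse count matrix in 10x MTX layout.

    features: list of (id, name, feature_type) tuples
    barcodes: list of cell barcode strings
    entries:  list of (feature_index, barcode_index, count), 0-based
    """
    os.makedirs(out_dir, exist_ok=True)
    with gzip.open(os.path.join(out_dir, "features.tsv.gz"), "wt") as fh:
        for feature_id, name, feature_type in features:
            fh.write(f"{feature_id}\t{name}\t{feature_type}\n")
    with gzip.open(os.path.join(out_dir, "barcodes.tsv.gz"), "wt") as fh:
        for barcode in barcodes:
            fh.write(barcode + "\n")
    with gzip.open(os.path.join(out_dir, "matrix.mtx.gz"), "wt") as fh:
        # Matrix Market coordinate format: header, dimensions, then entries.
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(features)} {len(barcodes)} {len(entries)}\n")
        for i, j, count in entries:
            fh.write(f"{i + 1} {j + 1} {count}\n")  # indices are 1-based


# Toy data, invented for illustration only.
features = [
    ("chr1:100-200", "chr1:100-200", "Peaks"),
    ("chr1:500-900", "chr1:500-900", "Peaks"),
]
barcodes = ["AAACGAAAGAAACGCC-1", "AAACGAAAGAAATGGG-1", "AAACGAAAGAAATTCG-1"]
entries = [(0, 0, 2), (1, 2, 5)]

out_dir = tempfile.mkdtemp()
write_10x_mtx(out_dir, features, barcodes, entries)
print(sorted(os.listdir(out_dir)))
```

Rows are features and columns are barcodes, so the dimension line for the toy data reads `2 3 2` (two peaks, three cells, two non-zero counts).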
multiVI_integration.ipynb
This notebook takes in the 10x MTX files mentioned above, organizes them into a multiome AnnData structure, and trains a MultiVI model at a small scale.
multiVI_integration_colab.ipynb
This notebook extends the MultiVI integration analysis above by integrating the full RNA-seq and ATAC-seq datasets on Google Colab with GPU support. In addition, the cell types are annotated using scANVI's seed cell labeling protocol.
GLUE_multimodal_integration_and_scenic_GRN_inference.ipynb
This notebook takes in the scATAC-seq and scRNA-seq datasets mentioned above, together with motif information from the JASPAR database, builds an integrated embedding model, and infers a gene regulatory network using this model and the SCENIC package.
cTPnet_surface_protein_abundance_inference.html
This demo uses a pre-trained cTP-net transfer learning model to infer the abundance of 24 surface proteins for an AML patient single-cell RNA-seq dataset. It also clusters the cells and annotates the cell types using the inferred surface protein abundance. Finally, it calculates the average abundance of the 24 genes in two cell types.
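The final averaging step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the demo's code: the helper `mean_abundance_by_cell_type`, the dictionary layout, the cell labels, and all values are hypothetical toy data.

```python
from collections import defaultdict
from statistics import mean


def mean_abundance_by_cell_type(abundance, cell_types):
    """Average each protein's abundance within each cell type.

    abundance:  {cell_id: {protein: value}}
    cell_types: {cell_id: cell_type_label}
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for cell, proteins in abundance.items():
        label = cell_types[cell]
        for protein, value in proteins.items():
            grouped[label][protein].append(value)
    return {
        label: {protein: mean(values) for protein, values in by_protein.items()}
        for label, by_protein in grouped.items()
    }


# Toy values, invented for illustration only.
abundance = {
    "cell1": {"CD3": 1.0, "CD19": 0.0},
    "cell2": {"CD3": 3.0, "CD19": 0.5},
    "cell3": {"CD3": 0.0, "CD19": 4.0},
}
cell_types = {"cell1": "T cell", "cell2": "T cell", "cell3": "B cell"}

averages = mean_abundance_by_cell_type(abundance, cell_types)
print(averages["T cell"]["CD3"])
```

Each cell contributes equally to its cell type's average, so unevenly sized clusters do not need separate weighting.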
sceptre_scMS_scTPnet_correlation.ipynb
This demo uses Sceptre, a Scanpy-compatible single-cell mass spectrometry (scMS) data processing pipeline, to process an AML patient dataset (with FACS-sorted markers for cell types). Finally, it calculates the correlations between the scMS-measured protein abundance and the protein/RNA abundances from the cTP-net demo above.
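As a rough illustration of the correlation step, the sketch below computes a Pearson coefficient between two abundance vectors. The description above does not say which correlation measure the notebook uses, so Pearson is an assumption here; the `pearson` helper and the toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration: one entry per matched protein.
scms_abundance = [1.0, 2.0, 3.0, 4.0]      # measured by scMS
inferred_abundance = [2.0, 4.0, 6.0, 8.0]  # inferred by the model

r = pearson(scms_abundance, inferred_abundance)
print(round(r, 6))
```

Both vectors must list the same proteins in the same order; a rank-based measure such as Spearman may suit data where the two scales are not linearly related.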