scDCC -- Single Cell Deep Constrained Clustering

Clustering is a critical step in single cell-based studies. Most existing methods perform unsupervised clustering without exploiting any prior domain knowledge. When confronted with the high dimensionality and pervasive dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method, named scDCC, that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets ranging from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses such as cell type assignment.

Network diagram

Requirements

Python -- 3.6.8
PyTorch -- 1.5.1+cu101 (https://pytorch.org)
Scanpy -- 1.0.4 (https://scanpy.readthedocs.io/en/stable)
GPU -- Nvidia Tesla P100

Usage

python scDCC_pairwise_CITE_PBMC.py

python scDCC_pairwise_Human_liver.py
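
The wrapper scripts are configured through command-line flags. As a rough illustration of how the flags listed under Parameters could be parsed, here is a minimal argparse sketch. The function name `build_parser` and all default values are hypothetical placeholders, not taken from the repository; check each script for its actual arguments and defaults.

```python
import argparse


def build_parser():
    """Build a parser for the flags documented in this README.

    Defaults below are illustrative placeholders (assumptions),
    not the values used by the real scDCC scripts.
    """
    parser = argparse.ArgumentParser(description="Hypothetical scDCC-style wrapper")
    parser.add_argument("--n_clusters", type=int, default=8,
                        help="number of clusters")
    parser.add_argument("--n_pairwise", type=int, default=0,
                        help="number of pairwise constraints to generate")
    parser.add_argument("--gamma", type=float, default=1.0,
                        help="weight of clustering loss")
    parser.add_argument("--ml_weight", type=float, default=1.0,
                        help="weight of must-link loss")
    parser.add_argument("--cl_weight", type=float, default=1.0,
                        help="weight of cannot-link loss")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# behaves the same wherever it is run.
args = build_parser().parse_args(["--n_clusters", "12", "--n_pairwise", "6000"])
print(vars(args))
```

Flags that are not passed fall back to their defaults, so a run can override only the settings that matter for a given dataset.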

Parameters

--n_clusters: number of clusters
--n_pairwise: number of pairwise constraints to generate
--gamma: weight of clustering loss
--ml_weight: weight of must-link loss
--cl_weight: weight of cannot-link loss
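
To make the constraint-related parameters concrete, the sketch below shows one way pairwise constraints could be sampled from known labels and turned into must-link and cannot-link penalties. It is written in plain Python under stated assumptions and is not the code in scDCC.py: the function names are hypothetical, the loss form (negative log of the agreement between two cells' soft cluster assignments) is the one commonly used in deep constrained clustering, and the way the two weights are combined is only illustrative. The reconstruction and clustering losses (weighted by --gamma) are left out.

```python
import math
import random


def sample_pairwise_constraints(labels, n_pairwise, seed=0):
    """Draw n_pairwise random pairs of cells (repeats are possible).

    Pairs with equal labels become must-link constraints,
    pairs with different labels become cannot-link constraints.
    """
    rng = random.Random(seed)
    must_link, cannot_link = [], []
    while len(must_link) + len(cannot_link) < n_pairwise:
        i, j = rng.sample(range(len(labels)), 2)  # two distinct cells
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def pairwise_losses(q, must_link, cannot_link, eps=1e-10):
    """Mean must-link and cannot-link losses.

    q is a list of soft cluster assignments, one probability row per cell.
    The agreement of a pair is the dot product of their rows.
    """
    def agreement(i, j):
        return sum(a * b for a, b in zip(q[i], q[j]))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    # Must-link: penalize low agreement. Cannot-link: penalize high agreement.
    ml = [-math.log(max(agreement(i, j), eps)) for i, j in must_link]
    cl = [-math.log(max(1.0 - agreement(i, j), eps)) for i, j in cannot_link]
    return mean(ml), mean(cl)


def weighted_constraint_penalty(ml_loss, cl_loss, ml_weight=1.0, cl_weight=1.0):
    """Combine the two penalties with the weights named in this README (assumed form)."""
    return ml_weight * ml_loss + cl_weight * cl_loss


# Toy example: four cells, two cell types, two clusters.
labels = [0, 0, 1, 1]
q = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
must_link, cannot_link = sample_pairwise_constraints(labels, n_pairwise=10)
ml_loss, cl_loss = pairwise_losses(q, must_link, cannot_link)
penalty = weighted_constraint_penalty(ml_loss, cl_loss, ml_weight=1.0, cl_weight=1.0)
```

In this toy setting both losses are small because the soft assignments already agree with the labels; a pair of same-type cells assigned to different clusters would raise the must-link loss, and raising --ml_weight would make that disagreement count for more.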

Files

scDCC.py -- implementation of scDCC algorithm

scDCC_pairwise.py -- the wrapper to run scDCC on the datasets in Figures 2-4

scDCC_pairwise_CITE_PBMC.py -- the wrapper to run scDCC on the 10X CITE PBMC dataset (Figure 5)

scDCC_pairwise_Human_liver.py -- the wrapper to run scDCC on the human liver dataset (Figure 6)

The folder scDCC_estimating_number_of_clusters contains a version of scDCC that can be used for general datasets without knowing the number of clusters.

Datasets

The datasets used in the study are available at: https://figshare.com/articles/dataset/scDCC_data/21563517

Reference

Tian, T., Zhang, J., Lin, X., Wei, Z., & Hakonarson, H. (2021). Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data. Nature Communications, 12(1), 1873. https://doi.org/10.1038/s41467-021-22008-3

Contact

Tian Tian tiantianwhu@163.com
