An R-based tool for automatically identifying co-expressed genes across multiple single-cell RNA-seq datasets simultaneously
scLM takes multiple single-cell RNA-seq datasets as input. It also works with a single dataset. Each dataset is a table with genes in rows and cells in columns, in the following format. Example data files can be found in the Data
CellID | Cell1ID | Cell2ID | Cell3ID | Cell4ID | ... |
Gene1 | 12 | 0 | 0 | 0 | ... |
Gene2 | 125 | 0 | 298 | 0 | ... |
Gene3 | 0 | 0 | 0 | 0 | ... |
... | ... | ... | ... | ... | ... |
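As a rough illustration of this layout, the sketch below builds two small gene-by-cell count matrices in base R and collects them in a list. The helper make_counts, the object names toy1, toy2 and toy_list, and the simulated counts are all made up for illustration. It assumes, based on the call N=nrow(example1[[1]]) used in the example, that the input is a list of matrices with genes in rows.

```r
set.seed(1)
genes <- paste0("Gene", 1:5)

# Hypothetical helper: simulate a genes x cells matrix of counts
make_counts <- function(n_cells, prefix) {
  matrix(rnbinom(length(genes) * n_cells, size = 1, mu = 20),
         nrow = length(genes),
         dimnames = list(genes,
                         paste0(prefix, "_Cell", seq_len(n_cells), "ID")))
}

toy1 <- make_counts(4, "A")   # dataset 1: 5 genes x 4 cells
toy2 <- make_counts(6, "B")   # dataset 2: 5 genes x 6 cells
toy_list <- list(toy1, toy2)

# Assumption: all datasets list the same genes in the same row order
same_genes <- all(sapply(toy_list,
                         function(m) identical(rownames(m), genes)))
print(same_genes)
print(toy_list[[1]])
```

A check like same_genes is cheap to run before fitting, since a mismatch in row order would silently pair different genes across datasets.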
The ground-truth labels for the cells in each dataset can also be provided as input. The format is as follows:
Cell1ID | Label1 |
Cell2ID | Label2 |
Cell3ID | Label3 |
Cell4ID | Label4 |
... | ... |
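A label table of this shape can be checked against a count matrix before use. The sketch below uses base R only. The object names and toy values are hypothetical, and it assumes that the first column holds the same IDs that appear as column names of the count matrix.

```r
# Toy count matrix: 3 genes x 4 cells
counts <- matrix(c(12,  0,   0, 0,
                   125, 0, 298, 0,
                   0,   0,   0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:3),
                                 paste0("Cell", 1:4, "ID")))

# Two-column label table (ID, label), deliberately out of order
labels <- data.frame(id = paste0("Cell", c(3, 1, 4, 2), "ID"),
                     label = c("Label3", "Label1", "Label4", "Label2"),
                     stringsAsFactors = FALSE)

# Reorder the labels to follow the column order of the count matrix
idx <- match(colnames(counts), labels$id)
stopifnot(!anyNA(idx))   # every column must have a label
labels <- labels[idx, ]
print(labels$label)
```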
We recommend running the code on a Linux system.
# load the example data
data("example1")
# or
# data("example2")
In this example, we set the number of co-expression clusters to 3 (K=3) for each dataset in the input lists example1 and example2.
# data(example2.member)
result1 <- Multi_NB(datalist=example1, K=3, N=nrow(example1[[1]]))
result1 contains the latent variables, the other fitted coefficients, and the identified co-expression clusters.
# calculate the Adjusted Rand Index
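The Adjusted Rand Index compares two partitions of the same items and equals 1 when they agree up to relabelling. As a minimal sketch, the base-R function below computes it from a contingency table. The function name adjusted_rand_index and the toy vectors are illustrative and not part of scLM; which fields of the fitted result and of the reference membership to pass in is not shown here.

```r
# Adjusted Rand Index between two label vectors of equal length
adjusted_rand_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  choose2 <- function(n) n * (n - 1) / 2
  tab <- table(x, y)                       # contingency table
  sum_cells <- sum(choose2(tab))
  sum_rows  <- sum(choose2(rowSums(tab)))
  sum_cols  <- sum(choose2(colSums(tab)))
  expected  <- sum_rows * sum_cols / choose2(length(x))
  max_index <- (sum_rows + sum_cols) / 2
  # identical trivial partitions: treat as perfect agreement
  if (max_index == expected) return(1)
  (sum_cells - expected) / (max_index - expected)
}

predicted <- c(1, 1, 2, 2, 3, 3)
reference <- c("b", "b", "a", "a", "c", "c")   # same grouping, other names
ari_same <- adjusted_rand_index(predicted, reference)

# two partitions that cross each other completely
ari_cross <- adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))
print(c(ari_same, ari_cross))
```

The index is corrected for chance, so it can be negative when two partitions agree less than random labelling would.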
The optimal number of co-expression clusters can be found using the example.R script:
for (lambda in 1:20) {
  results <- Multi_NB(datalist=example1, K=lambda, N=nrow(example1[[1]]))
  # write the results for this value of K to a designated path "path1"
}
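One way to write each run to disk so that it can be collected later is sketched below. It is only an illustration: fit_stub is a hypothetical stand-in for the model fit, the directory is a temporary one, and the file-naming scheme is an assumption chosen so that the fourth "_"-separated field of the file name has the form "<K>results.RData".

```r
# Hypothetical stand-in for a model fit; replace with the real call
fit_stub <- function(K) list(K = K, BIC = (K - 4)^2 + 100)

path1 <- file.path(tempdir(), "sclm-runs")
dir.create(path1, showWarnings = FALSE)

for (lambda in 1:6) {
  results <- fit_stub(lambda)
  # Assumed naming scheme: <tool>_<dataset>_K_<K>results.RData
  out <- file.path(path1,
                   paste0("scLM_example1_K_", lambda, "results.RData"))
  save(results, file = out)   # stores the object under the name 'results'
}

files <- list.files(path = path1, pattern = "results.RData")

# Recover K from each file name instead of relying on file order
k_from_name <- sapply(files, function(f) {
  strsplit(strsplit(f, "_")[[1]][4], "results")[[1]][1]
}, USE.NAMES = FALSE)
print(k_from_name)
```

list.files returns names in alphabetical order, so with K running up to 20 the file for K = 10 sorts before the file for K = 2. Parsing K from the name avoids confusing a position in the file list with the value of K.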
files <- list.files(path=path1, pattern='results.RData')
# load all the results in the resA with the list structure
count <- 0
resA <- vector('list')
for (i in files) {
  count <- count + 1
  # assumes each file stores an object named 'results'
  load(file.path(path1, i))
  resA[[count]] <- results
  names(resA[[count]]) <- paste0('res_',strsplit(strsplit(i,'_')[[1]][4],'results')[[1]][1])
}
# identify the result with least BIC value
bicS <- sapply(seq_along(resA), function(i) resA[[i]][[1]]$BIC)
# position in resA of the run with the smallest BIC
optimal.lambda <- which.min(bicS)
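# optimal co-expression clusters, taken from the run with the smallest BIC
# (accessed the same way as BIC above)
optimal.cluster <- resA[[optimal.lambda]][[1]]$clusters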
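The selection step can be tried out on toy data without fitting anything. In the sketch below the BIC values are invented, and the flat list(K, BIC, clusters) structure is an assumption for illustration, not the layout of the saved scLM results.

```r
# K of each toy run, listed in the order the files would be read
ks <- c(1, 10, 2, 3, 4, 5)

# Toy stand-ins for fitted results; BIC values are invented
resA <- lapply(ks, function(k) {
  list(K = k,
       BIC = (k - 4)^2 + 100,
       clusters = rep(seq_len(k), length.out = 12))
})

bic <- sapply(resA, function(r) r$BIC)
best <- which.min(bic)            # position in resA, not K itself
optimal.K <- resA[[best]]$K
optimal.cluster <- resA[[best]]$clusters
print(c(position = best, K = optimal.K))
```

which.min returns the first position of the minimum, so ties resolve to the earliest run in the list. Reading K from the stored result, as done here, keeps the selected value correct even when the runs are not ordered by K.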
Please cite our paper if you use this code in your own work:
Song, Q., Su, J., Miller, L.D. and Zhang, W., 2021. scLM: automatic detection of consensus gene clusters across multiple single-cell datasets. Genomics, Proteomics & Bioinformatics, 19(2), pp.330-341.