Single Cell RNA-Seq imputAtion constrained By BuLk RNAsEq data (SCRABBLE)
SCRABBLE imputes drop-out data by optimizing an objective function that consists of three terms. The first term keeps the imputed values for genes with nonzero expression as close to their original values as possible, thus minimizing unwanted bias towards expressed genes. The second term keeps the rank of the imputed data matrix as small as possible. The rationale is that we expect only a limited number of distinct cell types in the samples. The third term operates on the bulk RNA-Seq data. It enforces consistency between the average gene expression of the aggregated imputed data and the average gene expression of the bulk RNA-Seq data. We developed a convex optimization algorithm to minimize the objective function.
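As a rough sketch, the three terms described above can be written as a single objective. The notation here is an assumption made for illustration and does not appear in this README: $X_0$ is the observed single-cell matrix, $X$ the imputed matrix, $P_\Omega$ a projection that keeps only the entries where $X_0$ is nonzero, $a$ an averaging vector over cells, $D$ the bulk expression profile, and $\alpha$, $\beta$ weights on the second and third terms. The nuclear norm $\|X\|_*$ is used as the standard convex surrogate for rank, which fits the convex formulation mentioned above; see the reference at the end for the authors' exact statement.

```latex
\hat{X} \;=\; \arg\min_{X \ge 0}\;
  \underbrace{\tfrac{1}{2}\,\bigl\| P_\Omega(X) - X_0 \bigr\|_F^2}_{\text{stay close to observed nonzero values}}
  \;+\; \underbrace{\alpha\,\| X \|_*}_{\text{low rank}}
  \;+\; \underbrace{\beta\,\bigl\| aX - D \bigr\|_2^2}_{\text{agree with bulk average}}
```

How $\alpha$ and $\beta$ map onto the entries of the parameter vector used in the examples is not stated here, so treat that correspondence as unspecified.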
install.packages("SCRABBLE")
library(devtools)
install_github("software-github/SCRABBLE/R")
Download the source code here, then in R type:
install.packages(path_to_file, type = 'source', repos = NULL)
where path_to_file is the full path and file name:
- On Windows it will look something like this: "C:\\Downloads\\SCRABBLE.tar.gz" (backslashes must be doubled inside an R string; forward slashes also work).
- On UNIX it will look like this: "~/Downloads/SCRABBLE.tar.gz".
data_sc <- demo_data[[1]]
data_bulk <- demo_data[[2]]
data_true <- demo_data[[3]]
parameter <- c(1,1e-6,1e-4)
result <- scrabble(demo_data, parameter = parameter)
There are three data sets in the .mat file: the true data set, the drop-out data set, and the data set imputed by SCRABBLE.
load('demo_data.mat')
We construct the data structure that SCRABBLE takes as one of its inputs.
data.data_sc = data_sc;
data.data_bulk = data_bulk;
Set up the parameters used in the example.
parameter = [1,1e-6,1e-4];
dataRecovered = scrabble(data,parameter);
fig = figure(1);  % use a handle name that does not shadow the built-in gcf
set(fig, 'Position', [100, 500, 1200, 300])
subplot(1,3,1)
imagesc(log10(data_true+1))
title('True Data')
axis off
subplot(1,3,2)
imagesc(log10(data_sc+1))
title('Drop-out Data')
axis off
subplot(1,3,3)
imagesc(log10(dataRecovered+1))
title('Imputed Data by SCRABBLE')
axis off
Please feel free to contact Tao Peng (software.github@gmail.com) if you have any questions about the software.
Peng, Tao, et al. "SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data." Genome Biology 20.1 (2019): 88.