---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
#<a href="url"><img src="http://bioinformatics.victorchang.edu.au/projects/cidr/images/cidr_logo.png" align="left" height="96" alt="CIDR"></a>
#Clustering through Imputation and Dimensionality Reduction
Ultrafast and accurate clustering through imputation and dimensionality
reduction for single-cell RNA-seq data.
Most existing dimensionality reduction and clustering packages for single-cell RNA-Seq (scRNA-Seq) data deal with dropouts by heavy modelling and computational machinery. Here we introduce _CIDR_ (Clustering through Imputation and Dimensionality Reduction), an ultrafast
algorithm which uses a novel yet very simple ‘implicit imputation’ approach to alleviate the
impact of dropouts in scRNA-Seq data in a principled manner.
For more details about _CIDR_, refer to the [paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1188-0):
Peijie Lin, Michael Troup, Joshua W.K. Ho, CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. _Genome Biology_ 2017 Mar 28;18(1):59.
_CIDR_ is maintained by Dr Joshua Ho (j.ho@victorchang.edu.au).
##Getting Started
* Make sure your version of R is at least 3.1.0.
* _CIDR_ has been tested primarily on the Linux and Mac platforms. It has also been tested on the Windows platform, where it requires the external software package _Rtools_.
* If you are on the Windows platform, ensure that [Rtools](https://cran.r-project.org/bin/windows/Rtools/) is installed. Rtools is software (installed separately from R) that assists in building R packages, and R itself. Note that the download for _Rtools_ is on the order of 100 MB.
* Install the CRAN package _devtools_, which will be used to install _CIDR_ and its dependencies:
```{r, eval=F}
## this is an R command
install.packages("devtools")
```
* Install the _CIDR_ package directly from the GitHub repository (including any dependencies):
```{r, eval=F}
## this is an R command
devtools::install_github("VCCRI/CIDR")
## Note that on some Windows platforms, you may be asked to re-install Rtools
## - even though it may already have been installed. Say yes if prompted.
## Your Windows platform may require the specific version of Rtools being suggested.
##
## For Mac platforms, ensure that the software "Xcode" and "Command Line Tools" are
## installed, by issuing the following command from a terminal prompt:
## /usr/bin/clang --version
##
```
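The prerequisites listed above can be checked from within R before installing. The following is a minimal sketch using only base R; the variable names are illustrative and are not part of _CIDR_:

```r
## Minimal sketch (base R only): check the prerequisites listed above.
## Variable names are illustrative and not part of CIDR.
min_r_version <- "3.1.0"

## TRUE if the running R is at least the minimum version
r_ok <- getRversion() >= min_r_version

## TRUE on Windows, where Rtools is needed to build packages from source
on_windows <- .Platform$OS.type == "windows"

if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than ",
          min_r_version, "; please upgrade R.")
}
if (on_windows) {
  message("Windows detected: make sure Rtools is installed before installing CIDR.")
}
```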
#Examples
##Simulated Data
Test the newly installed _CIDR_ package:
```{r}
library(cidr)
example("cidr")
```
##Biological Datasets
Examples of applying _CIDR_ to real biological datasets can be found at this [GitHub repository](https://github.com/VCCRI/CIDR-examples). The name of the repository is _CIDR-examples_.
Clicking on the _Clone or Download_ button in the GitHub repository for _CIDR-examples_ downloads a zip file containing the raw biological data and the R files for the examples. The user can then extract the files and run the provided R examples.
###Human Brain scRNA-Seq Dataset
_CIDR-examples_ contains a human brain single-cell RNA-Seq dataset, located in the _Brain_ folder. In this dataset
there are 420 cells in 8 cell types after we exclude hybrid cells.
Reference for the human brain dataset:
Darmanis, S. _et al._ A survey of human brain transcriptome diversity at the single cell level.
_Proceedings of the National Academy of Sciences_ 112, 7285–7290 (2015).
###Human Pancreatic Islet scRNA-Seq Dataset
_CIDR-examples_ contains a human pancreatic islet single-cell RNA-Seq dataset, located in the _PancreaticIslet_ folder. In this dataset there are 60 cells in 6 cell types after we exclude undefined cells and bulk RNA-Seq samples.
Reference for the human pancreatic islet dataset:
Li, J. _et al._ Single-cell transcriptomes reveal characteristic features of human pancreatic islet
cell types. _EMBO Reports_ 17, 178–187 (2016).
##Troubleshooting
###Masking of _hclust_
_CIDR_ utilises the _hclust_ function from the base _stats_ package. Loading _CIDR_ masks _hclust_ in other packages automatically.
However, if any package with an _hclust_ function (e.g., _flashClust_) is loaded after _CIDR_, the name clash may cause problems.
In this case, unloading that package should resolve the issue.
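To see whether such a clash is present, one option is to list which attached packages define _hclust_ and detach the offending one. This is a minimal sketch using only base R, with _flashClust_ taken as the example package; it does not change how _CIDR_ itself resolves _hclust_ internally:

```r
## Minimal sketch (base R only); flashClust is the example package named above.
## List every attached package or environment that defines hclust, in
## search-path order (the first entry is the one a bare hclust() call uses).
hclust_providers <- find("hclust")
print(hclust_providers)

## Detach the clashing package only if it is actually attached.
if ("package:flashClust" %in% search()) {
  detach("package:flashClust", unload = TRUE)
}

## In your own code, the stats version can always be reached explicitly.
hc <- stats::hclust(stats::dist(matrix(1:10, ncol = 2)))
```

If more than one entry is printed, the packages listed before `package:stats` are the ones masking the base function.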
###Reinstallation of _CIDR_ - cidr.rdb corruption
In some cases, installing a new version of _CIDR_ on top of an existing version may result in the following error message:
```Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/cidr/help/cidr.rdb' is corrupt```
In this case, one way to resolve the issue is to reinstall the _devtools_ package and then force a reinstallation of _CIDR_:
```{r, eval=F}
install.packages("devtools")
## Click “Yes” in “Updating Loaded Packages”
devtools::install_github("VCCRI/CIDR",force=TRUE)
```
Some users might have an older version of _RcppEigen_ installed. _CIDR_ requires _RcppEigen_ version 0.3.2.9.0 or later. Please reinstall the latest version of this package if necessary.
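
The installed _RcppEigen_ version can be compared against this minimum from within R. The following is a minimal sketch using only base R; `meets_minimum()` is a hypothetical helper introduced here for illustration and is not part of _CIDR_ or _RcppEigen_:

```r
## Minimal sketch (base R only). meets_minimum() is a hypothetical helper,
## not part of CIDR or RcppEigen.
meets_minimum <- function(installed, required) {
  ## Compare version strings component by component, not as plain text
  package_version(installed) >= package_version(required)
}

required_rcppeigen <- "0.3.2.9.0"

if (requireNamespace("RcppEigen", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("RcppEigen"))
  if (!meets_minimum(installed, required_rcppeigen)) {
    message("RcppEigen ", installed,
            " is too old; run install.packages(\"RcppEigen\").")
  }
} else {
  message("RcppEigen is not installed; run install.packages(\"RcppEigen\").")
}
```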