-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.Rmd
154 lines (118 loc) · 5.04 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# cluequery
The goal of cluequery is to provide an easy R interface to query Connectivity Map
data from [Clue](https://clue.io).
## Installation
You can install the current version of cluequery from
[Github](https://github.com/labsyspharm/cluequery) with:
```{r installation, eval=FALSE}
if (!requireNamespace("remotes", quietly = TRUE))
install.packages("remotes")
remotes::install_github("labsyspharm/cluequery")
```
## API key
In order to use this package an API key from Clue is required. For academice
purposes, they are freely available at [Clue](https://clue.io).
The API key can be automatically retrieved by all `cluequery` functions if it is
added to the `~/.Renviron` file:
```
CLUE_API_KEY=xxx
```
## Example
This is a basic example how to query Clue for a gene signature of interest. Gene
signatures can come from any source, but must contain a set of upregulated genes
and a set of downregulated genes.
The gene signatures must be converted into
[GMT format](https://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GMT)
for use with Clue.
### Gene signature derived from differential expression with DESeq2
In this example, we generate a gene set compatible with Clue from a
[DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
differential expression experiment.
First, we load our example DESeq2 result:
```{r load_deseq2}
library(cluequery)
deseq2_res_path <- system.file("extdata", "example_deseq2_result.csv.xz", package = "cluequery")
deseq2_res <- read.csv(deseq2_res_path)
head(deseq2_res)
```
Note that gene IDs need to be [Entrez](https://www.ncbi.nlm.nih.gov/gene/) IDs.
We need to choose a name for the gene set and decide on a
[significance cutoff (alpha)](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#p-values-and-adjusted-p-values)
for our differentially expressed genes.
```{r prepare_deseq2}
deseq2_gmt <- clue_gmt_from_deseq2(deseq2_res, name = "treatment_drug_x", alpha = 0.05)
str(deseq2_gmt)
```
A number of warnings are raised, indicating that some gene IDs are not part of
the [BING gene space](https://clue.io/connectopedia/category/Concept). These
genes are removed from the gene sets and not considered by Clue.
`clue_prepare_deseq2` returns a named vector with paths to the GMT
files for the up- and the down-regulated genes.
### Gene signature derived from pre-existing lists
We can also convert an pre-existing set of genes from any source to GMT files:
```{r pre_defined}
up_genes <- c(
"10365", "1831", "9314", "4846", "678", "22992", "3397", "26136",
"79637", "5551", "7056", "79888", "1032", "51278", "64866", "29775",
"994", "51696", "81839", "23580", "219654", "57178", "7014",
"57513", "51599", "55818", "4005", "4130", "4851", "2050", "50650",
"9469", "54438", "3628", "54922", "3691", "65981", "54820", "2261",
"2591", "7133", "162427", "10912", "8581", "2523", "25807", "9922",
"30850", "4862", "8567", "79686", "55615", "51283", "3337", "2887",
"3223", "6915", "6907", "26056", "259217", "6574", "23097", "5164",
"57493", "7071", "5450", "113146", "8650"
)
down_genes <- c(
"5128", "5046", "956", "10426", "9188", "23403", "7204", "1827",
"3491", "9076", "330", "8540", "22800", "10687", "19", "63875",
"10979", "51154", "10370", "50628", "7128", "6617", "7187", "22916",
"81034", "58516", "3096", "4794", "5202", "26511", "8767", "2355",
"22943", "1490", "133", "11010", "51025", "23160", "56902", "3981",
"5209", "6347", "5806", "7357", "9425", "3399", "6446", "64328",
"6722", "8545", "688", "861", "390", "23034", "51330", "51474",
"2633", "4609"
)
pre_gmt <- clue_gmt_from_list(up_genes, down_genes, "my_favourite_genes")
str(pre_gmt)
```
### Querying Clue
Now that we have GMT files, we can query Clue. Here we use the GMT files derived
from the DESeq2 result, but we could also use the `pre_gmt` files generated
above.
```{r query_clue}
submission_result <- clue_query_submit(
deseq2_gmt[["up"]], deseq2_gmt[["down"]], name = "deseq2_query_job", use_fast_tool = FALSE
)
submission_result$result$job_id
```
`clue_query_submit` returns a nested list containing information about the submitted
job, including the job id.
Now we have to wait for the job to finish. We can use the function `clue_query_wait`
to pause our script until the results are ready.
```{r wait_result}
clue_query_wait(submission_result, interval = 60, timeout = 600)
```
`clue_query_wait` will automatically continue execution of the script once results
are ready. Now we can download and parse the results:
```{r download_and_parse}
result_path <- clue_query_download(submission_result)
result_df <- clue_parse_result(result_path)
```
```{r echo = FALSE}
knitr::kable(head(result_df, n = 10))
```
## Funding
We gratefully acknowledge support by NIH Grant 1U54CA225088-01: Systems
Pharmacology of Therapeutic and Adverse Responses to Immune Checkpoint and Small
Molecule Drugs.