-
Notifications
You must be signed in to change notification settings - Fork 8
/
knitrDoc.Rnw
112 lines (106 loc) · 4.48 KB
/
knitrDoc.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
\documentclass{article}
\begin{document}
<<include=FALSE>>=
library(knitr)
opts_chunk$set(
concordance=TRUE
)
@
<<>>=
library(PLIER)
@
\section{Some Notes}
PLIER runs reasonably fast but to get the best performance we recommend that you use linear algebra libraries optimized for your system. For Ubuntu and Windows you can follow instructions at http://brettklamer.com/diversions/statistical/faster-blas-in-r/.
\section{Vaccination Dataset}
Load data
<<echo=TRUE>>=
data(bloodCellMarkersIRISDMAP)
data(svmMarkers)
data(canonicalPathways)
data(vacData)
@
Construct a joint pathway matrix by merging canonicalPathways, bloodCellMarkersIRISDMAP and svmMarkers and select genes appearing in both gene expression profile and the joint pathway matrix.
<<echo=TRUE>>=
allPaths=combinePaths(bloodCellMarkersIRISDMAP, svmMarkers,canonicalPathways)
cm.genes=commonRows(allPaths, vacData)
@
Normalize the data and count the number of latent variables in the data by num.pc(). The result is 24. Then set max.iter = 150, k = 24 and all other parameters to be default. Note: num.pc() is quite conservative and there is no harm in setting this value higher.
<<echo=TRUE>>=
vacDataN=rowNorm(vacData)
num.pc(vacDataN[cm.genes,])
#plierResult=PLIER(vacDataN[cm.genes,], allPaths[cm.genes,],k=24, trace=T, max.iter=150)
@
In the interest of speed we provide a pre-computed version. (Note that for large datasets, with more than 500 samples, the results are not always exactly the same due to a randomized svd used for internal calculations.) If randomized svd is used a warning will be issues.
<<echo=TRUE>>=
data(plierResult)
@
We correlate the decomposition result with SPVs from CellCODE. We have nice one-to-one correspondence, though the "DendriticCell" signature from CellCODE is more closely related to the Type-I interferon transcriptional response so it is probably not cell-type induced variation.
<<echo=TRUE>>=
data(SPVs)
plotMat(cor(t(plierResult$B), SPVs))
@
Visualize the U matrix with default cutoffs
<<echo=TRUE>>=
plotU(plierResult, auc.cutoff = 0.75, pval.cutoff = 0.01, top = 3)
@
Visualize the U matrix with more permissive cutoffs
<<echo=TRUE>>=
plotU(plierResult, auc.cutoff = 0.6, pval.cutoff = 0.05, top = 3)
@
Visualize the top genes
<<echo=TRUE>>=
plotTopZ(plierResult, vacDataN, allPaths, top = 5)
@
The "PID\_ATF2\_PATHWAY" and "REACTOME\_MRNA\_PROCESSING" looks a little tenuous and we can check the statistics.
<<echo=TRUE>>=
plierResult$summary[which(plierResult$summary$`LV index`==3),]
plierResult$summary[which(plierResult$summary$`LV index`==4),]
@
These pathways are not perfectly associated with LVs but are significant. This is typical for transcriptional pathways. To get a closer look at the genes we can visualize just these
<<echo=TRUE>>=
plotTopZ(plierResult, vacDataN, allPaths, index=c(3,4), top=25)
@
\section{HCC Dataset}
Load data
<<echo=TRUE>>=
data(HCCdata)
data(canonicalPathways)
data(chemgenPathways)
data(oncogenicPathways)
@
Construct a joint pathway matrix by merging canonicalPathways, chemgenPathways and oncogenicPathways and select genes appearing in both gene expression profile and the joint pathway matrix.
<<echo=TRUE>>=
CancerPath=combinePaths(canonicalPathways, chemgenPathways, oncogenicPathways)
cmHCC=commonRows(HCCdata, CancerPath)
@
Remove small pathways, not strictly necessary but saves computation time by making the pathway/geneset matrix smaller
<<echo=TRUE>>=
ii=which(colSums(CancerPath[cmHCC,])<20)
HCCpath=CancerPath[, -ii]
@
Prescale the data
<<echo=TRUE>>=
HCCdataN=rowNorm(HCCdata[cmHCC,])
@
Pre-compute Chat, which is used to define active pathways and is expensive for large pathway sets. It's helpful if we want to run PLIER with different parameters
<<echo=TRUE>>=
HCCchat=computeChat(CancerPath[cmHCC,])
@
Compute the number of latent variables by num.pc(HCCdataN) and the result is 52. Then set k = 52 and all other parameters to be default.
<<echo=TRUE>>=
#plierResultHCC=PLIER(HCCdataN, CancerPath[cmHCC,], k = 52, Chat = HCCchat, trace=T)
@
Again, in the interest of speed we provide a pre-computed version
<<echo=TRUE>>=
data(plierResultHCC)
@
Plot the result with a high AUC cutoff so it is not too busy
<<echo=TRUE>>=
plotU(plierResultHCC, auc.cutoff = 0.85, top=5)
@
We found two immune components, interferon alpha (LV26) and genes related to interferon gamma/CD8/Th1 response (LV4).
<<echo=TRUE>>=
plotTopZ(plierResultHCC, HCCdataN, CancerPath, index=c(26, 40), top = 20)
@
LV40 has many associated pathways, it is named with the top pathway but "inPathway" refers to their union.
\end{document}