forked from statOmics/PSLS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
08_ExperimentalDesignII_2_puromycin.Rmd
235 lines (181 loc) · 6.62 KB
/
08_ExperimentalDesignII_2_puromycin.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
---
title: "Experimental Design II: replication and power exercise 2"
author: "Lieven Clement & Alexandre Segers"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
# Puromycin dataset
Data on the velocity of an enzymatic reaction were obtained by Treloar (1974).
The number of counts per minute of radioactive product from the reaction was
measured as a function of substrate concentration in parts per million (ppm) and
from these counts the initial rate (or velocity) of the reaction was calculated (counts/min/min). The experiment was conducted once with the enzyme treated
with Puromycin, and once with the enzyme untreated.
Import libraries
```{r, message=FALSE, warning=FALSE}
```
# Import data
In contrast to the other datasets we have worked with so far, this dataset is
not available through a URL link. In stead, the data is directly available from
an R package that was pre-installed in your R working environment. As such, we
can simple do
```{r}
data(Puromycin)
```
and an object called `Puromycin` is immediately available in your working
environment.
# Data wrangling
Filter the data so that we are left with only the "treated" enzymes.
```{r}
```
# Data exploration
In the original publication the authors observed a linear relationship between
the log10 substrate concentration and the reaction rate. Make a visualization
that allows for reproducing this finding.
```{r, eval=FALSE}
Puromycin %>%
ggplot(...) + # select which elements of the dataset we need to visualize
... # use a relevant plotting geometry
stat_smooth(...) + # draw a smooth line through the data cloud
stat_smooth(...) # draw a straight (linear regression) line through the data cloud
... # you can add some extra elements like axis labels, title, ...
```
Note, that the researchers have chosen 6 different substrate concentrations and
conducted an experiment where they assessed the initial reaction rate twice for
every concentration.
# Question 1
Use the data to calculate the power to pick up an association that is as
least as strong as the association you observed in the dataset when using an
experiment with the same design.
## Simulation function
For this first question, we will need a function that allows us to simulate data
similar to that of our experiment under our model assumptions. We provide this
function for you:
```{r}
simFast <- function(form, data, betas, sd, contrasts, alpha = .05, nSim = 10000) {
ySim <- rnorm(nrow(data) * nSim, sd = sd)
dim(ySim) <- c(nrow(data), nSim)
design <- model.matrix(form, data)
ySim <- ySim + c(design %*% betas)
ySim <- t(ySim)
### Fitting
fitAll <- limma::lmFit(ySim, design)
### Inference
varUnscaled <- c(t(contrasts) %*% fitAll$cov.coefficients %*% contrasts)
contrasts <- fitAll$coefficients %*% contrasts
seContrasts <- varUnscaled^.5 * fitAll$sigma
tstats <- contrasts / seContrasts
pvals <- pt(abs(tstats), fitAll$df.residual, lower.tail = FALSE) * 2
return(mean(pvals < alpha))
}
```
Without going into the full code details, this function allows us to simulate
data similar to that of our experiment under our model assumptions, given the
following inputs:
- `form`: model formula for the experiment we want to simulate
- `data`: the target dataset on which we want to base our simulations on
- `betas`: the linear regression coefficients for the target dataset
- `sd`: the residual standard errors from the linear regression model fit on
the target dataset
- `contrasts`: comparison of interest, i.e. which (combination of) model
parameters we would like to assess
- `alpha`: alpha-level at which to conduct the hypothesis testing
- `nSim`: number of datasets we would like to simulate
To simulate new data based on our target dataset `Puromycin`, we will need to
fill in all the arguments to the `simFast` function.
**Hint: for the betas and sd, we will need to fit a linear model first!**
```{r, eval=FALSE}
power1 <- simFast(
form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...
)
power1
```
# Question 2
Use the data to calculate the power to pick up an association where the
reaction rate increases on average with 10 counts/min when the substrate
concentration is 10 times higher ($\beta_1=10$).
Again, we will need to fill in all the arguments to the `simFast` function.
This time, however, the question is slightly different from question 1.
Adjust the inputs accordingly!
```{r, eval=FALSE}
power2 <- simFast(
form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...
)
power2
```
**Compare your result with the one from question 1!**
# Question 3
Use the data to calculate the number of repeats you need for each
concentration to pick up an association where the reaction rate increases on
average with 10 counts/min when the substrate concentration is 10 times higher
($\beta_1=10$) with a power of at least 90%.
**Hint: this will require us to repeat the simulation process as many times**
**as we have concentration values.**
```{r, eval=FALSE}
concentrations <- Puromycin %>%
pull(conc) %>%
unique()
powers <- data.frame(n = 1:10, power = NA) # to later store our results
for (i in 1:10)
{
simData <- data.frame(conc = rep(concentrations, each = i))
powers[i, 2] <- simFast(
form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...
)
}
```
Make a visualization that displays the power in function of the number of
repeats in the data.
```{r, eval=FALSE}
powers %>%
ggplot(...) +
geom_line() +
geom_hline(yintercept = 0.9, lty = 2)
```
# Question 4
Suppose that you would set up an experiment with a design similar with the
same concentrations as in the puromycin dataset and you have the following
restriction: you need to use each concentration at least once and can setup at
most 12 reactions, how would you choose your design points? Calculate the power
for this design when the effect size is 10 counts/min per 10 times increase in
the substrate concentration ($\beta_1=10$).
```{r, eval=FALSE}
concentrations <- ...
simData <- data.frame(conc = c(
concentrations, rep(min(concentrations), 3),
rep(max(concentrations), 3)
))
powerOpt <- simFast(
form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...
)
simData
powerOpt
```
Make a visualization that displays the power in function of the number of
repeats in the data.
```{r, eval=FALSE}
```
Inspect and interpret the results.