---
title: "Experimental Design II: replication and power exercise 2 - Solution"
author: "Lieven Clement, Alexandre Segers and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
```{r}
library(tidyverse)
```
# Puromycin data
Data on the velocity of an enzymatic reaction were obtained by Treloar (1974).
The number of counts per minute of radioactive product from the reaction was measured as a function of substrate concentration in parts per million (ppm) and from these counts the initial rate (or velocity) of the reaction was calculated (counts/min/min). The experiment was conducted once with the enzyme treated with Puromycin, and once with the enzyme untreated.
Here, we again focus on the data for the treated enzyme.
```{r}
data(Puromycin)
Puromycin <- Puromycin %>%
  filter(state == "treated")
```
There was a linear association between the log10 substrate concentration and the reaction rate.
```{r}
Puromycin %>%
  ggplot(aes(x = conc %>% log10(), y = rate)) +
  geom_point() +
  stat_smooth(method = "loess", col = "red") +
  stat_smooth(method = "lm", col = "black") +
  ylab("Reaction Rate (counts/min)") +
  xlab("log10(Substrate concentration) (log10 ppm)")
```
Note that the researchers chose 6 different substrate concentrations and measured the initial reaction rate twice at every concentration.
1. Use the data to calculate the power to pick up an association that is at least as strong as the association you observed in the dataset when using an experiment with the same design.
2. Use the data to calculate the power to pick up an association where the reaction rate increases on average by 10 counts/min when the substrate concentration is 10 times higher ($\beta_1=10$).
3. Use the data to calculate the number of repeats you need for each concentration to pick up an association where the reaction rate increases on average by 10 counts/min when the substrate concentration is 10 times higher, with a power of at least 90% ($\beta_1=10$).
4. Suppose that you set up an experiment with the same concentrations as in the Puromycin dataset under the following restrictions: each concentration must be used at least once and at most 12 reactions can be run. How would you choose your design points? Calculate the power for this design when the effect size is 10 counts/min per 10-fold increase in the substrate concentration ($\beta_1=10$).
## Simulation function
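The function below simulates data similar to those of our experiment under our model assumptions, fits the linear model to every simulated dataset, and returns the fraction of simulations in which the contrast is significant at level `alpha`, i.e. the estimated power.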
```{r}
simFast <- function(form, data, betas, sd, contrasts, alpha = .05, nSim = 10000) {
  # Simulate nSim response vectors: Gaussian noise added to the model mean
  ySim <- rnorm(nrow(data) * nSim, sd = sd)
  dim(ySim) <- c(nrow(data), nSim)
  design <- model.matrix(form, data)
  ySim <- ySim + c(design %*% betas)
  # limma expects one response vector per row
  ySim <- t(ySim)

  ### Fitting
  fitAll <- limma::lmFit(ySim, design)

  ### Inference
  # Unscaled variance of the contrast
  varUnscaled <- c(t(contrasts) %*% fitAll$cov.coefficients %*% contrasts)
  contrasts <- fitAll$coefficients %*% contrasts
  seContrasts <- varUnscaled^.5 * fitAll$sigma
  tstats <- contrasts / seContrasts
  # Two-sided p-values
  pvals <- pt(abs(tstats), fitAll$df.residual, lower.tail = FALSE) * 2
  # Estimated power: fraction of simulations that reject at level alpha
  return(mean(pvals < alpha))
}
```
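In formulas, and assuming that `cov.coefficients` as returned by `limma::lmFit` is the unscaled covariance matrix $(X^TX)^{-1}$, the function computes the following quantities for every simulated response vector $y$. The symbols are notation introduced here, not taken from the code: $X$ is the design matrix, $L$ the contrast, $n$ the number of observations and $k$ the number of model parameters.

$$
\hat\beta = (X^TX)^{-1}X^Ty, \qquad
t = \frac{L^T\hat\beta}{\hat\sigma\sqrt{L^T(X^TX)^{-1}L}}, \qquad
p\text{-value} = 2\,P\left(T_{n-k} \geq |t|\right)
$$

The returned power estimate is the fraction of the `nSim` simulated datasets with a p-value below $\alpha$.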
## Power to pick up the same effect size as we observed in the data set with the same design
```{r}
mod1 <- lm(rate ~ conc %>% log10(), Puromycin)
betas <- mod1$coefficients
nSim <- 10000
form <- ~ conc %>% log10()
sd <- sigma(mod1)
contrast <- matrix(c(0, 1), ncol = 1)
rownames(contrast) <- names(mod1$coefficients)
alpha <- 0.05
power <- simFast(form, Puromycin, betas, sd, contrasts = contrast, alpha = alpha, nSim = nSim)
power
```
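As a rough cross-check on the simulation, the power of this two-sided t-test can also be computed from the non-central t-distribution. The sketch below is not part of the original solution: `powerAnalytic` is a hypothetical helper written in base R, and it reads the data from `datasets::Puromycin` so that it runs on its own.

```{r}
# Hypothetical helper (not part of the original solution): analytic power of
# the two-sided t-test on one linear contrast of the regression coefficients.
powerAnalytic <- function(form, data, betas, sd, contrasts, alpha = .05) {
  X <- model.matrix(form, data)
  df <- nrow(X) - ncol(X)
  # Standard error of the contrast when the true residual sd equals `sd`
  se <- sd * sqrt(c(t(contrasts) %*% solve(crossprod(X)) %*% contrasts))
  # Non-centrality parameter: true contrast divided by its standard error
  ncp <- c(t(contrasts) %*% betas) / se
  tCrit <- qt(1 - alpha / 2, df)
  # Probability of rejecting in either tail
  pt(tCrit, df, ncp = ncp, lower.tail = FALSE) + pt(-tCrit, df, ncp = ncp)
}

treated <- subset(datasets::Puromycin, state == "treated")
fit <- lm(rate ~ log10(conc), data = treated)
powerObserved <- powerAnalytic(
  ~ log10(conc), treated, coef(fit), sigma(fit),
  contrasts = matrix(c(0, 1), ncol = 1)
)
powerObserved
```

The analytic value should agree with the simulated power up to Monte Carlo error.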
## Power for $\beta_1=10$
```{r}
mod1 <- lm(rate ~ conc %>% log10(), Puromycin)
betas <- mod1$coefficients
betas[2] <- 10
nSim <- 10000
form <- ~ conc %>% log10()
sd <- sigma(mod1)
contrast <- matrix(c(0, 1), ncol = 1)
rownames(contrast) <- names(mod1$coefficients)
alpha <- 0.05
power <- simFast(form, Puromycin, betas, sd, contrasts = contrast, alpha = alpha, nSim = nSim)
power
```
The power to pick up a slope of $\beta_1=10$ for this experiment is only
`r round(power*100,1)`%.
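Because the power is estimated from a finite number of simulations, the reported percentage carries Monte Carlo error. The sketch below gives a sense of its size, assuming the `nSim` rejection indicators are independent so that the binomial standard error applies. `mcStandardError` is a hypothetical helper, and the value 0.4 is only an illustrative power, not the result of the chunk above.

```{r}
# Hypothetical helper: binomial standard error of a simulated power estimate
mcStandardError <- function(powerEstimate, nSim) {
  sqrt(powerEstimate * (1 - powerEstimate) / nSim)
}

# Illustrative input: an estimated power of 0.4 based on 10000 simulations
mcStandardError(0.4, 10000)
```

With 10000 simulations the standard error is at most 0.005 (reached at a power of 0.5), so differences in the first decimal of the percentage should not be over-interpreted.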
## Number of repeats needed per concentration to obtain a power of 90% to pick up an effect of $\beta_1=10$
```{r}
mod1 <- lm(rate ~ conc %>% log10(), Puromycin)
concentrations <- Puromycin %>%
  pull(conc) %>%
  unique()
betas <- mod1$coefficients
betas[2] <- 10
nSim <- 10000
form <- ~ conc %>% log10()
sd <- sigma(mod1)
contrast <- matrix(c(0, 1), ncol = 1)
rownames(contrast) <- names(mod1$coefficients)
alpha <- 0.05
powers <- data.frame(n = 1:10, power = NA)
for (i in 1:10) {
  simData <- data.frame(conc = rep(concentrations, each = i))
  powers[i, 2] <- simFast(form, simData, betas, sd, contrasts = contrast, alpha = alpha, nSim = nSim)
}
powers %>%
  ggplot(aes(n, power)) +
  geom_line() +
  geom_hline(yintercept = .9, lty = 2)
```
We need `r min(which(powers$power >= 0.9))` repeats for each concentration to
obtain a power of at least 90%.
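Note that `min(which(...))` returns `Inf` with a warning when no design in the grid reaches the target power. The sketch below shows a guarded lookup. `smallestN` is a hypothetical helper, and the toy data frame only mimics the layout of `powers`; its values are made up for illustration.

```{r}
# Hypothetical helper: smallest n whose power reaches the target, NA if none does
smallestN <- function(powerTable, target = 0.9) {
  reached <- powerTable$n[powerTable$power >= target]
  if (length(reached) == 0) NA_integer_ else min(reached)
}

# Made-up values with the same columns as `powers`
toyPowers <- data.frame(n = 1:4, power = c(0.20, 0.55, 0.91, 0.97))
smallestN(toyPowers)
smallestN(toyPowers, target = 0.99)
```

If the lookup returns `NA`, the grid of `n` values should be extended before drawing a conclusion.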
## Optimal design with 12 reactions
```{r}
concentrations <- Puromycin %>%
  pull(conc) %>%
  unique()
betas <- mod1$coefficients
betas[2] <- 10
nSim <- 10000
form <- ~ conc %>% log10()
sd <- sigma(mod1)
contrast <- matrix(c(0, 1), ncol = 1)
rownames(contrast) <- names(mod1$coefficients)
alpha <- 0.05
simData <- data.frame(conc = c(concentrations, rep(min(concentrations), 3), rep(max(concentrations), 3)))
powerOpt <- simFast(form, simData, betas, sd, contrasts = contrast, alpha = alpha, nSim = nSim)
simData
powerOpt
```
Note that the power for a design in which the four intermediate concentrations
are used once and the minimum and maximum concentrations four times each is
considerably higher than that of the balanced design with the same 12
reactions, in which every concentration is measured twice.
```{r}
powers %>%
  ggplot(aes(n, power)) +
  geom_line() +
  geom_hline(yintercept = powerOpt, lty = 2)
```
Indeed, the power of our optimal design with 12 reactions is about as high as
the power of an experiment in which every concentration is repeated 3 times,
which requires 18 reactions!
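One way to see why the unbalanced design does so well: in a simple linear regression the variance of the slope estimator is $\sigma^2 / \sum_i (x_i - \bar x)^2$, so designs that place more points far from the mean of $x$ estimate the slope more precisely. The sketch below compares this sum of squares on the log10 scale for the designs discussed above. It is written in base R, `sxx` is a hypothetical helper, and the data are read from `datasets::Puromycin` so that the chunk runs on its own.

```{r}
# Hypothetical helper: sum of squared deviations of the design points
sxx <- function(x) sum((x - mean(x))^2)

treated <- subset(datasets::Puromycin, state == "treated")
concs <- unique(treated$conc)

# Balanced designs with 2 and 3 repeats per concentration
balanced12 <- log10(rep(concs, each = 2))
balanced18 <- log10(rep(concs, each = 3))
# Every concentration once, plus 3 extra reactions at each extreme
extremes12 <- log10(c(concs, rep(min(concs), 3), rep(max(concs), 3)))

c(
  balanced12 = sxx(balanced12),
  balanced18 = sxx(balanced18),
  extremes12 = sxx(extremes12)
)
```

A larger sum of squares means a smaller standard error of the slope. The residual degrees of freedom also differ between designs with 12 and 18 reactions, so the power is not determined by this quantity alone.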