forked from statOmics/PSLS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
08_3_puromycin_sol.Rmd
322 lines (235 loc) · 9.72 KB
/
08_3_puromycin_sol.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
---
title: "8.3. Multiple regression with interaction: puromycin example - solution"
author: "Lieven Clement, Jeroen Gilis and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
theme_set(theme_bw())
```
# Puromycin data
Data on the velocity of an enzymatic reaction were obtained by Treloar (1974).
The number of counts per minute of radioactive product from the reaction was
measured as a function of substrate concentration in parts per million (ppm) and
from these counts the initial rate (or velocity) of the reaction was calculated (counts/min/min). The experiment was conducted once with the enzyme treated
with Puromycin, and once with the enzyme untreated.
Assess if there is an association between the substrate concentration and rate
**for both the treated and untreated enzymes.**
# Import data
```{r}
data(Puromycin)
```
# Data wrangling
For a clearer interpretation of the model parameters later on, we will make
the untreated state enzymes the reference category.
```{r}
Puromycin <- Puromycin %>%
mutate(state = fct_relevel(state, c("untreated", "treated")))
```
## Data Exploration
First, we visualize the association between the concentration and the enzyme
rate, for both of the enzyme states.
```{r}
Puromycin %>%
ggplot(aes(conc, rate)) +
geom_point(aes(col = state)) +
stat_smooth(method = "lm") +
stat_smooth(color = "firebrick", se = FALSE) +
labs(
x = "Substrate concentration (ppm)",
y = "Reaction rate (counts/min/min)"
) +
ggtitle("Reaction rates for puromycin-treated enzymes")
```
The plot shows that there is a relation between the velocity and the
concentration, however, the relation does not seem to be linear.
We will assess the impact of log-transforming the concentration. Because the
concentration is measured in ppm we will log$_{10}$ transform the data.
```{r}
Puromycin %>%
ggplot(aes(x = log10(conc), y = rate)) +
geom_point(aes(col = state)) +
stat_smooth(method = "lm") +
stat_smooth(color = "firebrick", se = FALSE) +
labs(
x = "Substrate concentration (log10(ppm))",
y = "Reaction rate (counts/min/min)"
) +
ggtitle("Reaction rates for puromycin-treated enzymes")
```
The relation between the velocity and the log$_{10}$ transformed concentration
seems to be linear.
# Linear regression
We will fit the following model to the data
$Y_i = \beta_0 + \beta_c x_c+ \beta_s x_s +\beta_{c:s}x_{c}x_{s} + \epsilon_i$
with
- $Y_i$ the reaction rate,
- $\beta_0$ the intercept,
- $\beta_{c}$ the main effect for log10 concentration,
- $x_c$ the log10 concentration,
- $\beta_{p}$ the main effect for treatment ,
- $x_s$ a dummy variable for "state" that is 0 if the enzymes that are untreated and 1 if
the enzymes are treated with Puromycin,
- $\beta_{c:s}$ the interaction effect between concentration and treatment state,
- $\epsilon_i$ i.i.d. normally distributed with mean 0 and variance $\sigma^2$.
Note, that we write the substrate concentration with a small letter because
the predictor is not random. The researchers have chosen the substrate
concentrations in the design phase and it is thus no random variable.
The model implies two different regression lines
- no treatment ($x_s = 0$)
$$
Y_i = \beta_0 + \beta_c x_c + \epsilon
$$
- treatment (x_s = 1)
$$
Y_i = (\beta_0 + \beta_s) + (\beta_c+\beta_{c:s}) x_c + \epsilon
$$
So the main effect for treatment has the interpretation as the change in intercept between treated and untreated samples.
The interaction term has the interpretation as the change in slope between treated and untreated samples.
```{r}
Puromycin <- Puromycin %>%
mutate(log10conc = log10(conc))
mod1 <- lm(rate ~ log10conc * state, Puromycin)
summary(mod1)
```
Before we perform inference we will first assess the assumptions
## Assumptions
1. Independence
2. Linearity
3. Normal distribution of the residuals
4. Homoscedasticity
We assume that the experiment was well designed and that the different reactions that were use in the experiment are independent.
### Linearity
We assess linearity in a residual analysis
```{r}
plot(mod1, which = 1)
```
The assumption of linearity is met.
### Normality
```{r}
plot(mod1, which = 2)
```
The QQ-plot does not show large deviations from normality.
### Homoscedasticity: equality of the variance
We can again use the residual plot for assessing this assumption or the plot
were we plot the square root of the standardized residuals in function of
the fit.
```{r}
plot(mod1, which = 3)
```
We see that the spread of the majority of the residuals is more or less similar.
As such, we may assume homoscedasticity of the data.
## Intermezzo: Graphical interpretation of the parameters
### Simple linear model: 1 slope {-}
This is the same model that we fit in the
[Chapter 6 exercise](./06_1_puromycin.html):
$$ Y_i = \beta_0 + \beta_{\text{rate}} X_{i, \text{rate}} + \epsilon_i $$
```{r}
mod_simple <- lm(rate ~ log10conc, data = Puromycin)
mod_simple
ggplot(Puromycin, aes(log10conc, rate)) +
geom_point(aes(color = state)) +
geom_abline(
intercept = coef(mod_simple)[1],
slope = coef(mod_simple)[2]
) +
scale_color_manual(values = c("darkorchid", "forestgreen")) +
ggtitle("The simple linear model")
```
### Additive model: 2 parallel slopes {-}
We add an additional term for the state of the reactin (treated or untreated).
Note that this variable is **categorical**.
$$
Y_i = \beta_0 + \beta_{\text{rate}} X_{i, \text{rate}} +
\beta_{\text{state}} X_{i, \text{state}} + \epsilon_i
$$
```{r}
mod_add <- lm(rate ~ log10conc + state, data = Puromycin)
mod_add
ggplot(Puromycin, aes(log10conc, rate, col = state)) +
geom_point() +
## Line for the untreated group
geom_abline(
intercept = coef(mod_add)[1],
slope = coef(mod_add)[2],
col = "darkorchid"
) +
## Line for the treated group
geom_abline(
intercept = coef(mod_add)[1] + coef(mod_add)[3],
slope = coef(mod_add)[2],
col = "forestgreen"
) +
scale_color_manual(values = c("darkorchid", "forestgreen")) +
ggtitle("The additive linear model with parallel slopes")
```
### The interaction model: 2 non-parallel slopes {-}
$$
Y_i = \beta_0 + \beta_{\text{rate}} X_{i, \text{rate}} +
\beta_{\text{state}} X_{i, \text{state}} +
\beta_{\text{rate:state}} X_{i, \text{rate}} X_{i, \text{state}} +
\epsilon_i
$$
```{r}
mod_int <- lm(rate ~ log10conc * state, data = Puromycin)
mod_int
ggplot(Puromycin, aes(log10conc, rate, col = state)) +
geom_point() +
## Line for the untreated group
geom_abline(
intercept = coef(mod_int)[1],
slope = coef(mod_int)[2],
col = "darkorchid"
) +
## Line for the treated group
geom_abline(
intercept = coef(mod_int)[1] + coef(mod_int)[3],
slope = coef(mod_int)[2] + coef(mod_int)[4],
col = "forestgreen"
) +
scale_color_manual(values = c("darkorchid", "forestgreen")) +
ggtitle("The linear model with interaction: non-parallel slopes")
```
## Inference
We first do an omnibus test to assess is there is an effect of the log10 concentration on the velocity.
```{r}
mod0 <- lm(rate ~ state, data = Puromycin)
anova(mod0, mod1)
```
Next, we assess the interaction.
```{r}
anova(mod1)
```
We cannot remove the interaction of the model.
Hence, we cannot study the effect of the concentration without accounting for the treatment and have to assess following research questions.
1. the association between velocity and the concentration is significant in the untreated group
$$
H_0: \beta_c = 0 \text{ vs } H_1: \beta_c \neq 0
$$
2. the association between velocity and the concentration is significant in the treated group
$$
H_0: \beta_c + \beta_{c:s}= 0 \text { vs } H_1: \beta_c + \beta_{c:s}\neq 0
$$
3. the association between velocity and the concentration is different between treated and untreated group
$$
H_0: \beta_{c:s}= 0\text{ vs }H_1: \beta_{c:s}\neq 0
$$
We can assess all these hypotheses using multcomp while correcting for multiple testing.
```{r}
library(multcomp)
mcp1 <- glht(mod1,
linfct = c(
"log10conc = 0",
"log10conc + log10conc:statetreated = 0",
"log10conc:statetreated = 0"
)
)
summary(mcp1)
confint(mcp1)
```
## Conclusion
There is an extremely significant effect of the substrate concentration on the reaction rate (p<<0.001).
The effect of the substrate concentration on the reaction rate is extremely significant for reactions catalysed with untreated enzymes. A reaction at a substrate concentration that is 10 times higher will have a reaction speed that is on average `r round(confint(mcp1)$confint[1,1],1)` counts/min higher (95% CI [`r round(confint(mcp1)$confint[1,-1],1)`] counts/min) (p << 0.001).
The effect of the substrate concentration on the reaction rate is extremely significant for reactions catalysed puromycin treated enzymes (p << 0.001). A reaction at a substrate concentration that is 10 times higher will have a reaction speed that is on average `r round(confint(mcp1)$confint[2,1],1)` counts/min higher (95% CI [`r round(confint(mcp1)$confint[2,-1],1)`] counts/min).
The effect of the substrate concentration on the reaction rate is very significantly higher for reactions catalysed with Puromycin treated enzymes than when catalysed with non-treated enzymes (p = `r round(summary(mcp1)$test$pvalue[3],3)`). A reaction at a substrate concentration that is 10 times higher will have a reaction speed that is on average `r round(confint(mcp1)$confint[3,1],1)` counts/min higher for reactions that are catalysed with Puromycin treated enzymes than with untreated enzymes (95% CI [`r round(confint(mcp1)$confint[3,-1],1)`] counts/min).