forked from statOmics/PSLS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
08_6_rats_sol.Rmd
362 lines (279 loc) · 14.8 KB
/
08_6_rats_sol.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
---
title: "Exercise 8.6: Blocking on the rat diet dataset - solution"
author: "Lieven Clement, Jeroen Gilis and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# Background
Researchers are studying the impact of protein sources and protein levels in
the diet on the weight of rats. They feed the rats with diets of beef, cereal
and pork and use a low and high protein level for each diet type.
The researchers can include 60 rats in the experiment. Prior to the experiment,
the rats were divided in 10 homogeneous groups of 6 rats based on
characteristics such as initial weight, appetite, etc.
Within each group a rat is randomly assigned to a diet. The rats are fed during
a month and the weight gain in grams is recorded for each rat.
# Experimental design
- There are three explanatory variables in the experiment: the factor diet type
with three levels (beef, cereal and pork), factor protein level with levels
(low and high) and a group blocking factor with 10 levels.
- There are 6 treatments: beef-high, cereal-high, pork-high, beef-low,
cereal-low, pork-low protein.
- The rats are the experimental and observational units.
- The weight gain is the response variable.
- The experiment is a randomized complete block (RCB) design
Load libraries
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
```
# Data import
```{r}
diet <- read.table("https://raw.githubusercontent.com/statOmics/PSLSData/main/dietRats.txt",
header = TRUE
)
head(diet)
```
# Tidy data
```{r}
diet <- diet %>%
## Convert categorical levels to factors
mutate(
block = as.factor(block),
protSource = as.factor(protSource),
protLevel = as.factor(protLevel)
) %>%
## Recode factors to make them more verbose
mutate(
protLevel = fct_recode(protLevel, high = "h", low = "l"),
protSource = fct_recode(protSource, beef = "b", cereal = "c", pork = "p")
) %>%
## Set "low" as the reference level for `protLevel`
mutate(protLevel = fct_relevel(protLevel, "low"))
head(diet)
```
# Data exploration
- Boxplot of the weight gain against protein source, protein level with coloring according to block
```{r}
diet %>%
ggplot(aes(x = protLevel, y = weightGain)) +
scale_fill_brewer(palette = "RdGy") +
theme_bw() +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(color = block)) +
ggtitle("Weight gain dependence on protein source") +
labs(x = "Protein level", y = "Weight gain (g)") +
stat_summary(
fun = mean, geom="point",
shape=5, size=3, color="black"
) +
facet_wrap(vars(protSource))
```
- Lineplot of the weight gain against protein source, protein level with coloring and grouping according to block
```{r}
diet %>%
ggplot(aes(x = protLevel, y = weightGain, color = block)) +
scale_fill_brewer(palette = "RdGy") +
theme_bw() +
geom_line(aes(group = block)) +
geom_point() +
ggtitle("Weight gain dependence on protein source") +
ylab("Weight gain (g)") +
facet_wrap(vars(protSource))
```
- An increase in the weight of the rats seems to depend on the protein source
received in the diet.
- The increase in the weight of the rats seems to depend on the level of protein
received in the diet
- There also seems to be an interaction effect between the protein level and the
protein source on the gain in weight of the rats. For the beef and the pork
diets the effect of high protein levels in the data seems to be much stronger
than in the cereal diet.
- There is also a strong effect of the block. Blocking implies a randomisation
restriction, hence, we will have to include the block effect anyway.
# Multivariate linear regression analysis
## Assumptions
List assumptions:
1. The observations are independent
2. Linearity between the response and predictor variable
3. The residuals of the model must be normally distributed
4. Homoscedasticity of the data
The first assumption is met if we correct for block in the model because the rats were randomized to the treatment within block. The other three assumptions can be assessed by
fitting the linear model and calling the `plot()` function as follows.
```{r}
lm1 <- lm(weightGain ~ block + protSource * protLevel, data = diet)
plot(lm1)
```
All assumptions are met for this dataset.
## Hypothesis testing
We here fit a linear model with a blocking factor for `block` and main and
interaction effects for protein source and protein level.
```{r}
summary(lm1)
```
## Interpretation of the regression parameters
```{r}
library(ExploreModelMatrix)
ExploreModelMatrix::VisualizeDesign(diet, ~ block + protSource * protLevel)$plotlist
```
There are 3 levels for protein source, 2 levels for protein level and 10 levels
for the blocking variable. We will have one reference level for each respective
variable: beef, low, block 1. So we need 2, 1 and 9 dummy variables to introduce
the factors protein source, protein level and block in the linear model,
respectively.
Hence, we can write down the linear model as follows:
$y_i=\beta_0+\beta_cx_{i,c} +\beta_px_{i,p}+\beta_hx_{i,h}+\beta_{ch}x_{i,c}x_{i,h}+\beta_{ph}x_{i,p}x_{i,h}+\beta_{b2}x_{i,b2}+\ldots+\beta_{b10}x_{i,b10}+\epsilon_{i}$
with:
$y_i$ the observed weight gain for rat i,
$x_{i,h}$ a dummy variable which is 1 if rat i receives a high protein diet
and is 0 otherwise,
$x_{i,c}$ a dummy variable which is 1 if rat i receives a cereal diet
and is 0 otherwise,
$x_{i,p}$ a dummy variable which is 1 if rat i receives a pork diet
and is 0 otherwise and
$x_{i,bk}$ is a dummy variable which is 1 if rat i belongs to block $bk$
and is 0 otherwise, with $k \in 2,\ldots, 10$,
and $\epsilon_i$ an error term which is normally distributed with mean 0 and
variance $\sigma^2$, i.e. $\epsilon_i \sim N(0,\sigma^2)$.
- Rats that are assigned to block $k$ and receive a beef based low protein diet have
a covariate pattern $x_{i,h}=0$, $x_{i,c}=0$, $x_{i,p}=0$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
Their mean weight gain is thus equal to $\mu_{l,b,bk}=\beta_0+\beta_{bk}$
- Rats that are assigned to block $k$ and receive a beef based high protein diet have
a covariate pattern $x_{i,h}=1$, $x_{i,c}=0$, $x_{i,p}=0$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
Their mean weight gain is thus equal to $\mu_{h,b,bk}=\beta_0+\beta_h+\beta_{bk}$
- Rats that are assigned to block $k$ and receive a cereal based low protein diet have
a covariate pattern $x_{i,h}=0$, $x_{i,c}=1$, $x_{i,p}=0$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
Their mean weight gain is thus equal to $\mu_{h,c,bk}=\beta_0+\beta_c+\beta_{bk}$
- Rats that are assigned to block $k$ and receive a cereal based heigh protein diet have
a covariate pattern $x_{i,h}=1$, $x_{i,c}=1$, $x_{i,p}=0$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
Their mean weight gain is thus equal to $\mu_{h,c,bk}=\beta_0+\beta_h+\beta_c+\beta_{ch}+\beta_{bk}$
- Rats that are assigned to block $k$ and receive a pork based low protein diet have
a covariate pattern $x_{i,h}=0$, $x_{i,c}=0$, $x_{i,p}=1$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
Their mean weight gain is thus equal to $\mu_{l,p,bk}=\beta_0+\beta_p+\beta_{bk}$
- Rats that are assigned to block $k$ and receive a pork based heigh protein diet have
a covariate pattern $x_{i,h}=1$, $x_{i,c}=0$, $x_{i,p}=1$, $x_{i,bm}=0$ with $m\neq k$ and $x_{i,bk}=1$.
There mean weight gain is thus equal to $\mu_{h,p,bk}=\beta_0+\beta_h+\beta_p+\beta_{ph}+\beta_{bk}$
We can now relate this to the output of the `lm` function:
- The intercept $\beta_0$ is thus the average weight increase in the low beef
diet for rats in block 1.
- The parameter $\beta_c$: the average difference in weight gain between cereal-low and
beef-low diet is `r lm1$coef["protSourcecereal"]`g.
- The parameter $\beta_p$: the average difference in weight gain between pork-low and
beef-low diet is `r lm1$coef["protSourcepork"]`g.
- The parameter $\beta_h$: the average difference in weight gain between beef-high and
beef-Low diet is `r lm1$coef["protLevelhigh"]`g.
- The parameter $\beta_{ch}$ is the difference in the average weight gain difference
due to the high protein level as compared to the low protein level for cereal diets
as compared to the weight gain difference that occurs due to the protein level
in the reference class (here beef diet). Here this is negative,
i.e. `r lm1$coef["protSourcecereal:protLevelhigh"]`g, thus the weight gain
for the cereal protein source increases on average less between high and low protein diets
than in beef based diets.
- The parameter $\beta_{ph}$ is the difference in the average weight gain difference
due to the high protein level as compared to the low protein level for pork diets
as compared to the weight gain difference that occurs due to the protein level
in the reference class (here beef diet). Here this is negative,
i.e. `r lm1$coef["protSourcepork:protLevelhigh"]`g, thus the weight gain
for the proc protein source increases on average less between high and low protein diets
than in beef based diets.
## Testing the overall (combined) effect of diet
Because there are multiple factors with different levels in the model,
we can first assess the effect of the diet (protein Level, protein source
and the interaction) by using anova. With this test we will assess the null
hypothesis that the average weight gain in each treatment is equal: i.e. $H_0: \mu_{b,l}=\mu_{b,h}=\mu_{c,h}=\mu_{c,l}=\mu_{p,h}=\mu_{p,h}$ versus
the alternative hyptohesis $H_1:$ that at least two treatment means are
different.
```{r}
lm0 <- lm(weightGain ~ block, data = diet)
anova(lm0, lm1)
```
We can conclude that there is an extremely significant effect of the diet type
(protein source and/or protein level and/or protein source-protein level
interaction) on the weight gain of rats (p << 0.001).
## Assessing the interaction effect between protein source and protein level
```{r, message=FALSE, warning=FALSE}
library(car)
Anova(lm1, type = "III")
```
There is a very significant interaction between the protein source and
the protein level (p = `r format(Anova(lm1, type=3)$"Pr(>F)",digits=1)[5]`). This indicates that the average weight increase due to
the protein level differs according to the protein source.
Hence, we cannot assess the effect of the protein source and/or protein level
independently because there effects of the protein source vary according to
the protein level.
## Assessing specific contrasts
Imagine that we are interested in assessing if there is an effect of
1. protein source in the low protein diets
- $\mu_{c,l}-\mu_{b,l} = \beta_c$
- $\mu_{p,l}-\mu_{b,l} = \beta_p$
- $\mu_{c,l}-\mu_{p,l}= \beta_c-\beta_p$)
2. protein source in high protein diets
- $\mu_{c,h}-\mu_{b,h}=\beta_c+\beta_{ch}$
- $\mu_{p,h}-\mu_{b,h}=\beta_p+\beta_{ph}$
- $\mu_{p,h}-\mu_{c,h}=(\beta_c+\beta_{ch})-(\beta_p+\beta_{ph})$
3. protein level for
- beef diets ($\mu_{b,h}-\mu_{b,l}=\beta_h$),
- cereal diets ($\mu_{c,h}-\mu_{c,l}=\beta_h+\beta_{ch}$) and
- pork diets ($\mu_{p,h}-\mu_{p,l}=\beta_h+\beta_{ph}$).
4. If the effect of the protein level differs between
- beef and cereal $(\mu_{c,h}-\mu_{c,l}) - (\mu_{b,h}-\mu_{b,l})=\beta_{ch}$
- beef and pork $(\mu_{p,h}-\mu_{p,l}) - (\mu_{c,h}-\mu_{c,l})=\beta_{ph}$ and
- cereal and pork diets $(\mu_{c,h}-\mu_{c,l}) - (\mu_{p,h}-\mu_{p,l})=\beta_{ch}-\beta_{ph}$.
These effects of interest are so-called
**contrasts, i.e. linear combinations of the parameters**.
We can define the contrasts and assess the significance of the contrasts with
the code below. The contrasts are given as input in the form of symbolic
descriptions to the `linfct` argument of the `glht` function.
```{r,message=FALSE, warning=FALSE}
library(multcomp)
set.seed(75468) # to get reproducible results (small effect if removed)
lm1MultComp <- glht(
model = lm1,
linfct = c(
"protSourcecereal = 0",
"protSourcepork = 0",
"protSourcecereal- protSourcepork = 0",
"protSourcecereal + protSourcecereal:protLevelhigh = 0",
"protSourcepork + protSourcepork:protLevelhigh = 0",
"(protSourcecereal + protSourcecereal:protLevelhigh) - (protSourcepork + protSourcepork:protLevelhigh) = 0",
"protLevelhigh = 0",
"protLevelhigh + protSourcecereal:protLevelhigh = 0",
"protLevelhigh + protSourcepork:protLevelhigh = 0",
"protSourcecereal:protLevelhigh = 0",
"protSourcepork:protLevelhigh = 0",
"protSourcecereal:protLevelhigh - protSourcepork:protLevelhigh = 0"
)
)
```
```{r}
summary(lm1MultComp)
```
```{r}
confint(lm1MultComp)
```
Note that the p-values and the confidence intervals are automatically corrected
for multiple testing.
# Conclusion
- There is an extremely significant effect of the type of protein diet on the
weight gain of rats (p<<1e-6). The interaction between protein type and protein
source is also very significant (p=`r round(Anova(lm1,type=3)$Pr[5],4)`).
- The average weight gain does not vary significantly according to protein
source in the diets with low protein levels (all p>0.99).
- The weight gain in the cereal diet at high protein concentration is on
average `r confint(lm1MultComp)$confint["protSourcecereal + protSourcecereal:protLevelhigh",1] %>% abs %>% round(.,1)`g and `r confint(lm1MultComp)$confint["(protSourcecereal + protSourcecereal:protLevelhigh) - (protSourcepork + protSourcepork:protLevelhigh)",1] %>% abs %>% round(.,1)`g g lower than in the high protein beef and pork diet,
respectively (95% CI [`r confint(lm1MultComp)$confint["protSourcecereal + protSourcecereal:protLevelhigh",-1] %>% abs %>% round(.,1) %>% sort`] and [`r confint(lm1MultComp)$confint["(protSourcecereal + protSourcecereal:protLevelhigh) - (protSourcepork + protSourcepork:protLevelhigh)",-1] %>% abs %>% round(.,1) %>% sort`]) and the differences are
extremely significant (p<0.001). The average weight gains after meat based
diets at high protein levels, however, do not differ significantly
(p>0.99).
- We also discovered an extremely significant difference in weight gain
according to the protein level for beef and pork based diets (p<0.001).
The weight gain on average increases with `r round(confint(lm1MultComp)$confint["protLevelhigh",1],1)`g and `r round(confint(lm1MultComp)$confint["protLevelhigh + protSourcepork:protLevelhigh",1],1)`g in the high protein
level as compared to the low protein beef and pork diet, respectively
(95%CI [`r round(confint(lm1MultComp)$confint["protLevelhigh",-1],1)`] and [`r round(confint(lm1MultComp)$confint["protLevelhigh + protSourcepork:protLevelhigh",-1],1)`]). The protein level effect is not significant
for the cereal diet (p>0.99).
- Finally significant interactions between protein level and protein source were
found (p<0.05), i.e. the increase in weight gain due to protein level in cereal
based diets was 18.2g and 16.5g lower than that of beef and pork based diets,
respectively (95% CI [3g,33.4g] and [1.3g,31.7g]). The average difference in
increase in weight gain due to protein level among the meat based diets was not
significant (p>0.99).
All reported p-values and confidence intervals were corrected for multiple
testing