-
Notifications
You must be signed in to change notification settings - Fork 0
/
wilcoxon_test.qmd
237 lines (169 loc) · 6.8 KB
/
wilcoxon_test.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
# Wilcoxon Signed-Rank test {#sec-wilcoxon_test}
```{r}
#| include: false
library(fontawesome)
library(htmlwidgets)
```
Wilcoxon Signed-Rank test (Wilcoxon test) is a non-parametric test that can be conducted to compare paired samples when the differences are not normally distributed. It is based on the signs of the differences and the magnitude of the rank of the differences between pairs of measurements, rather than the actual values. The null hypothesis is that there is no tendency for values under each paired variable to be higher or lower. It is often thought of as a test for small samples. However, if the sample is smaller than 6, then statistical significance is impossible.
When we have finished this Chapter, we should be able to:
::: {.callout-caution icon="false"}
## Learning objectives
- Applying hypothesis testing
- Compare two dependent (related) samples applying Wilcoxon Signed-Rank test
- Interpret the results
:::
## Research question
The dataset *eyes* contains thickness of the cornea (in microns) in patients with one eye affected by glaucoma; the other eye is unaffected. We investigate if there is evidence for difference in corneal thickness in affected and unaffected eyes.
::: {.callout-note icon="false"}
## Null hypothesis and alternative hypothesis
- $H_0$: The distribution of the differences in thickness of the cornea is symmetrical about zero
- $H_1$: The distribution of the differences in thickness of the cornea is not symmetrical about zero
:::
**NOTE:** If we are testing the null hypothesis that the **median** of the paired rank differences is zero, then the paired rank differences must all come from a symmetrical distribution. Note that we do not have to assume that the distributions of the original populations are symmetrical.
## Packages we need
We need to load the following packages:
```{r}
#| message: false
#| warning: false
library(rstatix)
library(PupillometryR)
library(gtsummary)
library(here)
library(tidyverse)
```
## Preparing the data
We import the data *eyes* in R:
```{r}
#| warning: false
library(readxl)
eyes <- read_excel(here("data", "eyes.xlsx"))
```
```{r}
#| echo: false
#| label: fig-weight
#| fig-cap: Table with data from "eyes" file.
DT::datatable(
eyes, extensions = 'Buttons', options = list(
dom = 'tip',
columnDefs = list(list(className = 'dt-center', targets = "_all"))
)
)
```
We calculate the differences using the function `mutate()`:
```{r}
eyes <- eyes %>%
mutate(dif_thickness = affected_eye - unaffected_eye)
```
We inspect the data:
```{r}
glimpse(eyes)
```
## Explore the characteristics of distribution of differences
The distributions of differences can be explored with appropriate plots and summary statistics.
**Graph**
We can explore the data visually for symmetry with a density plot.
```{r}
#| warning: false
#| label: fig-hist2
#| fig-cap: Density plot of the thickness differences.
eyes %>%
ggplot(aes(x = dif_thickness)) +
geom_density(fill = "#76B7B2", color="black", alpha = 0.2) +
geom_vline(aes(xintercept=mean(dif_thickness)),
color="blue", linetype="dashed", size=1.2) +
geom_vline(aes(xintercept=median(dif_thickness)),
color="red", linetype="dashed", size=1.2) +
labs(x = "Differences of thickness (micron)") +
theme_minimal() +
theme(plot.title.position = "plot")
```
**Summary statistics**
Summary statistics can also be calculated for the variables.
::: {#exercise-joins .callout-tip}
## Summary statistics
::: panel-tabset
## dplyr
We can utilize the `across()` function to obtain the results across the three variables simultaneously:
```{r}
#| warning: false
summary_eyes <- eyes %>%
dplyr::summarise(across(
.cols = c(dif_thickness, affected_eye, unaffected_eye),
.fns = list(
n = ~n(),
na = ~sum(is.na(.)),
min = ~min(., na.rm = TRUE),
q1 = ~quantile(., 0.25, na.rm = TRUE),
median = ~quantile(., 0.5, na.rm = TRUE),
q3 = ~quantile(., 0.75, na.rm = TRUE),
max = ~max(., na.rm = TRUE),
mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE),
skewness = ~EnvStats::skewness(., na.rm = TRUE),
kurtosis= ~EnvStats::kurtosis(., na.rm = TRUE)
),
.names = "{col}_{fn}")
)
# present the results
summary_eyes <- summary_eyes %>%
mutate(across(everything(), \(x) round(x, 2))) %>% # round to 3 decimal places
pivot_longer(1:33, names_to = "Stats", values_to = "Values") # long format
summary_eyes
```
## dlookr
```{r}
eyes %>%
dlookr::describe(dif_thickness, affected_eye, unaffected_eye) %>%
select(described_variables, n, na, mean, sd, p25, p50, p75, skewness, kurtosis) %>%
ungroup()
```
:::
:::
The differences seems to come from a population with a symmetrical distribution and the skewness is close to zero (0.03). However, the (excess) kurtosis equals to -1.37 (platykurtic) and the sample size is small. Therefore, the data may not follow the normal distribution.
**Normality test**
We can use Shapiro-Wilk test to check for normality of the differences.
```{r}
eyes %>%
shapiro_test(dif_thickness)
```
The Shapiro-Wilk test suggests that the weight differences are normally distributed (p=0.65 \> 0.05). However, here, normality test is not helpful because of the small sample (the test is under-powered).
## Run the Wilcoxon Signed-Rank test
The differences between the two measurements can be tested using a rank test such as Wilcoxon Signed-Rank test.
::: callout-tip
## Wilcoxon Signed-Rank test
::: panel-tabset
## Base R (1st way)
```{r}
#| warning: false
wilcox.test(eyes$dif_thickness, conf.int = T)
```
## Base R (2nd way)
```{r}
#| warning: false
wilcox.test(eyes$affected_eye, eyes$unaffected_eye, conf.int = T, paired = TRUE)
```
## rstatix
```{r}
eyes %>%
wilcox_test(dif_thickness ~ 1)
```
:::
:::
The result is not significant (p = 0.31 \> 0.05). However, we can't be certain that there is not difference in corneal thickness in affected and unaffected eyes because the sample size is very small.
## Present the results in a summary table
```{r}
#| code-fold: true
#| code-summary: "Show the code"
#| warning: false
tb2 <- eyes %>%
mutate(id = row_number()) %>%
select(-dif_thickness) %>%
pivot_longer(!id, names_to = "groups", values_to = "thickness")
tb2 %>%
tbl_summary(by = groups, include = -id,
label = list(thickness ~ "thickness (microns)"),
digits = list(everything() ~ 1)) %>%
add_p(test = thickness ~ "paired.wilcox.test", group = id,
estimate_fun = thickness ~ function(x) style_sigfig(x, digits = 3))
```
There is not evidence from this small study with patients of glaucoma that the thickness of the cornea in affected eyes, median = 459 $\mu{m}$ (IQR: 436.5, 478.5), differs from unaffected eyes 470 $\mu{m}$ (442, 479.5). The result (pseudomedian = -6, 95% CI: -16 to 8) is not significant (p=0.30 \>0.05).