kruskal_wallis.qmd

# Kruskal-Wallis test {#sec-kruskal_wallis}

```{r}
#| include: false

library(fontawesome)
library(htmlwidgets)
```


The Kruskal-Wallis test is a rank-based non-parametric alternative to the one-way ANOVA and an extension of the Wilcoxon-Mann-Whitney test to allow the comparison of more than two independent groups. It's usually recommended when the assumptions of one-way ANOVA test are not met (non-normal distributions) or with small samples.

When we have finished this Chapter, we should be able to:

::: {.callout-caution icon="false"}
## Learning objectives

-   Applying hypothesis testing
-   Compare more than two independent samples applying Kruskal-Wallis test
-   Perform post-hoc tests
-   Interpret the results
:::

## Research question and Hypothesis Testing

We consider the data in *dataVO2* dataset. We wish to compare the VO2max in three different sports (runners, rowers, and triathletes).

::: {.callout-note icon="false"}
## Null hypothesis and alternative hypothesis

-   $H_0$: the distribution of VO2max is the same in all groups (the medians of VO2max in the three sports are the same)
-   $H_1$: there is at least one group with VO2max distribution different from the others (there is at least one sport with median VO2max different from the others)
:::

**NOTE:** The Kruskal-Wallis test should be regarded as a test of dominance between distributions comparing the mean ranks. The null hypothesis is that the observations from one group do not tend to have a higher or lower ranking than observations from the other groups. This test **does not** test the medians of the data as is commonly thought, it tests the **whole distribution**. However, if the distributions of the two groups have **similar shapes**, the Kruskal-Wallis test can be used to determine whether there are differences in the medians in the two groups. In practice, we use the medians to present the results.

## Packages we need

We need to load the following packages:

```{r}
#| message: false
#| warning: false

library(rstatix)
library(PupillometryR)
library(gtsummary)
library(here)
library(tidyverse)
```

## Preparing the data

We import the data *dataVO2* in R:

```{r}
#| warning: false

library(readxl)
dataVO2 <- read_excel(here("data", "dataVO2.xlsx"))

```

```{r}
#| echo: false
#| label: fig-dataVO2
#| fig-cap: Table with data from "dataVO2" file.

DT::datatable(
  dataVO2, extensions = 'Buttons', options = list(
    dom = 'tip',
    columnDefs = list(list(className = 'dt-center', targets = "_all"))
  )
)

```

We inspect the data and the type of variables:

```{r}
glimpse(dataVO2)
```

The dataset *dataVO2* has 30 participants and two variables. The numeric `VO2max` variable and the `sport` variable (with levels "roweres", "runners", and "triathletes") which should be converted to a factor variable using the `factor()` function as follows:

```{r}
dataVO2 <- dataVO2 %>% 
  mutate(sport = factor(sport))

glimpse(dataVO2)
```

## Explore the characteristics of distribution for each group and check for normality

The distributions can be explored visually with appropriate plots. Additionally, summary statistics and significance tests to check for normality (e.g., Shapiro-Wilk test) can be used.

**Graph**

We can visualize the distribution of `VO2max` for the three sport groups:

```{r}
#| warning: false
#| label: fig-violin_plot5
#| fig-cap: Rain cloud plot.

set.seed(123)
ggplot(dataVO2, aes(x=sport, y=VO2max)) + 
  geom_flat_violin(aes(fill = sport), scale = "count") +
  geom_boxplot(width = 0.14, outlier.shape = NA, alpha = 0.5) +
  geom_point(position = position_jitter(width = 0.05), 
             size = 1.2, alpha = 0.6) +
  ggsci::scale_fill_jco() +
  theme_classic(base_size = 14) +
  theme(legend.position="none", 
        axis.text = element_text(size = 14))
```

The above figure shows that the data in triathletes group have some outliers. Additionally, we can observe that the runners group seems to have the largest VO2max.

**Summary statistics**

The `VO2max` summary statistics for each sport group are:

::: {#exercise-joins .callout-tip}
## Summary statistics by group

::: panel-tabset
## dplyr

```{r}
VO2_summary <- dataVO2 %>%
  group_by(sport) %>%
  dplyr::summarise(
    n = n(),
    na = sum(is.na(VO2max)),
    min = min(VO2max, na.rm = TRUE),
    q1 = quantile(VO2max, 0.25, na.rm = TRUE),
    median = quantile(VO2max, 0.5, na.rm = TRUE),
    q3 = quantile(VO2max, 0.75, na.rm = TRUE),
    max = max(VO2max, na.rm = TRUE),
    mean = mean(VO2max, na.rm = TRUE),
    sd = sd(VO2max, na.rm = TRUE),
    skewness = EnvStats::skewness(VO2max, na.rm = TRUE),
    kurtosis= EnvStats::kurtosis(VO2max, na.rm = TRUE)
  ) %>%
  ungroup()

VO2_summary
```

## dlookr

```{r}
dataVO2 %>% 
  group_by(sport) %>% 
  dlookr::describe(VO2max) %>% 
  select(described_variables,  sport, n, na, mean, sd, p25, p50, p75, skewness, kurtosis) %>% 
  ungroup()
```
:::
:::

The sample size is relative small (10 observations in each group). Moreover, the skewness (1.5) and the (excess) kurtosis (1.6) for the triathletes fall outside of the acceptable range of \[-1, 1\] indicating right-skewed and leptokurtic distribution.

**Normality test**

The *Shapiro-Wilk* test for normality for each sport group is:

```{r}
dataVO2 %>%
  group_by(sport) %>%
  shapiro_test(VO2max) %>% 
  ungroup()
```

We can see that the data for the triathletes is not normally distributed (p=0.023 \<0.05) according to the Shapiro-Wilk test.

By considering all of the information together (small samples, graphs, normality test) the overall decision is against of normality.

## Run the Kruskal-Wallis test

Now, we will perform a Kruskal-Wallis test to compare the VO2max in three sports.

::: callout-tip
## Kruskal-Wallis test

::: panel-tabset
## Base R

```{r}
kruskal.test(VO2max ~ sport, data = dataVO2)
```

## rstatix

```{r}
dataVO2 %>% 
  kruskal_test(VO2max ~ sport)
```
:::
:::

The p-value (\<0.001) is lower than 0.05. There is at least one sport in which the VO2max is different from the others.

 
**Present the results in a summary table**

A summary table can also be presented:

```{r}
#| code-fold: true
#| code-summary: "Show the code"

gt_sum10 <- dataVO2 %>% 
  tbl_summary(
    by = sport, 
    statistic = VO2max ~ "{median} ({p25}, {p75})", 
    digits = list(everything() ~ 1),
    label = list(VO2max ~ "VO2max (mL/kg/min)"), 
    missing = c("no")) %>% 
  add_p(test = VO2max ~ "kruskal.test", purrr::partial(style_pvalue, digits = 2)) %>%
  as_gt() 

gt_sum10
```

## Post-hoc tests

A significant WMW is generally followed up by post-hoc tests to perform multiple pairwise comparisons between groups:

::: callout-tip
## Post-hoc tests

::: panel-tabset
## Dunn's approach

```{r}
# Pairwise comparisons
pwc_Dunn <- dataVO2 %>% 
  dunn_test(VO2max ~ sport, p.adjust.method = "bonferroni")

pwc_Dunn 
```

## WMW with Bonferroni

Alternatively, we can perform pairwise comparisons using pairwise WMW's test and calculate the adjusted p-values using Bonferroni correction:

```{r}
# Pairwise comparisons

pwc_BW <- dataVO2 %>% 
  pairwise_wilcox_test(VO2max ~ sport, p.adjust.method = "bonferroni")

pwc_BW
```
:::
:::

Dunn's pairwise comparisons were carried out using the method of Bonferroni and adjusting the p-values were calculated.

The runners' VO2max (median= 77.2, IQR=\[74.2, 79.5\] mL/kg/min) seems to differ significantly (larger based on the medians) from rowers (69.6 \[67.8, 73.1\] mL/kg/min, p=0.046 \<0.05) and triathletes (65.4 \[64.0, 67.6\] mL/kg/min, p \<0.001).