---
title: |
  | Responsible modelling: Unit testing for infectious disease epidemiology.
author:
  - name: Tim CD Lucas
    email: timcdlucas@gmail.com
    affiliation: [Big Data Institute, Imperial]
    corresponding: timcdlucas@gmail.com
  - name: Timothy M Pollington
    affiliation: [Big Data Institute, MathSys CDT]
  - name: Emma L Davis
    affiliation: Big Data Institute
  - name: T Déirdre Hollingsworth
    affiliation: Big Data Institute
address:
  - code: Big Data Institute
    address: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
  - code: Imperial
    address: Centre for Environment and Health, School of Public Health, Imperial College, UK
  - code: MathSys CDT
    address: MathSys CDT, University of Warwick, UK
abstract: |
  Infectious disease epidemiology is increasingly reliant on large-scale computation and inference.
  Models have guided health policy for epidemics including COVID-19 and Ebola and endemic diseases including malaria and tuberculosis.
  Yet a coding bug may bias results, yielding incorrect conclusions and actions causing avoidable harm.
  We are ethically obliged to make our code as free of error as possible.
  Unit testing is a coding method to avoid such bugs, but it is rarely used in epidemiology.
  We demonstrate how unit testing can handle the particular quirks of infectious disease models and aim to increase uptake of this methodology in our field.
keywords: unit testing, software development, reproducible science, computational models
bibliography: unit_testing.bib
output:
  bookdown::pdf_book:
    base_format: rticles::elsevier_article
    includes:
      in_header: preamble.tex
    number_sections: true
---
```{r, setup, echo=FALSE, results = 'hide', message = FALSE}
rm(list = ls())
library(knitr)
oldSource <- knit_hooks$get("source")
knit_hooks$set(source = function(x, options) {
  x <- oldSource(x, options)
  x <- ifelse(!is.null(options$ref), paste0("\\label{", options$ref, "}", x), x)
  ifelse(!is.null(options$codecap), paste0("\\captionof{chunk}{", options$codecap, "}", x), x)
})
library(ggplot2)
library(tidyr)
library(testthat)
knitr::opts_chunk$set(cache = FALSE, fig.width = 7, fig.height = 4,
                      out.extra = '', out.width = "0.6\\textwidth",
                      fig.align = 'center', fig.pos = "h")
set.seed(01011885)
# You might need to install the package rticles for the formatting to work.
# To build, run:
# rmarkdown::render('unit_testing.Rmd')
```
\newpage
# Introduction
\linenumbers
Modelling is an important tool for understanding fundamental biological processes in infectious disease dynamics, evaluating potential intervention efficacy and forecasting disease burden.
At the time of writing, infectious disease modellers are playing a central role in the interpretation of available data on the COVID-19 pandemic to inform policy design and evaluation [@ihme; @imperial; @hellewell2020feasibility].
Similarly, policy on endemic infectious diseases, such as duration and frequency of control programmes and spatial prioritisation, is also directed by models [@behrend2020modelling].
Such research builds on a long history of modelling for policy [@heesterbeek2015modeling] and a general understanding of the dynamics of infectious disease systems.
<!--
I have a list of high impact bugs, but in summary a few highlights:
- Omission of a minus sign in 1995 cost Fidelity Magellan Fund £1.6million
- In 2013 a PhD student found basic mistakes in a globally influential economics study that was used by many countries to inform austerity
- London 2012 Olympics over-sold 10,000 synchronised swimming tickets
- 2010: MI5 incorrectly bugged over 100 phones due to a code formatting error
- Mars Climate orbiter lost in space due to failure to convert between imperial and metric units (1998)
- Mariner 1 space craft lost in 1962 due to omission of a hyphen
- Kerberas security system trivial to break for 8 years due to failure to correctly seed random number generator
--https://retractionwatch.com/2018/11/05/data-mishap-forces-retraction-of-paper-on-gun-laws-domestic-killings/#more-76505
https://retractionwatch.com/2020/05/05/doing-the-right-thing-researchers-retract-clinician-burnout-study-after-realizing-their-error/
https://retractionwatch.com/2016/09/26/error-in-one-line-of-code-sinks-2016-seer-cancer-study/
--->
<!--
# https://twitter.com/ID_AA_Carmack/status/1254872368763277313
# https://github.com/mrc-ide/covid-sim/issues/166 Model did not have unit tests. At time of 60da1ecff99001758499292b4751e4fdbb92cfd2
# Does have a basic regression test but this was probably added recently.
--->
Given the importance of modelling results, it is vital that the code they rely on is both correct and trusted.
Bugs can be caused by typos, code behaving in unexpected ways, or logical flaws in the construction of the code.
Outside of epidemiology, bugs have been found in code that had been used by many researchers [@bhandari2019characterization] and may lead to retractions [@retract].
Bugs have also been found in highly influential work; a paper that informed austerity policies globally was found to have a crucial computational mistake [@herndon2014does].
In engineering, bugs caused the Mars Climate Orbiter and the Mariner 1 spacecraft to be lost or destroyed [@nasa; @board1999mars].
We do not know of high profile cases of infectious disease models being found to have bugs once published, but as code is not always shared and little post-publication testing of code occurs, this likely represents a failure of detection.
The issue of trust was highlighted recently when Neil Ferguson, one of the leading modellers informing UK COVID-19 government policy, tweeted:
> "I'm conscious that lots of people would like to see and run the pandemic simulation code we are using to model control measures against COVID-19. To explain the background - I wrote the code (thousands of lines of undocumented C) 13+ years ago to model flu pandemics..." [@ferg_tweet].
The code that was released did not include any tests [@ferg_github] but subsequent work has added documentation, while independent code reviews have supported the results of the study [@ferg_check; @it].
The tweet and lack of tests garnered considerable backlash (some of which may have been politically motivated [@ferg_pol]), with observers from the software industry noting that code should be both documented and tested to ensure its correctness [@it].
It is understandable that during the fast-moving, early stages of a pandemic, other priorities were put above testing and documenting the code.
It is also important to note that a lack of tests is not unusual in our field, or for some of the authors of this article.
To guard against error, policy-makers now routinely request analyses from multiple modelling groups (as is the case in the UK for COVID-19 [@spim]) as a means to provide scientific robustness (both in terms of model uncertainty and in terms of implementation) [@den2019guidelines; @hollingsworth2017learning], yet this is not enough if the models themselves lack internal validity.
Infectious disease modellers are rarely trained as professional programmers [@it] and recently some observers have made the case that this has been due to a lack of funding [@funds].
Epidemiological groups such as RECON [@RECON], and broader groups such as rOpenSci, have however started providing support for scientists to develop better coding practices.
The communities built around these groups are an invaluable resource for new programmers.
It is also notable that while a number of articles have stated that unit tests should be written [@osborne2014ten; @wilson2014best; @RECON] there are few texts available which demonstrate the use of unit testing to check infectious disease models.
While the basic premise of unit testing is simple, there is an art to knowing what aspects of code can and should be tested.
Guides that enable researchers to acquire this skill quickly will benefit the field.
Whilst there are many drivers of, and attempts to address, the problem of code robustness, today's models are increasingly moving from mean-field ordinary differential equation approximations to individual-based models with complex, data-driven contact processes [@willem2017lessons; @ferguson2006strategies].
These increases in model complexity are accompanied by growing codebases.
Furthermore, while there are some general packages for epidemiological modelling [@jenness2018epimodel; @santos2015epidynamics], it is very common for epidemiologists to study a new model and therefore code it from scratch.
Unlike established packages, which have had time to mature and fix many bugs, newly programmed models are more prone to errors.
As the mathematical methods depend increasingly on numerical solutions rather than analytical pen-and-paper methods, it becomes more difficult to tell if a bug is present based on model outputs alone.
Furthermore, checking models in an _ad hoc_ way is biased as unexpected results trigger careful checks of the code while models that show expected behaviour are more likely to be assumed bug-free.
_Unit testing_ is a formally-defined, principled framework that compares outputs from code to what the programmer expected to happen (@wickham2015r Chapter 7, @osborne2014ten, @wilson2014best).
Ready-to-run frameworks for unit testing are available in _R_ [@R], _Julia_ [@bezanson2017julia] and _Python_ [@python] and are standard practice in the software industry.
These testing concepts also apply to many other scientific fields but here we focus on infectious diseases.
Infectious disease modelling presents specific challenges, such as stochastic outputs [@vsevvcikova2006automated; @guderlei2007statistical; @patrick2017toolkit], which are difficult to test and not covered in general unit testing literature.
There are a number of other programming techniques that should be used in conjunction with unit testing, such as defensive programming, version control, pair-programming and comprehensive documentation [@osborne2014ten; @wilson2014best; @wickham2019advanced; @wickham2015r; @RECON] and these are important complements to the methods in this paper.
In this primer we introduce unit testing with a demonstration of an infectious disease model.
While we use _R_ throughout to exemplify the unit testing framework, the concepts apply equally well to the various languages commonly used by modellers, such as _Julia_ and _Python_; we therefore briefly direct the reader towards available testing frameworks for those languages in Section \@ref(frameworks).
# Unit testing foundations
At the heart of every _unit test_ is a function output, its known or expected value and a process to compare the two.
For the square root function ($\sqrt{x}$ or `sqrt(x)` in _R_), we could write a test that runs the function for the number 4, i.e. `sqrt(x = 4)`, and compares it to the correct answer i.e. 2.
However, often function arguments will cover an infinite range of possibilities and we cannot exhaustively check them all.
Instead we devise tests that cover standard usage as well as _special case_ scenarios: what do we want our function to do if given a negative number e.g. `sqrt(-1)`, or a vector argument containing strings or missing values e.g. `sqrt(c(4, "melon", NA))`?
Strictly-defined, unit testing tests code with no dependencies outside of the test definition.
This is in contrast to integration testing that tests how these small units integrate with other units of code, including dependencies.
Testing at even higher levels includes system testing (which tests how multiple systems such as software and hardware interact) and acceptance testing (in which end-users, or software commissioners, test that it meets requirements).
Within the scientific community however, the term unit testing is typically used in a slightly vague way and implies a combination of integration and (strict) unit testing.
As so much scientific software relies on various dependencies, even at very low levels, the strict definition of unit testing is not necessarily useful.
Here we continue to use this vague definition, simply focussing on testing of code at a low level.
The first benefit of this is that it allows you to work out the exact expected result of a function call.
Second, if you do find bugs, they are easier to isolate and fix if you are working at these low levels.
Third, code that either calls the low-level functions or relies on outputs from them is easier to test and debug.
In _R_, the \texttt{testthat} package [@testthat], provides a simple interface for testing.
While a variety of test functions can make different comparisons, the two main ones are `expect_true()` and `expect_equal()`.
`expect_true()` takes one argument: an expression that should evaluate to `TRUE`.
For the square root example above, we would write `expect_true(sqrt(4) == 2)`.
`expect_equal()` takes two arguments, an expression and the expected output;
so we would write `expect_equal(sqrt(4), 2)`.
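Both expectations pass silently when the result matches and throw an error when it does not; \texttt{testthat} also provides functions such as `expect_error()` and `expect_warning()` for cases where a failure is itself the expected behaviour.
A minimal sketch using the built-in `sqrt()` function (the awkward inputs echo the examples above):
\newline
```{r, sqrt_sketch, ref = "sqrtsketch", codecap = "A minimal sketch of testthat expectations using the square root function.", results = 'hide'}
# Passing tests return no output.
expect_true(sqrt(4) == 2)
expect_equal(sqrt(4), 2)
# Special cases: sqrt() warns on negative numbers and
# errors on non-numeric input, so we expect exactly that.
expect_warning(sqrt(-1))
expect_error(sqrt("melon"))
```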
There are a number of ways to incorporate unit testing into your programming workflow.
1. Each time you write code for a new, discrete chunk of functionality, you should write tests that confirm it does what you expect. These tests should be kept with the code they are testing (in the same directory or in a subdirectory).
2. Whenever a bug is found outside of the existing testing framework, a new test should be written to capture it.
Then if the bug re-emerges it will hopefully be quickly flagged so that the developer can fix it.
3. All of these tests should be run regularly as you develop new code.
If a change causes the previous tests to break, this indicates the introduction of an error in the new code, or implies that the older code could not generalise to the adapted environment.
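
Throughout this primer we use bare `expect_*()` calls so that each example stands alone, but \texttt{testthat} also lets you group related expectations into named tests with `test_that()`, which makes a growing test suite easier to read and to re-run regularly (point 3 above).
A minimal sketch, again using the built-in square root function:
\newline
```{r, grouped_tests, ref = "groupedtests", codecap = "Grouping related expectations into named tests.", results = 'hide', message = FALSE}
test_that("sqrt() gives correct values for simple cases", {
  expect_equal(sqrt(4), 2)
  expect_equal(sqrt(9), 3)
})
test_that("sqrt() handles special cases", {
  expect_warning(sqrt(-1))
  expect_error(sqrt("melon"))
})
```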
# An example multi-pathogen re-infection model
```{r modelexample, echo=FALSE, out.width="100%", fig.cap="The 3-pathogen system with arrows showing the possible transitions at every time step."}
include_graphics("figures/modelexample.png")
```
Here we define a toy epidemiological model and then demonstrate how to effectively write unit tests for it in _R_ code.
Consider a multi-pathogen system, with a population of $N$ infected individuals who each get infected by a new pathogen at every time step (Fig. \ref{fig:modelexample}).
In this toy example, we imagine that individuals are infected with exactly one pathogen at a time.
Some aspects of this model could reflect the dynamics of a population where specific antibiotics are used regularly i.e. each time step an individual is infected, diagnosed and treated suboptimally, leaving the individual susceptible to infection from any pathogen, including the one they were just treated for.
The aim of this model however is not to be realistic but to serve as a learning tool with succinct code.
We work through a more realistic model in the Supplementary Material.
Each individual $i$, at time $t$, is defined by the pathogen they are currently infected with $I_{it} \in \{a, b, c\}$ for a 3-pathogen system.
The population is therefore defined by a length $N$ state vector $\mathbf{I}_t = (I_{it})_{i=[1,N]}$.
At each time step, every individual's infection status is updated as:
$$I_{it} = \text{Unif}(\mathbf{I}_{t-1}).$$
That is, at each iteration, the new infection status of each individual is a Uniform random sample from the set of infection statuses in the previous iteration (including itself $I_{i,t-1}$).
This model has a total of three parameters: the total number of individuals, the number of pathogen species and the number of time steps, all of which are scalar, positive integers.
Certainly this toy model is naïve as it is governed by mass-action principles, ignoring contact and spatial dynamics.
Nevertheless it will serve its purpose.
Before writing any code we can consider the model's expected behaviour.
Firstly, we would expect an individual to be repeatedly infected with different strains.
Secondly, we would expect the proportions of the different pathogens to stochastically drift, until all but one pathogen goes extinct.
\@ref(firstcode) shows our first attempt at implementing this model.
\newline
```{r first_code, ref = "firstcode", codecap = "Base example of the multi-pathogen re-infection model.", results='hide'}
N <- 12 # infected individuals
n_steps <- 20 # study length
# create the matrix to store the simulation data
I <- matrix(data = NA, nrow = n_steps, ncol = N)
# Initialise the population at t=1 with repeating configuration
I[1, ] <- rep(x = c("a", "b", "c"), length.out = N)
# At each time step, everyone is re-infected
# by someone from the previous time step.
for(t in seq(2, n_steps)){
  I[t, ] <- sample(x = I[t-1, ], size = N)
}
```
```{r firstplots, echo = FALSE, fig.cap = 'Infection profile for individual 1, who is initially infected with pathogen $a$ but then reinfected with different pathogens', out.width = "75%"}
library(ggplot2)
d1 <- data.frame(time = seq(n_steps), pathogen = (I[, 1]), stringsAsFactors = TRUE)
ggplot(d1, aes(time, as.numeric(pathogen))) +
  geom_path() +
  geom_point() +
  scale_y_continuous(breaks = seq(3), labels = c('a', 'b', 'c')) +
  labs(y = 'Pathogen', x = 'Time') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        text = element_text(size = 20, family = "serif"))
```
```{r, incorrectplot, echo = FALSE, fig.cap = 'The proportion of each pathogen through time as given by Code 1. Each pathogen is a different line but they are overplotted. The proportions of each pathogen do not stochastically drift as we would expect.', out.width = "75%"}
(apply(I, 1, function(x) table(factor(x, levels = c("a", "b", "c")))) / N) %>%
  t %>%
  data.frame %>%
  cbind(Time = seq_len(nrow(I))) %>%
  pivot_longer(cols = -Time, names_to = 'Pathogen', values_to = 'Proportion') %>%
  ggplot(aes(x = Time, y = Proportion, colour = Pathogen, linetype = Pathogen)) +
  ylim(0, 1) +
  geom_line(size = 2) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        legend.key.width = unit(3, "cm"),
        text = element_text(size = 20, family = "serif"))
```
Usually we would make some output plots to explore if our code is performing sensibly.
Plotting the time course of which pathogen is infecting one individual shows repeated infection with different pathogens as expected (Fig. \@ref(fig:firstplots)).
However, if we plot the proportion of each pathogen (Fig. \@ref(fig:incorrectplot)) we quickly see that instead of stochastically varying, the proportions are identical through time and so there must be a bug present.
This simple example demonstrates, firstly, that bugs can be subtle.
Secondly, it is not easy to notice an error, even in just 7 lines of code.
Thirdly, it is much easier to debug code when you know there is a bug.
Fourthly, while plotting simulation runs is an excellent way to check model behaviour, if we had only relied on Fig. \@ref(fig:firstplots) we would have missed the bug.
Additionally, manually checking plots is a time-consuming and non-scalable method because a human has to perform this scan on every test run.
In summary this *ad hoc* plotting approach reduces the chances that we will catch all bugs.
The cause of the bug is that `sample()` defaults to sampling without replacement `sample(..., replace = FALSE)`; this means everyone transmits their infection pathogen on a one-to-one basis rather than one-to-many as required by the model.
Setting `replace = TRUE` fixes this (\@ref(correctcode)) and when we plot the proportion of each pathogen (Fig. \@ref(fig:correctplots)) we see the correct behaviour (a single pathogen drifting to dominance).
At this point there are no further bugs in the code.
In the subsequent sections we will develop this base example as we consider different concepts in unit testing, resulting in well-tested code by the end.
Despite there being no further bugs, we can examine how unit testing allows us to protect against misuse and to reinforce our confidence in the code.
\newline
```{r, correctcode, ref = "correctcode", codecap = "Corrected base example.", results = 'hide', echo = TRUE}
N <- 12
n_steps <- 20
I <- matrix(data = NA, nrow = n_steps, ncol = N)
I[1, ] <- rep(x = c("a", "b", "c"), length.out = N)
for(t in seq(2, n_steps)){
  I[t, ] <- sample(x = I[t-1, ], size = N, replace = TRUE)
}
```
```{r, correctplots, echo = FALSE, fig.cap = 'The correct behaviour with the proportion of each pathogen, as a different line, drifting to dominance or extinction.', out.width = "75%"}
(apply(I, 1, function(x) table(factor(x, levels = c("a", "b", "c")))) / N) %>%
  t %>%
  data.frame %>%
  cbind(Time = seq_len(nrow(I))) %>%
  pivot_longer(cols = -Time, names_to = 'Pathogen', values_to = 'Proportion') %>%
  ggplot(aes(x = Time, y = Proportion, colour = Pathogen, linetype = Pathogen)) +
  geom_line(size = 2) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        legend.key.width = unit(3, "cm"),
        text = element_text(size = 20, family = "serif"))
```
# Basic unit testing
## Write small functions {-#compactfuns}
To ensure the unit tests are evaluating the exact code as run in the analysis, code should be structured in functions, which can be used both to run unit tests and to generate results as part of a larger model codebase.
Make your functions compact with a single clearly-defined task.
We have defined a function, `initialisePop()`, to initialise the population and another, `updatePop()`, to run one iteration of the simulation (\@ref(compactfunctions)).
Organising the codebase into these bite-sized operations makes it easier to follow the program flow and to understand the code structure.
Furthermore, it facilitates other good programming practices such as defensive programming and documentation --- defensive programming, such as checking the class and dimensions of inputs in the first few lines of a function, ensures that the code will fail quickly and informatively if incorrect inputs are used.
At this stage we have also enabled the varying of the number of pathogens using the `pathogens` argument in the `initialisePop()` function.
The first iteration of the simulation, `I[1, ]`, is initialised with a repeating sequence of letters.
\newline
```{r, compactfunctions, ref = "compactfunctions", codecap = "Organising code into small functions.", results = 'hide'}
initialisePop <- function(n_steps, N, pathogens){
  I <- matrix(data = NA, nrow = n_steps, ncol = N)
  I[1, ] <- rep(x = letters[1:pathogens], length.out = N)
  return(I)
}

updatePop <- function(x, t, N){
  x[t, ] <- sample(x = x[t-1, ], size = N, replace = TRUE)
  return(x)
}
```
## Test simple cases first {-#easycases}
If we start with a small population with few pathogens, we can then easily work out exactly what the initialised population should look like (\@ref(testsimplefirst)).
When we initialise a population with three individuals and three pathogens, we will get the sequence \texttt{"a", "b", "c"} as seen in the first test.
When the number of individuals is greater than the number of pathogens, the sequence will be repeated as seen in the second test.
Finally, when the number of individuals is greater than, but not a multiple of, the number of pathogens, the sequence will have an incomplete repeat at the end as seen in \@ref(testsimplefirst).
In this sequence of tests, we have taken our understanding of the function logic, and used it to make predictions of what the results should be.
We then test that the result is as expected; if everything is correct, the tests return no output.
\newline
```{r, test_simple_first, ref = "testsimplefirst", codecap = "Using simple parameter sets we can work out beforehand what results to expect.", results = 'hide'}
pop1 <- initialisePop(n_steps = 2, N = 3, pathogens = 3)
expect_equal(pop1[1, ], c("a", "b", "c"))
pop2 <- initialisePop(n_steps = 2, N = 6, pathogens = 3)
expect_equal(pop2[1, ], c("a", "b", "c", "a", "b", "c"))
pop3 <- initialisePop(n_steps = 2, N = 5, pathogens = 3)
expect_equal(pop3[1, ], c("a", "b", "c", "a", "b"))
```
In contrast, if we had defined the `initialisePop()` function incorrectly, the test would fail and return an error.
\newline
```{r, test_error, ref = "testerror", codecap = "If our code is incorrect, the test will return an error.", eval = FALSE}
> # A broken function that does not add the pathogens.
> initialisePopBroken <- function(n_steps, N, pathogens){
+ I <- matrix(data = NA, nrow = n_steps, ncol = N)
+ return(I)
+ }
> popBroken <- initialisePopBroken(n_steps = 2, N = 3,
+                                  pathogens = 3)
> expect_equal(popBroken[1, ], c("a", "b", "c"))
Error: popBroken[1, ] not equal to c("a", "b", "c").
```
## Test all arguments {-#testargs}
`initialisePop()` has three arguments to check.
First we initialise the population, and then alter each argument in turn (\@ref(testallargs)).
Arguments `n_steps` and `N` directly change the expected dimension of the returned matrix so we check that the output of the function is the expected size.
For the `pathogens` argument we test that the number of pathogens is equal to the number requested.
\newline
```{r, test_all_args, ref = "testallargs", codecap = "Test all function arguments in turn.", results = 'hide'}
pop1 <- initialisePop(n_steps = 2, N = 3, pathogens = 3)
expect_equal(dim(pop1), c(2, 3))
pop2 <- initialisePop(n_steps = 6, N = 3, pathogens = 3)
expect_equal(dim(pop2), c(6, 3))
pop3 <- initialisePop(n_steps = 2, N = 20, pathogens = 3)
expect_equal(dim(pop3), c(2, 20))
pop4 <- initialisePop(n_steps = 2, N = 10, pathogens = 5)
expect_equal(length(unique(pop4[1, ])), 5)
```
## Does the function logic meet your expectations? {-#complexcases}
We can also cover cases that expose deviations from the logical structure of the system.
After initialising the population, we expect all the rows other than the first to contain `NA`.
We also expect each of the pathogens $a$, $b$ and $c$ to occur at least once on the first row if `pathogens` $= 3$ and `N` $\geq 3$.
Finally, `updatePop()` performs a single simulation time step, so we expect only one additional row to be populated.
Instead of testing exact numerical values, we verify logical statements about the results, based on our macro-level understanding of the model system (\@ref(testcomplex)).
\newline
```{r, test_complex, ref = "testcomplex", codecap = "Test more complex cases using your understanding of the system.", results = 'hide'}
pop1 <- initialisePop(n_steps = 20, N = 12, pathogens = 3)
# expect all (except the first row) are NAs
expect_true(all(is.na(pop1[-1, ])))
# the unique values of pop1[1, ] should be a, b, c
# and nothing else.
expect_true(setequal(c("a", "b", "c"), pop1[1, ]))
pop2 <- updatePop(pop1, t = 2, N = 12)
# after update, expect 1st & 2nd row not to have NAs
expect_true(all(!is.na(pop2[1:2, ])))
# and also expect that rows other than 1st & 2nd are NAs.
expect_true(all(is.na(pop2[-c(1:2), ])))
```
## Combine simple functions and test them at a higher level{-#combine}
In the end an entire model only runs when its functions work together seamlessly.
So we next check their connections; this is achieved by nesting the functions together, or defining them at a higher level, and then checking the macro-level aspects of the model.
This could be considered integration testing rather than unit testing.
We define a function `fullSim()` that runs both `initialisePop()` and `updatePop()` to yield one complete simulation.
We would expect there to be no `NA`s in the output from `fullSim()` and every element to be either $a$, $b$ or $c$.
\newline
```{r, combine_simple_func, ref = "combinesimplefunc", codecap = "Combine simple functions through nesting to check high-level functionality.", results = 'hide'}
fullSim <- function(n_steps, N, pathogens){
  pop <- initialisePop(n_steps, N, pathogens)
  for(t in seq(2, n_steps)){
    pop <- updatePop(pop, t, N)
  }
  return(pop)
}
pop <- fullSim(n_steps = 12, N = 20, pathogens = 3)
# expect no NAs
expect_true(!any(is.na(pop)))
# expect all elements to be one of a, b, or c
expect_true(all(pop %in% c("a", "b", "c")))
```
# Stochastic code
Stochastic simulations are a common feature in infectious disease models.
Stochastic events are difficult to test effectively because, by definition, we do not know beforehand what the result will be [@vsevvcikova2006automated; @guderlei2007statistical; @patrick2017toolkit].
We can check very broad-scale properties, as in \@ref(combinesimplefunc), where we check the range of pathogen values.
However, incorrect code could still pass such a test (for example, the original base example (\@ref(firstcode)) would still pass it).
There are however a number of approaches that can help.
## Split stochastic and deterministic parts {-#splitstochastic}
Isolate the stochastic parts of your code.
For example, `updatePop()` performs stochastic and deterministic operations in one line (\@ref(compactfunctions)).
First `updatePop()` stochastically samples who gets infected by whom at iteration `t`.
Then it takes those infection events and assigns the new infectious status for each individual.
We demonstrate in \@ref(splitdeterstoch) how this could be split.
We accept this is a fairly exaggerated example and that splitting a single line of code into two functions is rare.
The more common scenario is splitting a multi-line function into smaller functions, which also brings benefits for code organisation, so it does not feel like extra effort.
\newline
```{r, split_deter_stoch, ref = "splitdeterstoch", codecap = "Isolation of the deterministic and stochastic elements."}
chooseInfector <- function(N){
  sample(x = N, size = N, replace = TRUE)
}

updateInfectionStatus <- function(x, t, infector_pathogen){
  x[t, ] <- x[t - 1, infector_pathogen]
  return(x)
}

updatePop <- function(x, t, N){
  infector_pathogen <- chooseInfector(N)
  x <- updateInfectionStatus(x, t, infector_pathogen)
  return(x)
}
```
Now, half of `updatePop()` is deterministic so can be checked as previously discussed.
We still have `chooseInfector()` that is irreducibly stochastic.
We now examine some techniques for directly testing these stochastic parts.
## Pick a smart parameter for a deterministic result{-#deterministicparams}
In the same way that we used simple parameter values in \@ref(testsimplefirst), we can often find simple cases for which stochastic functions become deterministic.
For example, $X\sim\text{Bernoulli}(p)$ will always generate zeroes for $p=0$ or ones for $p=1$.
In the case of a single pathogen (\@ref(teststochdetermin)), the model is no longer stochastic.
So initialisation with one pathogen means the second time step should equal the first.
\newline
```{r, test_stoch_determin, ref = "teststochdetermin", codecap = "A stochastic function can output deterministically if you can find the right parameter set.", results = 'hide'}
pop <- initialisePop(n_steps = 2, N = 3, pathogens = 1)
pop <- updatePop(pop, t = 2, N = 3)
expect_equal(pop[1, ], pop[2, ])
```
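The Bernoulli example above can be checked in the same way; a brief sketch using `rbinom()` with the edge-case probabilities $p=0$ and $p=1$:
\newline
```{r, bernoulli_sketch, ref = "bernoullisketch", codecap = "Edge-case parameters make a stochastic draw deterministic.", results = 'hide'}
# With p = 0 every draw is 0; with p = 1 every draw is 1.
expect_true(all(rbinom(n = 1000, size = 1, prob = 0) == 0))
expect_true(all(rbinom(n = 1000, size = 1, prob = 1) == 1))
```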
## Test all possible answers (if few) {-#allpossible}
Working again with a simple parameter set, there are some cases where the code is stochastic, but with a small, finite set of outputs.
So we can run the function exhaustively and check it returns all of the possible outputs.
For a population of two people, `chooseInfector()` returns a length-2 vector with the possible elements of 1 or 2.
There are four possibilities when drawing who is infected by whom.
Both individuals can be infected by individual 1, giving the vector {1, 1}.
Both individuals can be infected by individual 2, giving {2, 2}.
Each individual can infect itself, giving {1, 2}.
Or, finally, the two individuals can infect each other, giving {2, 1}.
In (\@ref(teststochfewvalues)), `chooseInfector(N = 2)` returns a length-2 vector with the indices of the infector for each infectee.
`paste0()` then turns this length-2 vector into a length-1 string with two characters; we expect this to be one of "11", "22", "12" or "21".
`replicate()` runs the expression 300 times.
\newline
```{r, test_stoch_fewvalues, ref = "teststochfewvalues", codecap = "Testing stochastic output when it only covers a few finite values.", results = 'hide'}
# Collapse each draw into a single string
# to make comparisons easier.
manyPops <-
replicate(300, paste0(chooseInfector(N = 2), collapse = ""))
# Check that all outputs are one of the four possible values
expect_true(all(manyPops %in% c("11", "22", "12", "21")))
```
## Use very large samples for the stochastic part {-#largesamples}
Testing can be made easier by using very large numbers.
This typically involves large sample sizes or numbers of stochastic runs.
For example, the clearest test to distinguish between our original, buggy code (\@ref(firstcode)) and our correct code (\@ref(correctcode)) is that in the correct code there is the possibility for an individual to infect more than one individual in a single time step.
In any given run this is never guaranteed, but the larger the population size the more likely it is to occur.
With a population of 1000, the probability that no individual infects two others is vanishingly small (\@ref(teststochlargenum)).
As this test is now stochastic we should set the seed, using \texttt{set.seed()}, of the random number generator to make the test reproducible.
\newline
```{r, test_stoch_largenum, ref = "teststochlargenum", codecap = "Testing that the code does allow one individual to infect multiple individuals.", results='hide'}
set.seed(10261985)
n <- 1e3
infector_pathogen <- chooseInfector(n)
# Test whether any individual infects more than one individual.
expect_true(any(duplicated(infector_pathogen)))
```
If we have an event that we know should never happen, we can use a large number of simulations to provide stronger evidence that it does not stochastically occur.
However, it can be difficult to determine how many times it is reasonable to run a simulation, especially if time is short.
This strategy works best when we have a specific bug that occurs relatively frequently (perhaps once every ten simulations or so).
If the bug occurs every ten simulations and we have not fixed it we would be confident that it will occur at least once if we run the simulation 500 or 1000 times.
Conversely, if the bug does not occur even once in 500 or 1000 simulations we can be fairly sure we have fixed it.
\newline
\newline
Similarly, a bug might cause an event that should be rare to instead occur very regularly or even every time the code is run.
In our original buggy code (\@ref(firstcode)) we found that the proportions remained identical for entire simulations.
We would expect this to happen only very rarely.
We can run a large number of short simulations to check that this specific bug is not still occurring by confirming that the proportion of each pathogen is not always the same between the first and last time point.
As long as we find at least one simulation where the proportions of each pathogen are different between the first and last iteration, we can be more confident that the bug has been fixed.
\newline
```{r, returningpathogen, ref = "returningpathogen", codecap = "Assessing if a bug fix was a likely success with large code runs, when the bug was appearing relatively frequently."}
set.seed(11121955)
manySims <- replicate(500,
                      fullSim(n_steps = 20, N = 40,
                              pathogens = 3),
                      simplify = FALSE)
# Define a function that returns TRUE if the
# pathogen proportions differ between the
# first and last time point and FALSE otherwise.
diffProportions <- function(x){
  !identical(table(x[1, ]), table(x[20, ]))
}
# Check that at least one simulation had non-identical
# proportions. vapply runs the function diffProportions
# on each list element of manySims i.e. each simulation.
expect_true(any(vapply(manySims, diffProportions, TRUE)))
```
This example can be thought of more generally as having an expected distribution of an output and using a statistical test to compare the simulation results with the expectation.
Here, we had a binomial event (were the pathogen proportions different between the first and last time step) and an expected frequency of this event (greater than zero).
This approach to testing stochastic code is detailed in @vsevvcikova2006automated, @guderlei2007statistical and @patrick2017toolkit.
If we know a property of the expected distribution (the mean for example) we can run the simulation repeatedly and use a statistical test to compare the distribution of simulated outputs to the expected distribution.
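For example, `chooseInfector()` should pick each individual as an infector with equal probability, so the counts of how often each index is drawn can be compared to a uniform distribution with a goodness-of-fit test.
A sketch of this idea follows; the number of replicates, the significance threshold and the seed are our own choices rather than part of the model.
\newline
```{r, distribution_test, ref = "distributiontest", codecap = "Comparing stochastic output to its expected distribution with a statistical test.", results = 'hide'}
set.seed(06061912)
# 1000 replicates of choosing infectors for 10 individuals
# gives 10000 draws in total.
draws <- replicate(1000, chooseInfector(N = 10))
counts <- table(factor(draws, levels = 1:10))
# A chi-squared goodness-of-fit test against the uniform
# distribution; with correct code this test should only
# fail about one time in a thousand.
expect_true(chisq.test(counts)$p.value > 0.001)
```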
# Further testing
## Test incorrect inputs {-#testincorrect}
As well as testing that functions work when given the correct inputs, we should also test that they behave sensibly when given wrong ones.
This typically involves the user inputting argument values that do not make sense.
This may be, for example, because the inputted argument values are the wrong class, in the wrong numeric range or have missing data values --- therefore it is useful to test that functions fail gracefully.
This is especially true for external, exported functions, available to a user on a package's front-end.
However, it is not always obvious what constitutes an 'incorrect value' even to the person who wrote the code.
In some cases, inputting incorrect argument values may cause the function to fail quickly.
In other cases code may run silently giving false results or take a long time to throw an error.
Both of these cases can be serious or annoying and difficult to debug afterwards.
In this section we identify frailties in the code that are conceptually different to a bug; the model as specified is already implemented correctly.
Instead we are making the code more user-friendly and user-safe to protect against mistakes during future code re-use.
\newline
\newline
Often for these cases, the expected behaviour of the function should be to give an error.
There is no correct output for an epidemiological model with -1 pathogens.
Instead the function should give an informative error message.
Often the simplest solution is to use defensive programming and include argument checks at the beginning of functions.
We then have to write slightly unintuitive tests for an expression where the expected behaviour is an error.
If the expression does not throw an error, the test should throw an error (as this is not the expected behaviour).
Conversely, if the expression does throw an error, the test should pass and not error.
We can use the `expect_error()` function for this task.
This function takes an expression as its first argument and reports an error if the given expression does not throw an expected error.
\newline
\newline
We can first check that the code sensibly handles the user inputting a string instead of an integer for the number of pathogens.
Because this expression throws an error, `expect_error()` does not error and the test passes.
\newline
```{r, wrong1, ref = "wrong1", codecap = "Testing incorrect pathogen inputs.", warning = FALSE}
expect_error(
  initialisePop(n_steps = 10, N = 4, pathogens = "three"))
```
Now we contrast what happens if the user inputs a vector of pathogens to the `initialisePop()` function.
Here we imagine the user intended to run a simulation with three pathogens: 1, 2 and 3.
\newline
```{r, wrong1b, ref = "wrong1b", codecap = "A failing test for incorrect pathogen inputs.", eval = FALSE}
> expect_error(initialisePop(n_steps = 5, N = 4, pathogens = 1:3))
Error: `initialisePop(n_steps = 5, N = 4, pathogens = 1:3)`
did not throw an error.
```
This test fails because the function does not throw an error.
Instead the code takes the first element of `pathogens` and ignores the rest.
Therefore, a population is created with one pathogen, not three, which is almost certainly not what the user wanted.
Here, the safest fix is to add an explicit argument check at the top of the function (\@ref(wrong1c)).
The same test now passes because `initialisePop()` throws an error when a vector is supplied to the `pathogens` argument.
\newline
```{r, wrong1c, ref = "wrong1c", codecap = "New definition, using defensive programming, of the initialisePop() function and a passing test for incorrect pathogen inputs."}
initialisePop <- function(n_steps, N, pathogens){
  # Add a defensive argument check
  if(length(pathogens) > 1) stop("pathogens must have length 1")
  I <- matrix(data = NA, nrow = n_steps, ncol = N)
  I[1, ] <- rep(x = letters[1:pathogens], length.out = N)
  return(I)
}
expect_error(initialisePop(n_steps = 5, N = 4, pathogens = 1:3))
```
We can similarly check how the code handles a user inputting a vector of numbers to the `n_steps` argument (perhaps thinking it needed a vector of all time points to run).
In \@ref(wrong1c), `initialisePop()` does not throw an error if a vector is supplied to `n_steps`.
However, `fullSim()` does throw an error if a vector is supplied to `n_steps`.
While it is a good thing that `fullSim()` throws an error, the error message is not very informative.
If the code that runs before the error is thrown (in this case the `initialisePop()` function) takes a long time, it can also be time consuming to work out what threw the error.
It is also a signature of fragile code that the error is coincidental; a small change in the code might stop the error from occurring.
These considerations all point towards defensive programming as a good solution.
We can add an additional argument check to `initialisePop()`.
Importantly, we then want to check that `fullSim()` errors in the correct place (i.e. in `initialisePop()` rather than afterwards).
We can achieve this using the `regexp` argument of `expect_error()`, which compares the actual error message to the expected error message.
The test will only pass if the error message contains the string provided.
\newline
```{r, wrong2, ref = "wrong2", codecap = "Another new definition of the initialisePop() function and a passing test for the fullSim() function."}
initialisePop <- function(n_steps, N, pathogens){
  # Argument checks
  if(length(pathogens) > 1) stop("pathogens must have length 1")
  if(length(n_steps) > 1) stop("n_steps must have length 1")
  I <- matrix(data = NA, nrow = n_steps, ncol = N)
  I[1, ] <- rep(x = letters[1:pathogens], length.out = N)
  return(I)
}

expect_error(fullSim(n_steps = 1:100, N = 4, pathogens = 3),
             regexp = "n_steps must have")
```
## Test special cases {-#corners}
When writing tests it is easy to focus on standard behaviour.
However, bugs often occur at _special cases_---when the code behaves qualitatively differently at a certain value or certain combinations of parameter values.
For example, in _R_, selecting two or more columns from a matrix e.g. `my_matrix[, 2:3]` returns a matrix while selecting one column e.g. `my_matrix[, 2]` returns a vector.
Code that relies on the returned object being a matrix would fail in this special case.
These special cases can often be triggered with parameter sets that are at the edge of parameter space.
This is where understanding of the functional form of the model can help.
Consider a function `divide(x, y)` that divides `x` by `y`.
We could test this function by noting that `y * divide(x, y)` should return `x`.
If we write code that tests standard values of `x` and `y`, such as checking that `(2 * divide(3, 2)) == 3`, we would believe the function works for nearly all divisions, unless we ever try `y = 0`.
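A brief sketch of both special cases follows; the small matrix and the `divide()` function are illustrative and not part of the model.
\newline
```{r, special_case_sketch, ref = "specialcasesketch", codecap = "Special cases at the edge of parameter space are easy to miss.", results = 'hide'}
m <- matrix(1:6, nrow = 2)
# Selecting two columns returns a matrix...
expect_true(is.matrix(m[, 2:3]))
# ...but selecting a single column drops to a vector,
# unless drop = FALSE is used.
expect_false(is.matrix(m[, 2]))
expect_true(is.matrix(m[, 2, drop = FALSE]))
# Standard values of divide() behave as expected, but
# y = 0 silently returns Inf rather than an error.
divide <- function(x, y) x / y
expect_equal(2 * divide(3, 2), 3)
expect_true(is.infinite(divide(3, 0)))
```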
\newline
\newline
We checked earlier if the `n_steps` argument of `initialisePop()` worked by verifying that the returned population had the correct dimensions (\@ref(testallargs)).
We can test the `fullSim()` function in the same way (\@ref(edge2)).
However, if we try to run the same test with `n_steps = 1` we get an error before we even get to the test.
\newline
```{r, edge2, ref = "edge2", codecap = "fullSim() does not give a population matrix with the correct number of rows if one iteration is requested.", eval = FALSE}
> popt1 <- fullSim(n_steps = 2, N = 5, pathogens = 3)
> expect_equal(dim(popt1), c(2, 5))
> popt2 <- fullSim(n_steps = 1, N = 5, pathogens = 3)
Error in `[<-`(`*tmp*`, t, , value = x[t - 1, infector_pathogen])
subscript out of bounds
> expect_equal(dim(popt2), c(1, 5))
```
In general, requesting a simulation with one time step has little epidemiological meaning, but it is not an objectively wrong use of the function.
The code behaves qualitatively differently for `n_steps = 1` than for `n_steps = 2` because the loop in \@ref(combinesimplefunc) runs from 2 to `n_steps` setting `t` to each value in turn.
When `n_steps` is 2 or more, this follows the sequence $\{2, 3, ...\}$.
When `n_steps` is 1, this instead follows the sequence $\{2, 1\}$.
The population is initialised with 1 row but the code still tries to store the results in the second row of the population matrix.
For special cases like this it may be rather subjective what the correct behaviour should be.
It might be appropriate to simply declare that this parameter value is not allowed.
This should be stated in the documentation and the function should throw an error.
Here however, we will decide that this is a valid parameter value.
We should change the code to handle this special case, and use the new test to check that our new code works.
In \@ref(edge3) we use an \texttt{if} statement to explicitly handle the special case of `n_steps = 1` and our test now passes.
\newline
```{r, edge3, ref = "edge3", codecap = "Handle the special case of a single time step and test the new code.", eval = TRUE}
fullSim <- function(n_steps, N, pathogens){
  pop <- initialisePop(n_steps, N, pathogens)
  if(n_steps >= 2){
    for(t in seq(2, n_steps)){
      pop <- updatePop(pop, t, N)
    }
  }
  return(pop)
}
popt2 <- fullSim(n_steps = 1, N = 5, pathogens = 3)
expect_equal(dim(popt2), c(1, 5))
```
# Unit testing frameworks {#frameworks}
Most programming languages have established testing packages.
For _R_, there is the $\texttt{testthat}$ package, as already discussed, as well as other packages such as $\texttt{tinytest}$, which has the benefit of having no dependencies.
When structuring _R_ code as standalone scripts, the tests should be saved in one or more scripts either in the same directory as the scripts in which the functions are defined, or within a subdirectory of the same project directory.
The testing script needs to load all the functions it is going to test (with `source()` for example) and then run the tests.
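As a sketch of what such a script might look like (the file and directory names here are hypothetical):
\newline
```{r, test_script, ref = "testscript", codecap = "A sketch of a standalone test script.", eval = FALSE}
# File: tests/test_simulation.R (hypothetical layout)
library(testthat)
# Load the functions under test.
source("../simulation_functions.R")
test_that("initialisePop() returns a matrix of the right size", {
  pop <- initialisePop(n_steps = 2, N = 3, pathogens = 3)
  expect_equal(dim(pop), c(2, 3))
})
```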
When structuring _R_ code as a package, tests should be kept in the directory `tests/testthat`; further requirements to the structure can be found in Chapter 7 of @wickham2015r.
All the tests in a package can then be run with `test()` from the $\texttt{devtools}$ package [@devtools] or `check()` for additional checks relevant to the package build.
If the code is to be part of a package then these tools are essential to run the code within the context of a build environment.
These tools also provide a clean environment to highlight if a test was previously relying on objects defined outside of the test script.
Furthermore, organising code in a package provides further benefits such as tools to aid the writing of documentation, systematic handling of dependencies and tools to track whether every line of code in the package is tested such as the \texttt{covr} package [@covr].
The practice of organising code as a package, even if there is no expectation that others will use the code, is underappreciated and underused in epidemiology.
\newline
\newline
The testing frameworks of other programming languages differ but the main concept of comparing a function evaluation to the expected output remains the same.
In _Julia_ there is the $\texttt{Test}$ package [@juliatest].
The basic structure for tests with this package is shown in \@ref(juliatest).
We name the test and write a single expression that evaluates to `true` or `false`.
For a _Julia_ package, unit tests reside in `test/runtests.jl` and tests are run with `Pkg.test()`.
\newline
```{r, juliatest, ref = "juliatest", codecap = "Julia test example.", eval = FALSE}
@testset "my_test_name" begin
    @test sqrt(4) == 2
end
```
Finally, in _Python_ tests can be performed using the $\texttt{unittest}$ framework [@pythonunittest]; tests must be written into a class that inherits from the `TestCase` class.
The tests must be written as methods with `self` as the first argument.
An example test script is shown below.
Tests should be kept in a directory called `Lib/test`, and the filename of every file with tests should begin with "`test_`".
\newline
```{r, testpy, ref = "testpy", codecap = "Python test example.", eval = FALSE}
import unittest
import math
class TestMaths(unittest.TestCase):
    def test_sqrt(self):
        self.assertEqual(math.sqrt(4), 2)

unittest.main()
```
# Continuous integration {#ci}
If your code is under version control [@osborne2014ten; @wilson2014best] and hosted online (e.g. on GitHub, GitLab or BitBucket), you can automate the running of unit tests---also known as _continuous integration_.
In this setup, whenever you push code changes from your local computer to the online repository, any tests that you have defined get run automatically.
Furthermore these tests can be automated to run periodically so that bugs caused by changes in dependencies are flagged.
There are various continuous integration services such as travis-ci.org, GitHub Actions and GitLab pipelines.
These services are often free on a limited basis, or free if your code is open source.
\newline
\newline
We briefly describe the setup of the simplest case using Travis CI.
Setting up continuous integration is very straightforward for _R_ code already organised into a package and hosted openly on GitHub.
Within your version-controlled folder that contains the _R_ code, you add a one-liner file named "`.travis.yml`" that contains a description of which language the code uses.
\newline
```{awk, travis, ref = "travis", codecap = "A basic travis yml file.", eval = FALSE}
language: r
```
This file can also be created with `use_travis()` from the $\texttt{usethis}$ package.
You then sign up to travis-ci.org and point it to the correct GitHub repository.
To authenticate and trigger the first automatic test you need to make a minor change to your code, commit and push to GitHub.
More details on setting up Travis, and how to continuously track test coverage using \texttt{covr} and \texttt{codecov}, can be found in Chapter 14.3 of @wickham2015r.
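As a quick local check, the test coverage of a package can be summarised with \texttt{covr} directly; a one-line sketch, assuming the package is the current working directory:
\newline
```{r, covr_sketch, ref = "covrsketch", codecap = "Summarising test coverage locally with covr.", eval = FALSE}
# Report which lines of the package are exercised by its tests.
covr::package_coverage()
```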
# Concluding remarks
It is vital that infectious disease models are coded to minimise bugs.
Unit testing is a well-defined, principled way to do this.
There are many frameworks that make aspects of unit testing automatic and more informative and these should be used where possible.
\newline
\newline
The basic principles of unit testing are simple but writing good tests is a skill that takes time, practice and thought.
However, ensuring your code is robust and well-tested saves time and effort in the long run and helps prevent spurious results.
Our aim in this paper was to demonstrate unit testing tailored to infectious disease modelling.
There are a number of standard programming texts on unit testing which would be good further reading (@wickham2015r Chapter 7, @osborne2014ten, @wilson2014best).
As demonstrated here, it is initially time consuming to program in this way, but over time it becomes habitual and both you and the policy-makers who use your models will benefit from it.
# Code availability
Please see the fully-reproducible and version-controlled code at \url{www.github.com/timcdlucas/unit_test_for_infectious_disease}.
# Funding sources
TMP, TDH, TCDL and ELD gratefully acknowledge funding of the NTD Modelling Consortium by the Bill & Melinda Gates Foundation (BMGF) (grant number OPP1184344).
Views, opinions, assumptions or any other information set out in this article should not be
attributed to BMGF or any person connected with them.
TMP's PhD is supported by the Engineering & Physical Sciences Research Council, Medical
Research Council and University of Warwick (grant number EP/L015374/1).
TMP thanks the Big Data Institute for hosting him during this work.
All funders had no role in the study design, collection, analysis, interpretation of data,
writing of the report, or decision to submit the manuscript for publication.
# Contributions: CRediT statement
Conceptualization: TCDL TMP\
Data curation: TCDL\
Formal Analysis: TCDL\
Funding acquisition: TDH\
Investigation: TCDL\
Methodology: TCDL\
Project administration: TCDL\
Software: TCDL TMP\
Validation: TMP\
Visualization: TCDL TMP\
Writing – original draft: TCDL\
Writing – review & editing: TCDL TMP ELD TDH
# References {#references .unnumbered}