---
title: "Reliability and accuracy"
---
# Severity data
## Terminology
Disease severity, especially when expressed as the percent area diseased and assessed visually, is acknowledged as a more difficult and less time- and cost-effective plant disease variable to obtain. However, errors may occur even when assessing a more objective measure such as incidence, for example when symptoms are incorrectly assigned or confused. In either case, the quality of the assessment of any disease variable is very important and should be gauged in such studies. Several terms are used when evaluating the quality of disease assessments, including reliability, precision, and accuracy (or agreement).
**Reliability**: The extent to which the same estimates or measurements of diseased specimens obtained under different conditions yield similar results. There are two types. *Inter-rater reliability* (or reproducibility) measures the consistency of disease assessments of the same specimens between raters or devices. *Intra-rater reliability* (or repeatability) measures the consistency of assessments made by the same rater or instrument on the same specimens (e.g. two assessments at different times by the same rater).
![Two types of reliability of estimates or measures of plant disease intensity](imgs/reliability.png){#fig-reliability fig-align="center" width="505"}
**Precision**: A statistical term expressing the variability of the estimates or measurements of disease obtained on the same specimens by different raters (or instruments). Reliable or precise estimates (or measurements) are not necessarily close to the actual value, but precision is a component of accuracy or agreement.
**Accuracy or agreement**: These two terms can be treated as synonymous in plant pathological research. They refer to the closeness (or concordance) of an estimate or measurement to the actual severity value for a specimen on the same scale. Actual values may be obtained using various methods, against which estimates or measurements using an experimental assessment method are compared.
An analogy commonly used to explain accuracy and precision is an archer shooting arrows at a target, trying to hit the bull's eye (the center of the target) with each of five arrows. The figure below demonstrates the four situations arising from the combination of two levels (high and low) of precision and accuracy. The figure was produced using the `ggplot` function of the *ggplot2* package.
```{r}
#| code-fold: true
#| warning: false
#| label: fig-target
#| fig-cap: "The accuracy and precision of the archer is determined by the location of the group of arrows"
library(ggplot2)
target <-
ggplot(data.frame(c(1:10),c(1:10)))+
geom_point(aes(x = 5, y = 5), size = 71.5, color = "black")+
geom_point(aes(x = 5, y = 5), size = 70, color = "#99cc66")+
geom_point(aes(x = 5, y = 5), size = 60, color = "white")+
geom_point(aes(x = 5, y = 5), size = 50, color = "#99cc66")+
geom_point(aes(x = 5, y = 5), size = 40, color = "white")+
geom_point(aes(x = 5, y = 5), size = 30, color = "#99cc66")+
geom_point(aes(x = 5, y = 5), size = 20, color = "white")+
geom_point(aes(x = 5, y = 5), size = 10, color = "#99cc66")+
geom_point(aes(x = 5, y = 5), size = 4, color = "white")+
ylim(0,10)+
xlim(0,10)+
theme_void()
hahp <- target +
labs(subtitle = "High Accuracy High Precision")+
theme(plot.subtitle = element_text(hjust = 0.5))+
geom_point(aes(x = 5, y = 5), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5, y = 5.2), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5, y = 4.8), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 4.8, y = 5), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.2, y = 5), shape = 4, size =2, color = "blue")
lahp <- target +
labs(subtitle = "Low Accuracy High Precision")+
theme(plot.subtitle = element_text(hjust = 0.5))+
geom_point(aes(x = 6, y = 6), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 6, y = 6.2), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 6, y = 5.8), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.8, y = 6), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 6.2, y = 6), shape = 4, size =2, color = "blue")
halp <- target +
labs(subtitle = "High Accuracy Low Precision")+
theme(plot.subtitle = element_text(hjust = 0.5))+
geom_point(aes(x = 5, y = 5), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5, y = 5.8), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.8, y = 4.4), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 4.4, y = 5), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.6, y = 5.6), shape = 4, size =2, color = "blue")
lalp <- target +
labs(subtitle = "Low Accuracy Low Precision")+
theme(plot.subtitle = element_text(hjust = 0.5))+
geom_point(aes(x = 5.5, y = 5.5), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 4.5, y = 5.4), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.2, y = 6.8), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 4.8, y = 3.8), shape = 4, size =2, color = "blue")+
geom_point(aes(x = 5.2, y = 3), shape = 4, size =2, color = "blue")
library(patchwork)
(hahp | lahp) /
(halp | lalp)
```
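As an aside, the repeated `geom_point()` calls that draw the rings can be condensed by storing the ring sizes and colors in a data frame and mapping them with identity scales. This is just a stylistic variant of the code above, shown for illustration:

```{r}
#| eval: false
# Compact alternative: draw the target rings from a data frame
rings <- data.frame(
  size = c(71.5, 70, 60, 50, 40, 30, 20, 10, 4),
  color = c("black", "#99cc66", "white", "#99cc66", "white",
            "#99cc66", "white", "#99cc66", "white")
)
ggplot() +
  geom_point(data = rings, aes(x = 5, y = 5, size = size, color = color)) +
  scale_size_identity() +  # use the stored sizes as given
  scale_color_identity() + # use the stored colors as given
  xlim(0, 10) + ylim(0, 10) +
  theme_void()
```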
Another way to visualize accuracy and precision is through scatter plots of the relationship between the actual values and the estimates.
```{r}
#| warning: false
#| message: false
#| label: fig-accuracy
#| fig-cap: "Scatter plots for the relationship between actual and estimated values representing situations of low or high precision and accuracy. The dashed line indicates the perfect concordance and the solid blue line represents the fit of the linear regression model"
#| code-fold: true
library(tidyverse)
library(r4pde)
theme_set(theme_r4pde())
dat <-
tibble::tribble(
~actual, ~ap, ~ip, ~ai, ~ii,
0, 0, 10, 0, 25,
10, 10, 20, 5, 10,
20, 20, 30, 30, 10,
30, 30, 40, 30, 45,
40, 40, 50, 30, 35,
50, 50, 60, 60, 65,
60, 60, 70, 50, 30
)
ap <- dat |>
ggplot(aes(actual, ap))+
geom_abline(intercept = 0, slope = 1,
linetype = 2, linewidth = 1)+
geom_smooth(method = "lm", se = FALSE)+
geom_point(color = "#99cc66", size = 3)+
ylim(0,70)+
xlim(0,70)+
labs(x = "Actual", y = "Estimate",
subtitle = "High Accuracy High Precision")
ip <- dat |>
ggplot(aes(actual, ip))+
geom_abline(intercept = 0, slope = 1,
linetype = 2, linewidth = 1)+
geom_smooth(method = "lm", se = FALSE)+
geom_point(color = "#99cc66", size = 3)+
ylim(0,70)+
xlim(0,70)+
labs(x = "Actual", y = "Estimate",
subtitle = "Low Accuracy High Precision")
ai <- dat |>
ggplot(aes(actual, ai))+
geom_abline(intercept = 0, slope = 1,
linetype = 2, linewidth = 1)+
geom_smooth(method = "lm", se = FALSE)+
geom_point(color = "#99cc66", size = 3)+
ylim(0,70)+
xlim(0,70)+
labs(x = "Actual", y = "Estimate",
subtitle = "High Accuracy Low Precision")
ii <- dat |>
ggplot(aes(actual, ii))+
geom_abline(intercept = 0, slope = 1,
linetype = 2, linewidth = 1)+
geom_smooth(method = "lm", se = FALSE)+
geom_point(color = "#99cc66", size = 3)+
ylim(0,70)+
xlim(0,70)+
labs(x = "Actual", y = "Estimate",
subtitle = "Low Accuracy Low Precision")
library(patchwork)
(ap | ip) / (ai | ii)
```
## Statistical summaries
A formal assessment of the quality of estimates or measurements is made using statistical summaries of the data, expressed as indices that represent reliability, precision and accuracy. These indices can further be used to test hypotheses, such as whether one assessment method is superior to another. The appropriate indices and tests vary according to the nature of the variable: continuous, binary or categorical.
### Inter-rater reliability
To calculate measures of inter-rater reliability (or reproducibility) we will work with a fraction of a larger dataset used in a published [study](https://bsppjournals.onlinelibrary.wiley.com/doi/abs/10.1111/ppa.13148). There, the authors tested the effect of standard area diagrams (SADs) on the reliability and accuracy of visual estimates of severity of soybean rust.
The selected dataset consists of five columns and 20 rows. The first column is the leaf number and the others correspond to assessments of percent soybean rust severity by four raters (R1 to R4). Each row corresponds to one symptomatic leaf. Let's assign the tibble to a dataframe called `sbr` (an acronym for soybean rust). Note that the variable is continuous.
```{r}
#| warning: false
#| message: false
library(tidyverse)
sbr <- tribble(
~leaf, ~R1, ~R2, ~R3, ~R4,
1, 0.6, 0.6, 0.7, 0.6,
2, 2, 0.7, 5, 1,
3, 5, 5, 8, 5,
4, 2, 4, 6, 2,
5, 6, 14, 10, 7,
6, 5, 6, 10, 5,
7, 10, 18, 12.5, 12,
8, 15, 30, 22, 10,
9, 7, 2, 12, 8,
10, 6, 9, 11.5, 8,
11, 7, 7, 20, 9,
12, 6, 23, 22, 14,
13, 10, 35, 18.5, 20,
14, 19, 10, 9, 10,
15, 15, 20, 19, 20,
16, 17, 30, 18, 13,
17, 19, 53, 33, 38,
18, 17, 6.8, 15, 9,
19, 15, 20, 18, 16,
20, 18, 22, 24, 15
)
```
Let's explore the data using various approaches. First, we can visualize how the individual estimates by the raters differ for the same leaf.
```{r}
#| warning: false
#| message: false
#| label: fig-raters1
#| fig-cap: "Visual estimates of soybean rust severity for each leaf by each of four raters"
# transform from wide to long format
sbr2 <- sbr |>
pivot_longer(2:5, names_to = "rater",
values_to = "estimate")
# create the plot
sbr2 |>
ggplot(aes(leaf, estimate, color = rater,
group = leaf))+
geom_line(color = "black")+
geom_point(size = 2)+
labs(y = "Severity estimate (%)",
x = "Leaf number",
color = "Rater")
```
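As a numeric companion to the plot, we can summarize each rater's estimates. This quick check (an addition to the original analysis) hints at systematic differences among raters:

```{r}
# Mean and standard deviation of the estimates for each rater
sbr2 |>
  group_by(rater) |>
  summarise(mean = mean(estimate),
            sd = sd(estimate))
```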
Another interesting visualization is the correlation matrix of the estimates between all possible pairs of raters. The `ggpairs` function of the *GGally* package is handy for this task.
```{r}
#| warning: false
#| message: false
#| label: fig-correl
#| fig-cap: "Correlation plots relating severity estimates for all pairs of raters"
library(GGally)
# create a new dataframe with only raters
raters <- sbr |>
select(2:5)
ggpairs(raters)+
theme_r4pde()
```
#### Coefficient of determination
We noticed earlier that the correlation coefficients varied across the pairs of raters. Sometimes the mean of the squared Pearson's r values (R<sup>2</sup>), or coefficient of determination, is used as a measure of inter-rater reliability. We can examine the pairwise correlations in more detail using the `cor` function.
```{r}
#| warning: false
#| message: false
#| label: tbl-correl
#| tbl-cap: "Pearson correlation coefficients for all pairs of raters"
knitr::kable(cor(raters))
```
The mean coefficient of determination can then be obtained as follows.
```{r}
# All pairwise R2
raters_cor <- reshape2::melt(cor(raters))
raters2 <- raters_cor |>
filter(value != 1)
# means of R2
raters2$value
round(mean(raters2$value^2), 3)
```
#### Intraclass Correlation Coefficient
A common statistic to report in reliability studies is the Intraclass Correlation Coefficient (ICC). There are several formulations of the ICC, and the choice depends on the particular experimental design. Following the convention of the seminal work by @shrout1979, there are three main ICCs:
- One-way random effects model, ICC(1,1): in our context, each leaf is rated by different raters who are considered as sampled from a larger pool of raters (random effects)
- Two-way random effects model, ICC(2,1): both raters and leaves are viewed as random effects
- Two-way mixed model, ICC(3,1): raters are considered as fixed effects and leaves are considered as random.
Additionally, the ICC depends on whether single ratings or the average of several ratings (k) are of interest. When averages are considered, the coefficients are called ICC(1,k), ICC(2,k) and ICC(3,k).
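For reference, under the one-way model the ICC(1,1) is estimated from a one-way ANOVA across specimens (leaves), with $MS_B$ the between-leaf mean square, $MS_W$ the within-leaf mean square and $k$ the number of raters:

$$
\mathrm{ICC}(1,1) = \frac{MS_B - MS_W}{MS_B + (k - 1)\,MS_W}
$$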
The ICC can be computed using the `ICC()` or the `icc()` functions of the *psych* or *irr* packages, respectively. They both provide the coefficient, F value, and the upper and lower bounds of the 95% confidence interval.
```{r}
#| warning: false
#| message: false
library(psych)
ic <- ICC(raters)
knitr::kable(ic$results[1:2]) # only selected columns
# call ic list for full results
```
The output of interest is a dataframe with the results of all distinct ICCs. We note that ICC1 and ICC2 gave very similar results. Now, let's obtain the various ICCs using the *irr* package. Unlike the `ICC()` function, this one requires the model to be specified.
```{r}
#| warning: false
#| message: false
library(irr)
icc(raters, "oneway")
# The one used in the SBR paper
icc(raters, "twoway")
```
#### Overall Concordance Correlation Coefficient
Another useful index is the Overall Concordance Correlation Coefficient (OCCC) for evaluating agreement among multiple observers. It was proposed by @barnhart2002 based on the original index proposed by @lin1989, earlier defined in the context of two fixed observers. In the paper, the authors introduced the OCCC in terms of interobserver variability for assessing agreement among multiple fixed observers. As with the original CCC, the approach provides precision and accuracy indices as components of the OCCC. The `epi.occc` function of the *epiR* package does the job, but it does not compute a confidence interval.
```{r}
#| warning: false
#| message: false
library(epiR)
epi.occc(raters, na.rm = FALSE, pairs = TRUE)
```
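Since no confidence interval is reported, one can approximate a 95% interval by bootstrapping the leaves. Below is a minimal sketch (not part of the original analysis), assuming the coefficient is returned in the `occc` element of the output:

```{r}
#| warning: false
#| message: false
# Percentile bootstrap CI for the OCCC: resample leaves with replacement
set.seed(123)
boot_occc <- replicate(1000, {
  i <- sample(nrow(raters), replace = TRUE)
  epi.occc(raters[i, ], na.rm = FALSE, pairs = FALSE)$occc
})
quantile(boot_occc, c(0.025, 0.975))
```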
### Intrarater reliability
As defined, intrarater reliability is also known as repeatability because it measures the consistency of repeated assessments (e.g. at different times) by the same rater on the same sample. In some studies, we may be interested in testing whether a new method increases the repeatability of assessments by a single rater compared with another method. The same indices used for assessing reproducibility (interrater reliability) can be used to assess repeatability, and these are reported at the rater level.
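There are no repeated assessments in the `sbr` dataset, but a repeatability analysis would look like the sketch below, in which a hypothetical second assessment by rater R1 is simulated for illustration only:

```{r}
#| warning: false
#| message: false
# Hypothetical repeatability: rater R1 scores the same leaves twice
# (the second-round values are simulated, not real data)
set.seed(42)
time1 <- sbr$R1
time2 <- pmax(0, time1 + rnorm(length(time1), mean = 0, sd = 2))
psych::ICC(data.frame(time1, time2))$results[1:2]
```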
### Precision
When assessing precision, one measures the variability of the estimates (or measurements) of disease on the same sampling units obtained by different raters (or instruments). Very high precision does not mean that the estimates are close to the actual value (closeness is captured by measures of bias). However, precision is a component of overall accuracy, or agreement, and is given by Pearson's correlation coefficient.
Unlike reliability, which requires only the estimates or measurements by the raters, we now need a reference (gold standard) value against which to compare the estimates. This can come from an accurate rater or from measurements by an instrument. Let's get back to the soybean rust severity dataset and add a column with the (assumed) actual severity values for each leaf. In the original study, the actual severity values were obtained using image analysis.
```{r}
sbr <- tibble::tribble(
~leaf, ~actual, ~R1, ~R2, ~R3, ~R4,
1, 0.25, 0.6, 0.6, 0.7, 0.6,
2, 2.5, 2, 0.7, 5, 1,
3, 7.24, 5, 5, 8, 5,
4, 7.31, 2, 4, 6, 2,
5, 9.07, 6, 14, 10, 7,
6, 11.6, 5, 6, 10, 5,
7, 12.46, 10, 18, 12.5, 12,
8, 13.1, 15, 30, 22, 10,
9, 14.61, 7, 2, 12, 8,
10, 16.06, 6, 9, 11.5, 8,
11, 16.7, 7, 7, 20, 9,
12, 19.5, 6, 23, 22, 14,
13, 20.75, 10, 35, 18.5, 20,
14, 23.56, 19, 10, 9, 10,
15, 23.77, 15, 20, 19, 20,
16, 24.45, 17, 30, 18, 13,
17, 25.78, 19, 53, 33, 38,
18, 26.03, 17, 6.8, 15, 9,
19, 26.42, 15, 20, 18, 16,
20, 28.89, 18, 22, 24, 15
)
```
We can visually explore the relationships between the actual values and the estimates by each rater using scatter plots (@fig-scatter). To facilitate this, we first reshape the data into the long format.
```{r}
#| warning: false
#| message: false
#| label: fig-scatter
#| fig-cap: "Scatterplots for the relationship between estimated and actual severity for each rater"
sbr2 <- sbr |>
pivot_longer(3:6, names_to = "rater",
values_to = "estimate")
sbr2 |>
ggplot(aes(actual, estimate))+
geom_point(size = 2, alpha = 0.7)+
facet_wrap(~rater)+
ylim(0,45)+
xlim(0,45)+
geom_abline(intercept = 0, slope =1)+
theme_r4pde()+
labs(x = "Actual severity (%)",
y = "Estimated severity (%)")
```
The Pearson's r for the relationship, or the precision of the estimates by each rater, can be obtained using the `correlation` function of the *correlation* package.
```{r}
precision <- sbr2 |>
select(-leaf) |>
group_by(rater) |>
correlation::correlation()
knitr::kable(precision[1:4])
```
The mean precision can then be obtained.
```{r}
mean(precision$r)
```
### Accuracy
#### Absolute errors
It is useful to visualize the errors of the estimates, obtained by subtracting the actual severity values from the estimates. Such a plot reveals patterns of over- or underestimation across the range of actual severity values.
```{r}
#| label: fig-errors
#| fig-cap: "Error (estimated - actual) of visual severity estimates"
sbr2 |>
ggplot(aes(actual, estimate-actual))+
geom_point(size = 3, alpha = 0.7)+
facet_wrap(~rater)+
geom_hline(yintercept = 0)+
theme_r4pde()+
labs(x = "Actual severity (%)",
y = "Error (Estimate - Actual)")
```
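The errors can also be summarized numerically, for instance as the mean absolute error (MAE) per rater (an addition to the original analysis):

```{r}
# Mean absolute error of the estimates for each rater
sbr2 |>
  group_by(rater) |>
  summarise(mae = mean(abs(estimate - actual)))
```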
#### Concordance correlation coefficient
Lin (1989, 2000) proposed the concordance correlation coefficient (CCC) for agreement on a continuous measure obtained by two methods. The CCC combines measures of both precision and accuracy to determine how far the observed data deviate from the line of perfect concordance. Lin's CCC increases in value as a function of the nearness of the data's reduced major axis to the line of perfect concordance (the accuracy of the data) and of the tightness of the data about its reduced major axis (the precision of the data).
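Formally, the CCC ($\rho_c$) is the product of a precision component and an accuracy component, where $r$ is Pearson's correlation coefficient, $C_b$ is the bias correction factor, $v$ is the scale shift and $u$ is the location shift of the estimates ($\mu_1$, $\sigma_1$) relative to the actual values ($\mu_2$, $\sigma_2$):

$$
\rho_c = r\,C_b, \qquad C_b = \frac{2}{v + 1/v + u^2}, \qquad v = \frac{\sigma_1}{\sigma_2}, \qquad u = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1 \sigma_2}}
$$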
The `epi.ccc` function of the *epiR* package allows us to obtain Lin's CCC statistics. Let's filter rater 2 only and calculate the CCC statistics for this rater.
```{r}
#| label: tbl-ccc
#| tbl-cap: "Statistics of the concordance correlation coefficient summarizing accuracy and precision of visual severity estimates of soybean rust for a single rater"
library(epiR)
# Only rater 2
sbr3 <- sbr2 |> filter(rater == "R2")
ccc <- epi.ccc(sbr3$actual, sbr3$estimate)
# Concordance coefficient
rho <- ccc$rho.c[,1]
# Bias coefficient
Cb <- ccc$C.b
# Precision (Pearson's r = CCC / bias coefficient)
r <- ccc$rho.c[,1] / ccc$C.b
# Scale-shift
ss <- ccc$s.shift
# Location-shift
ls <- ccc$l.shift
Metrics <- c("Agreement", "Bias coefficient", "Precision", "Scale-shift", "Location-shift")
Value <- c(rho, Cb, r, ss, ls)
res <- data.frame(Metrics, Value)
knitr::kable(res)
```
Now let's create a function that allows us to estimate the CCC for all raters in a data frame in the wide format. The function assumes that the first two columns are the leaf number and the actual values, and that the remaining columns are the raters' estimates, which is the case for our `sbr` dataframe. Let's name this function `ccc_byrater`.
```{r}
#| warning: false
#| message: false
ccc_byrater <- function(data) {
  # reshape to the long format: one row per leaf x rater combination
  long_data <- pivot_longer(data, cols = -c(leaf, actual),
                            names_to = "rater", values_to = "estimate")
  ccc_results <- long_data |>
    group_by(rater) |>
    summarise(Agreement = epi.ccc(actual, estimate)$rho.c$est,
              `Bias coefficient` = epi.ccc(actual, estimate)$C.b,
              # Pearson's r (precision) is the CCC divided by the bias coefficient
              Precision = Agreement / `Bias coefficient`,
              scale_shift = epi.ccc(actual, estimate)$s.shift,
              location_shift = epi.ccc(actual, estimate)$l.shift)
  return(ccc_results)
}
```
Then, we use the `ccc_byrater` function with the original `sbr` dataset, or any other wide-format dataset with a similar structure. The output is a dataframe with all CCC statistics.
```{r}
#| warning: false
#| message: false
results <- ccc_byrater(sbr)
knitr::kable(results)
```
# Incidence data
## Accuracy
Incidence data are binary at the individual level; an individual is diseased or not. Here, unlike severity, which is estimated, the specimen is classified. Let's create two binary vectors describing a hypothetical scenario in which 12 plant specimens are assigned to one of two classes, healthy (0) or diseased (1): one vector holds the actual condition and the other the rater's classification.
```{r}
#| warning: false
order <- c(1:12)
actual <- c(1,1,1,1,1,1,1,1,0,0,0,0)
class <- c(0,0,1,1,1,1,1,1,1,0,0,0)
dat_inc <- data.frame(order, actual, class)
dat_inc
```
In the example above, the rater makes 9 accurate classifications and misses 3: 2 diseased plants classified as disease-free (samples 1 and 2), and 1 healthy plant wrongly classified as diseased (sample 9).
Notice that there are four outcomes:
TP = true positive, a positive sample correctly classified\
TN = true negative, a negative sample correctly classified\
FP = false positive, a negative sample classified as positive\
FN = false negative, a positive sample classified as negative.
There are several metrics that can be calculated with the help of a confusion matrix, also known as an error matrix. Considering the outcomes above, here is how a confusion matrix looks.
Suppose a 2x2 table with the following notation:

| Classification | Actual: Diseased | Actual: Healthy |
|---------------:|:----------------:|:---------------:|
| Diseased       | TP               | FP              |
| Healthy        | FN               | TN              |
Let's create this matrix using a function of the *caret* package.
```{r}
#| warning: false
#| message: false
library(caret)
# class and actual are the vectors created above
cm <- confusionMatrix(factor(class), reference = factor(actual))
cm
```
The function returns the confusion matrix and several statistics, such as accuracy = (TP + TN) / (TP + TN + FP + FN). Note that `confusionMatrix()` takes the first factor level (`0`, the healthy class here) as the positive class by default, which is why the counts below treat healthy as positive. Let's manually calculate the accuracy and compare the results:
```{r}
# counts taken from the confusion matrix above (positive class = "0")
TP = 3
FP = 2
FN = 1
TN = 6
accuracy = (TP+TN)/(TP+TN+FP+FN)
accuracy
```
Two other important metrics are sensitivity and specificity.
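In terms of the confusion matrix entries:

$$
\mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{specificity} = \frac{TN}{TN + FP}
$$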
```{r}
sensitivity = TP/(TP+FN)
sensitivity
specificity = TN/(FP+TN)
specificity
```
Several of these metrics can be calculated at once using the `evaluation_metrics()` function of the *MixtureMissing* package.
```{r}
#| warning: false
library(MixtureMissing)
evaluation_metrics(actual, class)
```
## Reliability
The association between two binary classifications (here, the rater's classification and the actual condition) can be summarized by the phi coefficient, a special case of Pearson's correlation for a 2x2 table, computed with the `phi()` function of the *psych* package.
```{r}
#| warning: false
library(psych)
tab <- table(class, actual)
phi(tab)
```
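Another common reliability index for categorical classifications is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. Below is a minimal sketch using the `cohen.kappa()` function of the *psych* package (an addition to the original analysis):

```{r}
#| warning: false
# Cohen's kappa: chance-corrected agreement between the two binary series
cohen.kappa(cbind(class, actual))
```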