-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
118 lines (85 loc) · 3.69 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# forensicdatatoolkit
<!-- badges: start -->
<!-- badges: end -->
The goal of forensicdatatoolkit is to combine several tools created for investigating
summary data and detect anomalies. This is not a tool for finding fraud. Fraud is a legal term.
These tools merely tell you if the underlying data is probable or not. It can never tell us
why the statistics are wrong. Someone could lie or someone could mistype a number. Only open data
can tell us which it is.
## Installation
You can install the released version of forensicdatatoolkit from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("forensicdatatoolkit")
```
## Loading package
```{r example}
library(forensicdatatoolkit)
```
## Techniques available
What is in the package.
- [x] Granularity-Related Inconsistency of Means (GRIM)
- [] Granularity-Related Inconsistency of Means Mapped to Error Repeats (GRIMMER)
- [x] Sample Parameter Reconstruction via Iterative Techniques (SPRITE)
- [x] (DEBIT): A simple consistency test for binary data
We can use these tools to investigate summary statistics.
For example if we collect statistics from a set of papers we can investigate them.
Let's imagin we have 2 published papers about food. The author let several people taste chocolate bars and judge those bars. We use:
* a 1-5 scale for food acceptance
* a 1-9 scale for how dark the chocolate was and,
* a 1-7 scale for if you like the chocolate
The study only reports the summary statistics (means and standard deviations) for every scale.
```{r}
example_data <- data.frame(
study = c(1, 1, 1, 2, 2, 2),
description = rep(c("food acceptance", "darkness", "likeability"), 2),
n = c(20, 19, 19, 40, 41, 40),
means = c(3.50, 5.60, 2.13, 2.54, 8.00, 3.05),
sds = c(1.34, 1.34, 2.56, 0.45, 2, 1),
scale_min = rep(c(1, 1, 1), 2),
scale_max = rep(c(5, 9, 7))
)
example_data
```
It is possible that the researcher dropped some chocolate on his work when the values were
being calculated. Fortunately we can do a granularity check on the means to see if those means are
possible given the scale used, and the sample size. This check is known as the
GRIM test (Granularity-Related Inconsistency of Means). [LINK TO PREPRINT HERE].
### GRIM (Granularity-Related Inconsistency of Means)
Let's use an example: 20 people each picked an integer between 1 and 5 (because you cannot choose someting else). What happens to the mean score if one person went from 1: disgusting to 2:awful?
the mean score would go up by 1/20th. The minimal stepsize is 1/20th or 0.05. That means that every mean must be divisable by that stepsize. 3.5 is therefore fine but 3.53 is not.
```{r base r style}
example_data$mean_possible <- check_mean(example_data$means, example_data$n)
example_data[, c("study", "description", "mean_possible")]
```
<details>
<summary> This also works with tidyverse verbs **click here to unhide code** </summary>
```{r tidyverse style, eval=FALSE}
library(dplyr)
example_data %>%
mutate(
mean_possible = check_mean(mean = means, n = n)
) %>%
select(study, description, mean_possible)
```
</details>
Please note that the 'forensicdatatoolkit' project is released with a
[Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
<details>
<summary> At the moment of creation (when I knitted this document ) this was the state of my machine: **click here to expand** </summary>
```{r}
sessioninfo::session_info()
```
</details>