---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# aoristic
The goal of aoristic is to make sense of temporally vague data. It can be difficult to ascertain when some events (such as property crime) occur because the victim is not present when the crime happens. As a result, police databases often record a *start* (or *from*) date and time, and an *end* (or *to*) date and time. The *start* date-time usually references when the victim last saw their property, and the *end* date-time records when they first discovered it missing. The period between the *start* date-time and the *end* date-time is referred to as the event’s *time span*.
The time span between these date-times can be minutes, hours, or sometimes days: hence the term 'aoristic', a word meaning “denoting simple occurrence of an action without reference to its completeness, duration, or repetition”. It has its origins in the Greek word *aoristos*, which means *undefined*. For events with a location described by either a latitude/longitude or X,Y coordinate pair, and a start and end date-time, this package generates an aoristic data frame with aoristic weighted probability values for each hour of the week, for each row. Various descriptive and graphic outputs are available.
## What's new in Version 1.1.1?
* R-devel svn revision r82904 (2022-09-24 19:32:52) redefined how some aspects of POSIXt are calculated. This caused errors in how aoristic calculates time durations and necessitated this minor update.
## Previous versions
### Version 1.1.0
* This version replaces a convoluted process of outputting a formatted table to a jpeg with a simpler mechanism, so the user no longer has to download a third-party software package. The change occurs in the 'aoristic.summary' function.
* Adds a simple plot output option with the new function 'aoristic.plot'.
### Version 1.0.0
Version 0.6 was originally released on CRAN in 2015 by Dr. George Kikuchi, then of Fresno State University and now at the Philadelphia Police Department. Given his extensive responsibilities, he has been unable to maintain and update the program since the initial release. With his permission, the package was taken over in 2020 and updated by Dr. Jerry Ratcliffe of Temple University.
Much of the original functionality has been discontinued and replaced by this updated package. In particular, version 1.0.0 onwards dispenses with rounding to the nearest hour for time spans, and uses a minute-by-minute method. In earlier versions (as was common in aoristic approaches until recently) time spans were rounded to the hour. So an event that happened between 10:55am and 11:55am would have an aoristic weight of 0.5 assigned to each of the two hours, 1000-1059 and 1100-1159, even though most of the time span falls in the 11am hour. That rounding is removed in version 1.0.0 and aoristic weightings are assigned by the minute.
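The difference between the two methods can be checked by hand for the 10:55am to 11:55am example. The base R sketch below is illustrative only and is not the package's internal code; the calendar date is arbitrary, and treating the time span as 60 one-minute slots that exclude the end minute is an assumption made here for the arithmetic.

```r
# Illustrative only: not the package's internal implementation.
start <- as.POSIXct("2020-01-06 10:55", tz = "UTC")
end   <- as.POSIXct("2020-01-06 11:55", tz = "UTC")

# One entry per minute of the time span (assumption: end minute excluded).
minutes <- seq(start, end - 60, by = "1 min")

# Share of the span's minutes that falls in each clock hour.
hour_of_minute <- format(minutes, "%H")
minute_weights <- table(hour_of_minute) / length(minutes)
round(minute_weights, 3)
```

Five of the 60 minutes fall in the 10am hour and 55 in the 11am hour, so the minute-based weights are about 0.083 and 0.917 rather than 0.5 each.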
The kml mapping function from v0.6 is replaced here with a simpler plot function that maps the individual points for a user-selected hour. See ?aoristic.map
There is a new graph function that plots the overall aoristic distribution for an entire week, as well as each individual day of the week. See ?aoristic.graph
There is some rudimentary data checking in aoristic; however, most users will find that most of their effort goes into getting the date-time variables into the correct format. See the formatting example below for guidance.
## Installation
You can install the released version of aoristic from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("aoristic")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("jerryratcliffe/aoristic")
```
## Data formatting example
The package has some limited error checking; however, the main challenge that users will face is getting the data into the correct date-time format. Most of the heavy lifting is done by the aoristic.df() function. The user passes the name of a data frame and four parameters representing the columns that contain:
- **Xcoord** a vector of the event X coordinate or latitude (passed through for user)
- **Ycoord** a vector of the event Y coordinate or longitude (passed through for user)
- **DateTimeFrom** a vector for the 'From' datetime (POSIXct date-time object)
- **DateTimeTo** a vector for the 'To' datetime (POSIXct date-time object)
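As a rough sketch of the expected input, the toy data frame below has two coordinate columns and two POSIXct date-time columns. The values and column names are made up for illustration. The commented-out call assumes that aoristic.df() takes column names as strings, in the same way as the aoristic.datacheck() call at the end of this README.

```r
# Hypothetical toy data; values and column names are illustrative only.
events <- data.frame(
  x    = c(985000, 987500),
  y    = c(200000, 210500),
  from = as.POSIXct(c("2019-03-01 22:30", "2019-03-02 08:00"), tz = "UTC"),
  to   = as.POSIXct(c("2019-03-02 06:15", "2019-03-02 17:45"), tz = "UTC")
)

# Confirm that both date-time columns are POSIXct before using the package.
is_posixct <- vapply(events[c("from", "to")], inherits, logical(1), what = "POSIXct")
is_posixct

# With the aoristic package attached, the call is assumed to look like this:
# aor.df <- aoristic.df(events, "x", "y", "from", "to")
```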
The package 'lubridate' is recommended as a way to more easily get the date time data into the correct format. As a demonstration, consider one of the datasets available in the aoristic package.
```{r}
library(aoristic)
data(NYburg)
head(NYburg)
```
The data consist of the crime *from* date (CMPLNT_FR_DT) and time (CMPLNT_FR_TM), the crime *to* date and time (CMPLNT_TO_DT and CMPLNT_TO_TM), and X and Y coordinates of the crime event.
Data preparation in this case will involve three steps (for START and END date-times):
1. Convert the times from (Excel originated) fractions of the day
2. Combine the dates and times into a new variable
3. Convert the new variable into a date-time format
#### 1. Convert times
The two time variables are in fractions of the day. We can replace the existing variables by recasting them in a more readable format, and view the result.
```{r}
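# Fractions of a day * 86400 = seconds since midnight; tz = "UTC" keeps the local time zone from shifting the result
NYburg$CMPLNT_FR_TM <- format(as.POSIXct(NYburg$CMPLNT_FR_TM * 86400, origin = "1970-01-01", tz = "UTC"), "%H:%M")
NYburg$CMPLNT_TO_TM <- format(as.POSIXct(NYburg$CMPLNT_TO_TM * 86400, origin = "1970-01-01", tz = "UTC"), "%H:%M")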
head(NYburg)
```
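To see why multiplying by 86400 works, note that a day has 86400 seconds, so a fraction of 0.75 is 64800 seconds after midnight, or 18:00. The helper below is a hypothetical convenience function, not part of aoristic. It fixes the time zone to UTC so that the printed clock time does not depend on the machine's local time zone, and rounds to whole seconds to guard against floating-point error.

```r
# Hypothetical helper (not part of aoristic): day fraction -> "HH:MM" string.
frac_to_hm <- function(frac) {
  secs <- round(frac * 86400)  # seconds since midnight
  format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M")
}

frac_to_hm(c(0, 0.25, 0.75))
#> [1] "00:00" "06:00" "18:00"
```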
#### 2. Combine dates and times
The aoristic functions expect the date and time variables to be in a single column, with a space separating them. We can do that with this code, which creates two new variables:
```{r}
NYburg$STARTDateTime <- paste(NYburg$CMPLNT_FR_DT,NYburg$CMPLNT_FR_TM, sep=' ')
NYburg$ENDDateTime <- paste(NYburg$CMPLNT_TO_DT,NYburg$CMPLNT_TO_TM, sep=' ')
head(NYburg)
```
#### 3. Convert new variables into date-time objects
The last stage is to use the convenience of the lubridate package to convert the date and time strings into date-time objects:
```{r}
library(lubridate)
NYburg$STARTDateTime <- ymd_hm(NYburg$STARTDateTime, tz = "")
NYburg$ENDDateTime <- ymd_hm(NYburg$ENDDateTime, tz = "")
```
You get a warning that 49 observations failed to parse, because they are missing data (= *NA*). This sometimes happens when the police know exactly when the crime took place, and they only record the start date-time. We can see the final result of all this formatting:
```{r}
head(NYburg)
```
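The parse failures follow from how missing values travel through the earlier steps: `paste()` turns a missing date and time into the literal string "NA NA", which no date-time parser can read. The sketch below reproduces this on made-up values. It uses base R's `as.POSIXct()` with an explicit format instead of lubridate, purely to keep the example dependency-free.

```r
# Made-up values: the second record has no date or time.
start_txt <- paste(c("2019-03-01", NA), c("22:30", NA), sep = " ")
start_txt
#> [1] "2019-03-01 22:30" "NA NA"

# Strings that do not match the format come back as NA.
parsed <- as.POSIXct(start_txt, format = "%Y-%m-%d %H:%M", tz = "UTC")
sum(is.na(parsed))
#> [1] 1
```

Counting the `NA` values in this way is a quick check that the number of unparsed rows matches the number of records you expect to be missing.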
With the data formatted properly, we can start to use the aoristic functions. For example, you should always check the data to familiarize yourself with any missing data, or to see if any observations have logical errors where the *from* date-time occurs after the *to* date-time. The aoristic.df function can handle this, but it is always good to know your data.
```{r}
aor.chk.df <- aoristic.datacheck(NYburg, 'X_COORD_CD', 'Y_COORD_CD', 'STARTDateTime', 'ENDDateTime')
```
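For intuition, the two problems mentioned above (a missing end date-time, and a start that falls after the end) can be flagged with a few lines of base R. This is a sketch on made-up data with hypothetical column names. It does not reproduce the output or internals of aoristic.datacheck().

```r
# Made-up events: row 2 has start after end, row 3 has no end date-time.
events <- data.frame(
  from = as.POSIXct(c("2019-03-01 22:30", "2019-03-03 09:00", "2019-03-04 12:00"), tz = "UTC"),
  to   = as.POSIXct(c("2019-03-02 06:15", "2019-03-03 08:00", NA), tz = "UTC")
)

# Flag rows with a missing end date-time.
missing_end <- is.na(events$to)

# Flag rows where both values are present but the start is after the end.
reversed <- !is.na(events$from) & !is.na(events$to) & events$from > events$to

data.frame(events, missing_end, reversed)
```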