forked from katiejolly/met-council-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
06-visualization.Rmd
170 lines (107 loc) · 4.54 KB
/
06-visualization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
# Visualization
We will start with univariate visualization to learn the syntax, and then later move into more complex multivariate visualizations.
R has some built-in plotting ability. It's good to recognize the syntax, but it's not super common to see "in-the-wild" these days.
## Base R plotting
R can make histograms:
```{r}
hist(nice_ride_2018$tripduration)
```
And bar plots:
```{r}
barplot(table(nice_ride_2018$usertype), main = "User Type")
barplot(table(nice_ride_2018$bike_type), main = "Bike Type")
```
But base R plots are hard to customize and an inconsistent syntax. There's general agreement that they're not the best way to plot data in R so we aren't going to talk much about it.
## ggplot2
```{r echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/ggplot2_exploratory.png")
```
`ggplot2` is without competiton in the graphics in R world. Not only is it the most popular plotting package, it's one of the most popular packages, period. It is included in the tidyverse, which we already loaded. Part of the strength of ggplot is its customizability. You can create beautiful plots all in R with it. For example, the map on the cover of this book was [made with ggplot](https://timogrossenbacher.ch/2016/12/beautiful-thematic-maps-with-ggplot2-only/).
```{r out.width="700px", echo=FALSE}
knitr::include_graphics("https://timogrossenbacher.ch/wp-content/uploads/2016/12/tm-final-map-1.png")
```
ggplot uses the "grammar of graphics" to layer information onto plots. Each plot has the same general structure which makes it easy once you learn the structure.
### Categorical data
For example, let's recreate the bar plot of user types from above.
We will layer on the information in stages.
**Stage 1**: The plotting canvas
```{r}
ggplot(nice_ride_2018) # set up our plotting area
```
**Stage 2**: Frame your data with axes
```{r}
ggplot(nice_ride_2018, aes(x = usertype)) # set up our axes
```
**Stage 3**: Add some shapes
```{r}
ggplot(nice_ride_2018, aes(x = usertype)) +
geom_bar() # add the geoms
```
**Stage 4**: Give your plot a title and axis labels
```{r}
ggplot(nice_ride_2018, aes(x = usertype)) +
geom_bar() +
labs(title = "User Types", x = "User Type", y = "Trips") # add a title
```
Now we have a plot! There are many ways to customize it, but for now this is a great start.
### Try it out
Let's create a plot of the bike types (`bike_type`).
**Stage 1**: The plotting canvas
```{r eval=FALSE}
___(nice_ride_2018) # set up our plotting area
```
**Stage 2**: Frame your data with axes
```{r eval = FALSE}
___(nice_ride_2018, ___(x = ___)) # set up our axes
```
**Stage 3**: Add some shapes
```{r eval = FALSE}
___(nice_ride_2018, ___(x = ___)) +
geom____() # add the geoms
```
**Stage 4**: Give your plot a title and axis labels
```{r eval = FALSE}
___(nice_ride_2018, ___(x = ___)) +
geom____() +
___(title = ___, x = ___, y = ___) # add a title
```
Does your plot look like this?
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = bike_type)) +
geom_bar() +
labs(title = "Bike Types", x = "Bike type", y = "Trips") # add a title
```
### Quantitative data
We will use histograms and density plots as the univariate quantitative plots (boxplots could also be an option).
Let's first recreate the `start_month` histogram from above.
**Stage 1**: The plotting canvas
```{r}
ggplot(nice_ride_2018) # set up our plotting area
```
**Stage 2**: Frame your data with axes
```{r}
ggplot(nice_ride_2018, aes(x = tripduration)) # set up our axes
```
**Stage 3**: Add some shapes
```{r}
ggplot(nice_ride_2018, aes(x = tripduration)) +
geom_histogram() # add the geoms and the total number of geoms that you want
```
**Stage 4**: Add a title and axis labels
```{r}
ggplot(nice_ride_2018, aes(x = tripduration)) +
geom_histogram() + # add the geoms
labs(title = "Durations of trips under 5 hours (in seconds)", x = "Duration (seconds)", y = "Trips")
```
## Practice
1. Make a plot of the `start_day` variable with a good title and axis labels.
2. Change the fill of the bars to `cyan4`
3. Add a [theme](https://ggplot2.tidyverse.org/reference/ggtheme.html) with `... + theme_...()`
4. Add a caption that says `Source: Niceride MN`
In the end, your plot should look something like this:
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_day)) +
geom_bar(fill = "cyan4") +
theme_minimal() +
labs(title = "What days did people use niceride in 2018? Weekends are a favorite.", x = "Day of the week", y = "Trips", caption = "Source: Niceride MN")
```