forked from katiejolly/met-council-R
# Visualization, revisited
Now that we have some wrangling experience, we can create some more complex visualizations. This section will be mostly practice questions, promoting active learning and help-finding.
## Facets
Facets are a way to lay out related plots in a grid. For example, maybe we want to look at hourly patterns by the day of the week.
```{r}
ggplot(nice_ride_2018, aes(x = start_hour)) +
  geom_density(adjust = 1.5) + # the adjust argument just makes the plots a little smoother
  facet_wrap(~start_day) + # lay out the plots by the weekday
  theme_minimal()
```
## Multivariate visualization
For two quantitative variables we can use a scatter plot; in ggplot the geom is `geom_point()`. Another way to summarize these variables is to fit a regression with `geom_smooth()`. Non-parametric regression is the default, but you can also superimpose a linear model with `method = "lm"`.
The ggplot structure is the same; we just specify both the x and y aesthetics.
```{r}
ggplot(nice_ride_2018, aes(x = start_hour, y = tripduration)) +
  geom_smooth() +
  theme_minimal()
```
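To see a scatter plot and a superimposed linear model together, here is a minimal sketch. It uses a small made-up data frame, `toy_rides`, in place of `nice_ride_2018` so that it runs on its own; the values are invented and only the column names match the real data.

```{r}
library(ggplot2)

# Hypothetical toy data standing in for nice_ride_2018 (all values are made up)
set.seed(1)
toy_rides <- data.frame(start_hour = sample(0:23, 200, replace = TRUE))
toy_rides$tripduration <- 600 + 20 * toy_rides$start_hour + rnorm(200, sd = 120)

p_scatter <- ggplot(toy_rides, aes(x = start_hour, y = tripduration)) +
  geom_point(alpha = 0.4) +                 # one point per trip
  geom_smooth(method = "lm", se = FALSE) +  # superimpose a straight-line fit
  theme_minimal()

p_scatter
```

Dropping `method = "lm"` falls back to the default smoother, which is a quick way to check whether a straight line is a reasonable summary.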
Another multivariate example is two qualitative (categorical) variables, where the quantity we plot is the count of observations in each combination. A stacked bar chart could be a good way to show this. We will use user type and weekday to explain how to make this chart, starting where we left off in the first visualization section.
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_day)) +
  geom_bar(fill = "cyan4") +
  theme_minimal() +
  labs(
    title = "What days did people use Niceride in 2018? Weekends are a favorite.",
    x = "Day of the week",
    y = "Trips",
    caption = "Source: Niceride MN"
  )
```
Now, we will create one that differentiates by user type.
First, think about what aesthetic we might be using here. It's not an axis, but it will map to some other way we can display data.
My hint to you is that you will need to remove `fill = ...` from the `geom_bar()` call and think about where else `fill` might go.
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_day, fill = usertype)) +
  geom_bar() +
  theme_minimal() +
  labs(
    title = "What days did people use Niceride in 2018? Weekends are a favorite.",
    x = "Day of the week",
    y = "Trips",
    caption = "Source: Niceride MN"
  )
```
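The difference between setting `fill` to a constant and mapping it to a variable can be sketched on a tiny example. The data frame `toy_rides` below is hypothetical: its column names mirror `nice_ride_2018`, but the rows (including the `"Customer"` and `"Subscriber"` labels) are made up for illustration.

```{r}
library(ggplot2)

# Hypothetical toy data; rows are invented, only the column names match the real data
toy_rides <- data.frame(
  start_day = c("Sat", "Sat", "Sat", "Sun", "Sun", "Mon"),
  usertype  = c("Customer", "Subscriber", "Customer", "Customer", "Subscriber", "Subscriber")
)

# Setting: fill is a constant, so it sits outside aes() and every bar gets the same color
p_set <- ggplot(toy_rides, aes(x = start_day)) +
  geom_bar(fill = "cyan4")

# Mapping: fill depends on a variable, so it sits inside aes() and ggplot adds a legend
p_mapped <- ggplot(toy_rides, aes(x = start_day, fill = usertype)) +
  geom_bar()

p_mapped
```

As a rule of thumb, anything that should vary with the data goes inside `aes()`, and anything that should be the same everywhere goes outside it.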
## Practice
* Create a plot of the ten most popular start stations for `Classic` bikes on Saturdays and Sundays.
Your plot should look something like this:
```{r echo = FALSE}
nice_ride_2018 %>%
  filter(start_day %in% c("Sat", "Sun") & bike_type == "Classic") %>% # weekend trips on Classic bikes
  group_by(start_station_name) %>%
  count() %>% # trips per start station, stored in n
  arrange(desc(n)) %>%
  head(10) %>% # keep the ten busiest stations
  ggplot(aes(x = start_station_name, y = n)) +
  geom_col() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) # rotate only the station labels
```
* Create a plot of the ten most popular end stations for `Classic` bikes on the weekdays.
```{r echo = FALSE}
nice_ride_2018 %>%
  filter(!start_day %in% c("Sat", "Sun") & bike_type == "Classic") %>% # weekday trips on Classic bikes
  group_by(end_station_name) %>%
  count() %>% # trips per end station, stored in n
  arrange(desc(n)) %>%
  head(10) %>% # keep the ten busiest stations
  ggplot(aes(x = end_station_name, y = n)) +
  geom_col() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) # rotate only the station labels
```
* Create a plot of the ten most common origin-destination pairs (filter out `NULL` stations first).
Hints: create a variable holding the origin-destination pair name with `paste(start_station_name, end_station_name, sep = " to ")` plus a wrangling verb. Look into `coord_flip()`.
```{r echo = FALSE}
nice_ride_2018 %>%
  filter(start_station_name != "NULL", end_station_name != "NULL") %>% # drop trips missing either station
  group_by(start_station_name, end_station_name) %>%
  summarise(trips = n()) %>%
  ungroup() %>% # remove the leftover grouping before ranking
  arrange(desc(trips)) %>%
  head(10) %>%
  mutate(or_dest = paste(start_station_name, end_station_name, sep = " to ")) %>%
  ggplot(aes(x = or_dest, y = trips)) +
  geom_col() +
  coord_flip() + # horizontal bars keep the long labels readable
  theme_minimal() +
  labs(title = "Round trips are popular among \nNiceride users, especially \nin the parks", x = "Origin - Destination Pair", y = "Trips")
```
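The hint above can be tried on a small scale first. This is a minimal sketch under stated assumptions: `toy_trips` is a hypothetical data frame with invented station names, and it builds the pair label before counting, which is only one of several reasonable orderings of the steps.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical toy trips; station names are invented for illustration
toy_trips <- data.frame(
  start_station_name = c("A", "A", "A", "B", "B", "NULL"),
  end_station_name   = c("A", "A", "B", "A", "A", "B")
)

pair_counts <- toy_trips %>%
  filter(start_station_name != "NULL", end_station_name != "NULL") %>%
  mutate(or_dest = paste(start_station_name, end_station_name, sep = " to ")) %>%
  count(or_dest, name = "trips", sort = TRUE) # one row per pair, busiest first

pair_counts

ggplot(pair_counts, aes(x = or_dest, y = trips)) +
  geom_col() +
  coord_flip() + # swap the axes so the pair labels read horizontally
  theme_minimal()
```

With real station names the labels get long, which is why flipping the coordinates tends to read better than rotating the axis text.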
* Create an overlapping density plot of hourly use by user type, faceted by weekday (similar to the facet example above).
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_hour, fill = usertype)) +
  geom_density(adjust = 1.5, alpha = 0.5) + # the adjust argument just makes the plots a little smoother
  facet_wrap(~start_day) + # lay out the plots by the weekday
  theme_minimal()
```
* Then [change the colors](https://ggplot2.tidyverse.org/reference/scale_manual.html) on that plot.
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_hour, fill = usertype)) +
  geom_density(adjust = 1.5, alpha = 0.6) + # the adjust argument just makes the plots a little smoother
  facet_wrap(~start_day) + # lay out the plots by the weekday
  theme_minimal() +
  scale_fill_manual(values = c("#C0E6DE", "#FED18C"))
```
* And add better labels, including a legend title (read the help doc from above).
```{r echo = FALSE}
ggplot(nice_ride_2018, aes(x = start_hour, fill = usertype)) +
  geom_density(adjust = 1.5, alpha = 0.6) + # the adjust argument just makes the plots a little smoother
  facet_wrap(~start_day) + # lay out the plots by the weekday
  theme_minimal() +
  scale_fill_manual(values = c("#C0E6DE", "#FED18C"), name = "User Type") +
  labs(title = "Subscribers use Niceride in a predictable commuter pattern", y = "", x = "Start time")
```
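One optional refinement when choosing colors by hand is to name each value, so that a color stays attached to its category regardless of the order of the levels. The sketch below assumes the two user types are called `"Customer"` and `"Subscriber"`; those labels, the toy data, and the pairing of colors to categories are assumptions made for illustration, so check them against the real `usertype` column.

```{r}
library(ggplot2)

# Hypothetical toy data; the usertype labels are assumed, not taken from the real data
toy_rides <- data.frame(
  start_hour = c(7, 8, 9, 17, 18, 12, 13, 14, 15, 16),
  usertype   = rep(c("Subscriber", "Customer"), each = 5)
)

# Naming each color ties it to one category instead of relying on level order
user_colors <- c(Customer = "#C0E6DE", Subscriber = "#FED18C")

p_density <- ggplot(toy_rides, aes(x = start_hour, fill = usertype)) +
  geom_density(alpha = 0.6) +
  scale_fill_manual(values = user_colors, name = "User Type") + # name sets the legend title
  theme_minimal()

p_density
```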
* On a new plot, make a "map" by drawing the longitude and latitude variables as a scatter plot.
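
As a starting point for the "map", here is a minimal sketch on made-up coordinates. The column names `lon` and `lat` are assumptions, so swap in whatever the longitude and latitude columns are called in `nice_ride_2018`. Adding `coord_quickmap()` is optional; it approximates a map projection so the points are not stretched.

```{r}
library(ggplot2)

# Hypothetical station coordinates; the values and the names `lon` and `lat` are invented
toy_stations <- data.frame(
  lon = c(-93.27, -93.25, -93.23, -93.30),
  lat = c(44.98, 44.95, 44.97, 44.94)
)

p_stations <- ggplot(toy_stations, aes(x = lon, y = lat)) + # longitude on x, latitude on y
  geom_point(alpha = 0.6) +
  coord_quickmap() + # approximate map aspect ratio
  theme_minimal()

p_stations
```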