# Creating Visualizations using ggplot
This tutorial introduces `ggplot2` for visualizing your data. R has many options for creating graphs and figures, but `ggplot2` is versatile, friendly to learn, and quite elegant. With `ggplot2` you can quickly learn the basics of its functionality and then apply those skills to the more advanced figures described in [Chapter 5](#strat-diagrams).
For more information, see the [data visualization](http://r4ds.had.co.nz/data-visualisation.html) chapter in [R for Data Science](http://r4ds.had.co.nz/).
## Prerequisites
The prerequisite for this tutorial is the `tidyverse` package. If this package isn't installed, you'll have to install it using `install.packages()`.
```{r, eval = FALSE}
install.packages("tidyverse")
```
Once `tidyverse` is installed, load it! If you get an error here, the package may not have installed correctly.
```{r}
library(tidyverse)
```
Finally, you will need to load the example data. For now, copy and paste the following code to load the [Halifax geochemistry dataset](data/halifax_geochem.csv) (we will learn how to read various types of files into R in the [preparing and loading data](#prepare-load) tutorial).
```{r, include=FALSE}
# read local version to build
halifax_geochem <- read_csv(
  "data/halifax_geochem.csv",
  col_types = cols(.default = col_guess())
)
```
```{r, eval = FALSE}
halifax_geochem <- read_csv(
  "http://paleolimbot.github.io/r4paleolim/data/halifax_geochem.csv",
  col_types = cols(.default = col_guess())
)
```
It's worth mentioning a little bit about what this data frame contains, since we'll be working with it for the rest of this tutorial. The data contains several bulk geochemical parameters from a recent study of Halifax drinking water reservoirs [@dunnington18], including Pockwock Lake, Lake Major, Bennery Lake, Lake Fletcher, Lake Lemont, First Chain Lake, First Lake, and Second Lake. (Later, we will take a look at the [core locations](data/halifax_geochem_cores.xlsx) as well as the geochemical data).
## Using ggplot
The Grammar of Graphics (the "gg" in "ggplot") is a way of describing a graphic that is derived from data; in R this is done with the `ggplot()` function and its many friends. Unlike many other plotting functions, `ggplot()` builds graphics from the data up, rather than starting with a template of a graphic and working backward. Before we can use `ggplot()`, we need the skills from [Chapter 2](#work-with-tables) to filter our data. See if you can use `filter()` on the `halifax_geochem` data to create the `pockwock_data` variable (only the Pockwock Lake core, `POC15-2`) and the `pockwock_major_data` variable (the `POC15-2` and `MAJ15-1` cores). (HINT: check out the [Filtering Rows](#chap2_filter) section in [Chapter 2](#work-with-tables).)
```{r}
pockwock_data <- filter(halifax_geochem, core_id == "POC15-2")
pockwock_major_data <- filter(halifax_geochem, core_id %in% c("POC15-2", "MAJ15-1"))
```
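If the difference between `==` and `%in%` in the two `filter()` calls above is unclear, the sketch below runs both on a tiny data frame so you can compare the results without downloading anything. The data frame `toy_geochem` is hypothetical: the K and Ti values are made up, and only the two core IDs come from the real dataset (`OTHER-1` is a placeholder).

```{r}
library(dplyr)

# Hypothetical toy data: made-up values, NOT from the Halifax dataset
toy_geochem <- data.frame(
  core_id    = c("POC15-2", "POC15-2", "MAJ15-1", "MAJ15-1", "OTHER-1"),
  depth_cm   = c(0, 1, 0, 1, 0),
  K_percent  = c(1.1, 1.3, 0.8, 0.9, 1.5),
  Ti_percent = c(0.30, 0.35, 0.20, 0.22, 0.40)
)

# `==` keeps rows whose core_id equals one value
toy_pockwock <- filter(toy_geochem, core_id == "POC15-2")

# `%in%` keeps rows whose core_id matches any value in the vector
toy_pockwock_major <- filter(toy_geochem, core_id %in% c("POC15-2", "MAJ15-1"))

nrow(toy_pockwock)        # 2
nrow(toy_pockwock_major)  # 4
```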
Now we can start with the `ggplot` example using the `pockwock_major_data`:
```{r}
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent)) +
  geom_point()
```
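Every `ggplot()` call follows the same structure: `data` is the data frame to plot, `mapping = aes()` connects columns to visual properties (here `K_percent` to the x axis and `Ti_percent` to the y axis), and layers such as `geom_point()` are added with `+` to choose how the data are drawn.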
The general steps for plotting are:
- Envision how you want your plot to look (draw it on paper if you have to!)
- Set up the data (`select()`, `filter()`)
- Set up your mapping (`aes()`)
- Choose your geoms (`geom_*()`)
- Make it look pretty
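The steps above can be sketched in code as follows. This is a minimal sketch on a hypothetical data frame (`toy_geochem`, made-up values standing in for `halifax_geochem`); step 5 only adds axis labels here, since the other finishing touches come later in this tutorial.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for halifax_geochem (made-up values)
toy_geochem <- data.frame(
  core_id    = c("POC15-2", "POC15-2", "MAJ15-1", "MAJ15-1", "OTHER-1"),
  depth_cm   = c(0, 1, 0, 1, 0),
  K_percent  = c(1.1, 1.3, 0.8, 0.9, 1.5),
  Ti_percent = c(0.30, 0.35, 0.20, 0.22, 0.40)
)

# Step 1 (envision the plot) happens on paper, not in code.
# Step 2: set up the data (keep two cores and only the columns we need)
toy_two_cores <- toy_geochem %>%
  filter(core_id %in% c("POC15-2", "MAJ15-1")) %>%
  select(core_id, K_percent, Ti_percent)

# Step 3: set up the mapping; Step 4: choose a geom
toy_plot <- ggplot(data = toy_two_cores, mapping = aes(x = K_percent, y = Ti_percent)) +
  geom_point() +
  # Step 5: make it look pretty (only axis labels here)
  labs(x = "K (%)", y = "Ti (%)")

toy_plot
```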
## Aesthetics
Categorical (grouping) variables are usually mapped to x, y, colour, shape, or linetype; continuous variables are usually mapped to x, y, colour, or size. For example, we can colour the previous figure by core, to see the difference between the two cores, by adding `colour = core_id` inside `aes()`:
```{r}
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_point()
```
Notice that a legend is generated for us automatically. We will look at changing its labels later in this tutorial! We can also distinguish the cores with different point shapes instead of colours, since colour is not always an option (for example, in a figure that will be printed in black and white):
```{r}
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, shape = core_id)) +
  geom_point()
```
Now we can add information on depth by scaling each symbol's size to its `depth_cm` value. For this example I only want to use the `pockwock_data` we created above:
```{r}
ggplot(data = pockwock_data, mapping = aes(x = K_percent, y = Ti_percent, size = depth_cm)) +
  geom_point()
```
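Colour works for continuous variables as well as categorical ones. The sketch below maps a continuous column, `depth_cm`, to colour on a hypothetical data frame (`toy_depths`, made-up values); `ggplot2` then draws a colour gradient with a colourbar legend instead of one colour per category. `ggplot_build()` is used only to peek at the colours `ggplot2` computed for each point.

```{r}
library(ggplot2)

# Hypothetical toy data: one core, four depths (made-up values)
toy_depths <- data.frame(
  depth_cm   = c(0, 1, 2, 3),
  K_percent  = c(1.1, 1.3, 1.2, 1.4),
  Ti_percent = c(0.30, 0.35, 0.33, 0.38)
)

# A continuous variable mapped to colour gives a gradient, not discrete hues
p_depth <- ggplot(data = toy_depths, mapping = aes(x = K_percent, y = Ti_percent, colour = depth_cm)) +
  geom_point()
p_depth

# Each of the four depths gets its own shade along the gradient
built <- ggplot_build(p_depth)
length(unique(built$data[[1]]$colour))  # 4
```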
## Geometries
We can easily change the type of geometry used in the `ggplot()` we have been working on. Here is the coloured `pockwock_major_data` plot from earlier, drawn with `geom_line()` instead of `geom_point()`:
```{r}
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_line()
```
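We can also combine multiple geometries in one plot! Each `geom_*()` adds a layer, and layers are drawn in the order they are added, so here the points are drawn on top of the lines: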
```{r}
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_line() +
  geom_point()
```
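One detail worth knowing when using lines with core data: `geom_line()` connects observations in order of their x value (here, increasing `K_percent`), not in order of depth down the core. `geom_path()` instead connects them in the order the rows appear in the data, so it follows depth if the rows are sorted by depth (for example with `arrange()`). The sketch below compares the two on a hypothetical, depth-sorted data frame (`toy_core`, made-up values) by inspecting the layer data from `ggplot_build()`.

```{r}
library(ggplot2)

# Hypothetical toy data, already sorted by depth (made-up values)
toy_core <- data.frame(
  depth_cm   = c(0, 1, 2, 3),
  K_percent  = c(1.4, 1.1, 1.3, 1.2),
  Ti_percent = c(0.38, 0.30, 0.35, 0.33)
)

p_line <- ggplot(toy_core, aes(x = K_percent, y = Ti_percent)) + geom_line()
p_path <- ggplot(toy_core, aes(x = K_percent, y = Ti_percent)) + geom_path()

# geom_line() reorders the points by x before connecting them
ggplot_build(p_line)$data[[1]]$x  # 1.1 1.2 1.3 1.4

# geom_path() keeps the row (depth) order
ggplot_build(p_path)$data[[1]]$x  # 1.4 1.1 1.3 1.2
```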
## Facets
An alternative to altering the aesthetics of a plot to visually separate groups for the end user is to split your plot into facets: subplots that each display one subset of the data. We can do this with the `facet_wrap()` function. For this example we can use the original `halifax_geochem` table and create one facet for each core!
```{r}
ggplot(data = halifax_geochem, mapping = aes(x = K_percent, y = Ti_percent)) +
  geom_line() +
  geom_point() +
  facet_wrap(~core_id)
```
This is great; however, we may want to change the layout of these facets. We can do this by specifying the number of rows (`nrow`) or the number of columns (`ncol`) within `facet_wrap()`.
```{r}
ggplot(data = halifax_geochem, mapping = aes(x = K_percent, y = Ti_percent)) +
  geom_line() +
  geom_point() +
  facet_wrap(~core_id, ncol = 4)
```
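To check how `facet_wrap()` arranged the panels, `ggplot_build()` exposes a layout table with one row per panel and its `ROW` and `COL` position in the grid. The sketch below uses a hypothetical data frame with four made-up cores (`core_A` to `core_D`, invented values) and `ncol = 2`, so the four panels should fill a 2 x 2 grid.

```{r}
library(ggplot2)

# Hypothetical toy data: four made-up cores with two samples each
toy_cores <- data.frame(
  core_id    = rep(c("core_A", "core_B", "core_C", "core_D"), each = 2),
  K_percent  = c(1.0, 1.2, 0.8, 0.9, 1.5, 1.6, 1.1, 1.3),
  Ti_percent = c(0.30, 0.33, 0.20, 0.22, 0.40, 0.41, 0.28, 0.31)
)

p_facets <- ggplot(toy_cores, aes(x = K_percent, y = Ti_percent)) +
  geom_point() +
  facet_wrap(~core_id, ncol = 2)
p_facets

# One row per panel; ROW and COL give each panel's position in the grid
panel_layout <- ggplot_build(p_facets)$layout$layout
panel_layout[, c("PANEL", "ROW", "COL", "core_id")]
```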
## Make it look pretty
### Labels
Column names in a data table are often (and should be) riddled with short forms of what they represent, with underscores separating words. These names make poor axis titles, so the `labs()` function can be used to give your figure a more presentable look for the end user. Here I have changed the x and y labels from `K_percent` and `Ti_percent` to `K (%)` and `Ti (%)`, respectively. While we're at it, let's change the legend title too, just for fun!
```{r}
# Before: axis and legend titles default to the column names
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_line() +
  geom_point()

# After: labs() replaces the default titles
ggplot(data = pockwock_major_data, mapping = aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_line() +
  geom_point() +
  labs(x = "K (%)", y = "Ti (%)", colour = "Core ID")
```
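`labs()` is not limited to axis and legend titles: it also accepts `title`, `subtitle`, and `caption`. The sketch below adds a title and a caption to a plot of a hypothetical data frame (`toy_two_cores`, made-up values; the title and caption text are placeholders) and then reads the stored labels back from the plot object.

```{r}
library(ggplot2)

# Hypothetical toy data (made-up values)
toy_two_cores <- data.frame(
  core_id    = c("POC15-2", "POC15-2", "MAJ15-1", "MAJ15-1"),
  K_percent  = c(1.1, 1.3, 0.8, 0.9),
  Ti_percent = c(0.30, 0.35, 0.20, 0.22)
)

p_labelled <- ggplot(toy_two_cores, aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_point() +
  labs(
    x = "K (%)", y = "Ti (%)", colour = "Core ID",
    title = "Ti versus K",                        # placeholder title
    caption = "Toy values for illustration only"  # placeholder caption
  )
p_labelled

# Labels set with labs() are stored on the plot object
p_labelled$labels$title
```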
### Themes
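Themes control the non-data parts of a plot, such as the background, gridlines, fonts, and legend placement. A minimal sketch on a hypothetical data frame (`toy_two_cores`, made-up values): a complete theme such as `theme_bw()` changes the overall look, and `theme()` tweaks individual elements, here moving the legend below the plot. Put `theme()` after the complete theme, because adding a complete theme overrides theme settings added before it.

```{r}
library(ggplot2)

# Hypothetical toy data (made-up values)
toy_two_cores <- data.frame(
  core_id    = c("POC15-2", "POC15-2", "MAJ15-1", "MAJ15-1"),
  K_percent  = c(1.1, 1.3, 0.8, 0.9),
  Ti_percent = c(0.30, 0.35, 0.20, 0.22)
)

p_themed <- ggplot(toy_two_cores, aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_point() +
  theme_bw() +                       # complete theme: white background, dark panel border
  theme(legend.position = "bottom")  # single tweak; must come after theme_bw()
p_themed
```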
### Scales
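We can also change the scales of our axes and legends using `scale_*_discrete()` or `scale_*_continuous()`, where `*` is the aesthetic being scaled (for example `x`, `y`, or `colour`). Common parameters for discrete scales include `name`, `breaks`, `labels`, `na.value`, `limits`, and `guide`.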
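As a sketch of both kinds of scale, the example below uses a hypothetical data frame (`toy_two_cores`, made-up values). `scale_x_continuous()` sets the axis title, fixed limits, and tick positions; `scale_colour_discrete()` sets the legend title and relabels the categories. Discrete `labels` are matched to the breaks, which default to the sorted category values (`MAJ15-1`, then `POC15-2`); the lake names used as labels are assumed to match those cores.

```{r}
library(ggplot2)

# Hypothetical toy data (made-up values)
toy_two_cores <- data.frame(
  core_id    = c("POC15-2", "POC15-2", "MAJ15-1", "MAJ15-1"),
  K_percent  = c(1.1, 1.3, 0.8, 0.9),
  Ti_percent = c(0.30, 0.35, 0.20, 0.22)
)

p_scaled <- ggplot(toy_two_cores, aes(x = K_percent, y = Ti_percent, colour = core_id)) +
  geom_point() +
  # Continuous x scale: axis title, fixed limits, explicit tick positions
  scale_x_continuous(name = "K (%)", limits = c(0, 2), breaks = c(0, 0.5, 1, 1.5, 2)) +
  # Discrete colour scale: legend title and labels in break order
  scale_colour_discrete(name = "Core ID", labels = c("Lake Major", "Pockwock Lake"))
p_scaled

# The x scale now reports the limits we set rather than the data range
layer_scales(p_scaled)$x$get_limits()
```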
## Summary
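In this tutorial we built plots with `ggplot()` by mapping columns to aesthetics with `aes()`, choosing one or more geometries, splitting the data into facets with `facet_wrap()`, and polishing the result with labels and scales.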
For more information, see the [data visualization](http://r4ds.had.co.nz/data-visualisation.html) chapter in [R for Data Science](http://r4ds.had.co.nz/).