-
Notifications
You must be signed in to change notification settings - Fork 14
/
tidy-eval-basics.Rmd
189 lines (130 loc) · 5.48 KB
/
tidy-eval-basics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
# Tidy evaluation basics
```{r setup, include=FALSE}
library(tidyverse)
```
Likely without realizing it, you've been using two types of variables in R. First, there are what we'll call _environment_ variables. When you create an environment variable, you bind a value to a name in the current environment. For example, the following code creates an environment variable named `y` that refers to `1`.
```{r}
y <- 1
```
Now, when we call `y`, we get whatever value `y` refers to.
```{r}
y
```
The second type of variable we'll call _data_ variables. Data variables refer to columns in data frames, and only make sense in the context of those data frames. For example, `manufacturer` is a data variable that exists in the context of `mpg`. dplyr, ggplot2, and other tidyverse packages understand this, which is why functions like `select()` behave as you want.
```{r}
mpg %>%
select(manufacturer)
```
Instead of figuring out what `manufacturer` refers to in the environment, `select()` looks for a column named "manufacturer" in `mpg`.
Let's take a look at our failed function from the intro again.
```{r, error=TRUE}
grouped_mean <- function(var_group, var_summary) {
mpg %>%
group_by(var_group) %>%
summarize(mean = mean(var_summary))
}
grouped_mean(var_group = manufacturer, var_summary = cty)
```
`group_by()` thinks that `var_group` is a data variable, and so it looks inside `mpg` for a column called "var_group", doesn't find it, and so throws an error.
We actually want `var_group` to behave as a hybrid between an environment variable and a data variable. Like an environment variable, we want it to refer to another value (`manufacturer`). Then, we want `group_by()` to treat that value (`manufacturer`) as a data variable and look inside `mpg` for the matching column.
## Embracing with `{{ }}`
To fix `grouped_mean()`, all we need to do is _embrace_ `var_group` and `var_summary` with `{{ }}` (pronounced "curly curly").
```{r}
grouped_mean <- function(var_group, var_summary) {
mpg %>%
group_by({{ var_group }}) %>%
summarize(mean = mean({{ var_summary }}))
}
grouped_mean(var_group = manufacturer, var_summary = cty)
```
`{{ }}` might remind you of the use of `{ }` in `glue()`, and they fill similar roles.
Let's look at another example. This doesn't work:
```{r, error=TRUE, fig.show='hide'}
plot_mpg <- function(var) {
mpg %>%
ggplot(aes(var)) +
geom_bar()
}
plot_mpg(drv)
```
We want `var` to behave as a hybrid environment-data variable, so we need `{{ }}`.
```{r}
plot_mpg <- function(var) {
mpg %>%
ggplot(aes({{ var }})) +
geom_bar()
}
plot_mpg(drv)
```
## Forwarding with `...`
You might have noticed that some functions, like the purrr functions, take `...` as a final argument, allowing you to specify any number of additional arguments. You can use `...` in your own functions. There are two common use-cases.
### Passing full expressions
Functions like `filter()` take expressions, like `year == 1999` or `manufacturer == "audi"`. If you want to build a function that takes full expressions, you can use `...`.
```{r}
mpg_filter <- function(...) {
mpg %>%
filter(...)
}
mpg_filter(manufacturer == "audi", year == 1999)
```
`...` can take any number of arguments, so we can filter by an unlimited number of conditions.
```{r}
mpg_filter(
manufacturer == "audi",
year == 1999,
drv == "f",
fl == "p"
)
```
`mpg_filter()` _forwards_ `...` to `filter()`, which allows `filter()` to act on the contents of `...` just as it would outside of the function.
Here's another example that uses `select()`.
```{r}
mpg_select <- function(...) {
mpg %>%
select(...)
}
mpg_select(car = model, drivetrain = drv)
```
### Additional arguments
Sometimes, you'll want your function to take named arguments, but you'll also want to allow for any number of additional arguments. You can use `{{ }}`, and `...`.
```{r}
grouped_mean_2 <- function(df, var_summary, ...) {
df %>%
group_by(...) %>%
summarize(mean = mean({{ var_summary }})) %>%
ungroup()
}
grouped_mean_2(df = mpg, var_summary = cty, year, drv)
```
With the `...`, we can pass any number of grouping variables into `group_by()`.
```{r}
grouped_mean_2(df = mpg, var_summary = cty, year, drv, class)
```
## Assigning names
When you want to pass the name of a column into your function, you need to:
* Embrace the name with `{{ }}`.
* Use `:=` instead of `=` to assign the name.
```{r}
summary_mean <- function(df, var_summary, name_summary){
df %>%
summarize({{ name_summary }} := mean({{ var_summary }}))
}
summary_mean(df = mpg, var_summary = cty, name_summary = cty_mean)
```
You have to use `:=` instead of just plain `=` because you can't use `{{ }}` on both sides of a `=`.
(`:=` is called the walrus operator because it looks like a sideways walrus.)
## Splicing with `!!!`
Say you want to recode a variable:
```{r}
mpg %>%
mutate(drv = recode(drv, "f" = "front", "r" = "rear", "4" = "four"))
```
It's often a good idea to store your recode mapping as a vector in your parameters section. To get this to work, you'll need another tidyeval operator, `!!!`.
```{r}
recode_drv <- c(f = "front", r = "rear", "4" = "four")
mpg %>%
mutate(drv = recode(drv, !!! recode_drv))
```
`!!!` has two tasks:
* Unpack `recode_drv` so that each element is passed as a separate argument to `recode()` (i.e., `"f" = "front", "r" = "rear", "4" = "four"` instead of c(`"f" = "front", "r" = "rear", "4" = "four")`).
* Make sure that `recode()` treats the individual elements as data variables.