---
title: "Stat 33B - Lecture 9"
date: March 18, 2020
output: pdf_document
---
Scope & Dynamic Lookup
======================
Scope
-----
The "scope" of a variable is the section of a program where that
variable exists and can be accessed.
In R:
* Function definitions are the primary way to create a new scope.
* If-statements and loops do not affect scope at all.
You can test whether a variable is in scope with the `exists()`
function:
```{r}
exists("x")
x = 3
x
exists("x")
```
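To illustrate the earlier point that functions create a new scope while if-statements and loops do not, here is a minimal sketch. All variable and function names in it are made up for illustration.

```{r}
# if-statements and loops run in the surrounding scope.
if (TRUE) {
  inside_if = "set in if-statement"
}
for (loop_i in 1:3) {
  inside_loop = loop_i * 2
}
inside_if    # still accessible after the if-statement
inside_loop  # value from the last iteration
loop_i       # even the loop variable is still in scope

# A function body is a new scope, so its variables do not leak out.
scope_demo = function() {
  inside_fn = "set in function"
  inside_fn
}
scope_demo()
exists("inside_fn")
```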
Variables defined inside of a function are "local" to that function.
Local variables are not visible outside the function:
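```{r, error = TRUE}
f = function() {
  y = 44
  exists("y")
}
g = function() {
  y
}
f()
# Both of these fail: y is local to f(), so it is not visible inside g()
# or at the top level.
g()
y
```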
Local variables reset each time the function is called:
```{r}
create_or_add = function() {
  if (exists("counter")) {
    counter = counter + 1
  } else {
    counter = 0
  }
  counter
}
create_or_add()
create_or_add()
```
A function can use variables defined outside (non-local), but only if
those variables are in scope where the function was **defined**.
This property is called "lexical scoping".
For example:
```{r}
getx = function() x
x
getx()
make_gety = function() {
  y = 3
  function() x + y
}
gety = make_gety()
gety()
test_gety = function() {
  y = 15
  # gety() was defined inside make_gety(), so it still uses y = 3.
  gety()
}
test_gety()
```
Variables defined directly in the R console are "global" and
available to any function.
Local variables "mask" or hide non-local variables with the same name:
```{r}
getx_local = function() {
  x = "Hello"
  x
}
getx_local()
```
Locals get priority!
EXCEPTION: R ignores non-functions when looking up the name of a
called function.
For example:
```{r}
compute_mean = function() {
  x = c(1, 2, 3)
  mean = 0
  # the local variable mean is ignored, because it's not a function
  mean(x)
}
compute_mean()
```
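Conversely, a local variable that *is* a function does mask the non-local one. A small sketch, with a made-up function name and a deliberately silly local `mean()`:

```{r}
compute_local_mean = function() {
  x = c(1, 2, 3)
  # This local mean is a function, so the call below uses it.
  mean = function(values) "local mean() was called"
  mean(x)
}
compute_local_mean()
```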
Besides function definitions, the `local()` function also creates a
new scope:
```{r}
local({
  z = 4
})
```
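A short sketch of what that means in practice (the names `local_result` and `tmp_w` are arbitrary): `local()` returns the value of its last expression, and variables assigned inside it are not visible afterwards.

```{r}
local_result = local({
  tmp_w = 10
  tmp_w * 2
})
local_result     # value of the last expression in the block
exists("tmp_w")  # tmp_w was local to the block
```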
In summary:
* Function definitions (or `local()`) create a new scope.
* Local variables get reset each time a function is called.
* Where a function is **defined** determines which variables are in
scope.
* Local variables mask non-local variables.
* R ignores non-functions when looking up the name of a called
function.
For a function, **where** a variable will be looked for depends only
on where the function was defined (because of lexical scoping).
Dynamic Lookup, Part 1: Functions
---------------------------------
Variables are only looked up **when** they are actually used.
For a function, this means variables are only looked up **when** the
function is called.
This is called "dynamic lookup".
For example:
```{r}
get_cats = function() cats
cats = 3
get_cats()
cats = 21
get_cats()
```
Dynamic lookup can be counterintuitive.
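One common surprise, sketched here with made-up names, comes from creating functions in a loop. Loops do not create a new scope, so every function below refers to the same variable `k`, and `k` is only looked up when a function is called:

```{r}
adders = list()
for (k in 1:3) {
  adders[[k]] = function(v) v + k
}
# By the time any of these is called, the loop has finished and k is 3.
adders[[1]](10)
adders[[3]](10)
```

Both calls return 13, because both look up `k` after the loop has ended.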
Environments
============
The data structure that R uses to keep track of variables at run-time
is called an "environment".
Each environment has a "frame" that maps names to R objects (much like
a hash table).
Each environment also has a "parent environment" (with one exception
we'll see later).
Dynamic Lookup, Part 2: Environments
------------------------------------
At run-time, **each call** to a function creates a new environment:
* Its frame contains the function's local variables.
* Its parent environment is the environment where the function was
**defined**. This satisfies lexical scoping.
When R looks up a variable, it checks the current environment first.
If the variable isn't there, it checks the environment's parent, then
the environment's parent's parent, and so on.
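These rules can be checked directly. The sketch below uses `environment()` and `parent.env()`, both covered later in these notes; the function name `new_call_env` is hypothetical.

```{r}
# Calling environment() with no arguments inside a function returns the
# environment created for that call.
new_call_env = function() environment()

call1 = new_call_env()
call2 = new_call_env()

# FALSE: each call gets its own environment.
identical(call1, call2)
# TRUE: the parent is the environment where the function was defined.
identical(parent.env(call1), environment(new_call_env))
```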
The "global environment" corresponds to the R console.
Use `globalenv()` to get the global environment:
```{r}
globalenv()
```
Assignment in Environments
--------------------------
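The `<-` and `=` operators assign a variable in the current environment.
Use `<<-` to assign a variable in an enclosing environment. R searches
the parent environments for an existing variable with that name and
reassigns the first one it finds; if there is none, it creates the
variable in the global environment: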
```{r}
assignx = function(newx) {
  x <<- newx
}
x
assignx("Reassigned")
x
```
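With nested functions, `<<-` does not necessarily reach the global environment. According to R's documentation, it searches the enclosing environments for an existing variable of that name and only falls back to the global environment if none is found. A minimal sketch with hypothetical names:

```{r}
outer_fn = function() {
  total = 0
  add_one = function() {
    total <<- total + 1  # modifies total in outer_fn's environment
  }
  add_one()
  add_one()
  total
}
outer_fn()
# FALSE, assuming no global variable named total was defined earlier.
exists("total", envir = globalenv(), inherits = FALSE)
```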
This means you can write a function that has a "side effect" on the
environment where it was defined.
In the R community, side effects are generally frowned upon, because
they make code harder to understand and predict.
Some side effects are useful:
* Making plots.
* Writing data to a file.
But side effects are not necessary for most functions, and you should
avoid them when possible.
Use `assign()` to assign a variable in a specific environment:
```{r}
assign("stat33", "is great", globalenv())
# Same effect as running stat33 = "is great" in the global environment.
stat33
assign("dogs", 3)
dogs
```
Most of R's functions for working with environments assume the
current environment if you don't specify an environment.
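`assign()` is most useful when the variable name is only known at run time. The sketch below is an illustration rather than part of the lecture: it uses `new.env()` (a base R function that creates a new environment) so that the assigned variables stay out of the global environment, and the names `scratch_env`, `alpha`, and `beta` are made up.

```{r}
scratch_env = new.env()
for (nm in c("alpha", "beta")) {
  # The variable name comes from a string, which `=` cannot do.
  assign(nm, toupper(nm), envir = scratch_env)
}
scratch_env$alpha
scratch_env$beta
```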
Unlike other R objects, environments **do not** follow the
copy-on-write rule.
They are reference objects:
```{r}
# Copy-on-write example:
x = c(4, 5, 6)
y = x
x[1] = 6
x
y
# Exception is environments:
genv = globalenv()
genv2 = genv
genv$x = "This is a test!"
genv$x
genv2$x
```
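The same difference shows up when objects are passed to functions. In this sketch (all names hypothetical; `new.env()` is a base R function that creates a new environment), a function modifies its argument. The change persists for an environment but not for a list:

```{r}
bump = function(target) {
  target$n = target$n + 1
  invisible(NULL)
}

counter_env = new.env()
counter_env$n = 0
bump(counter_env)
counter_env$n   # 1: the function modified the same environment

counter_list = list(n = 0)
bump(counter_list)
counter_list$n  # 0: the function modified its own copy
```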
Inspecting Environments
-----------------------
Use `ls()` to list the names (variables) in an environment:
```{r}
ls()
```
By default, the `ls()` function ignores names that start with a dot (`.`).
Use `all.names = TRUE` to include these names:
```{r}
.x = 14
ls(all.names = TRUE)
ls(name = globalenv())
```
The `names()` function can also print out all names in an environment:
```{r}
names(globalenv())
```
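Use `exists()` to check whether a variable is in an environment. By
default it also searches the parent environments; set
`inherits = FALSE` to check only the environment itself: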
```{r}
exists("hi", globalenv())
```
Use `[[`, `$`, or `get()` to get a variable:
```{r}
genv = globalenv()
genv$cats
genv[["cats"]]
get("cats", globalenv())
```
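These accessors differ in one respect: `$` and `[[` look only in the environment itself, while `get()` (like `exists()`) also searches the parent environments unless `inherits = FALSE`. A sketch with hypothetical names, using `new.env()` to create an environment whose parent is the current one:

```{r}
outer_value = "defined outside child_env"
child_env = new.env()  # its parent is the environment where new.env() was called

# NULL: outer_value is not in child_env's own frame.
child_env$outer_value
# Found, because get() continues into the parent environment.
get("outer_value", envir = child_env)
# TRUE via the parent, FALSE when only child_env itself is checked.
exists("outer_value", envir = child_env)
exists("outer_value", envir = child_env, inherits = FALSE)
```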
Use `parent.env()` to get the parent environment:
```{r}
parent.env(globalenv())
```
Dynamic Lookup, Part 3: The Search Path
---------------------------------------
The global environment is not the top-level environment.
Besides your own code, R also uses environments to keep track of
packages.
When you load a package with `library()`, R creates a new environment:
* Its frame contains the package's exported variables.
* Its parent environment is the environment of the most recently
loaded package.
* The new environment becomes the parent of the global environment.
So R remembers the order in which packages are loaded.
This history of packages is called the "search path".
Use `search()` to see the names of environments in the search path:
```{r}
search()
```
The "base environment", or `package:base`, is a special environment
that R creates at startup.
The parent of the base environment is another special environment
called the "empty environment", which contains no variables and has
no parent.
```{r}
parent.env(baseenv())
parent.env(parent.env(globalenv())) # and so on..
```
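The lookup chain can be walked by hand. The helper `find_env()` below is hypothetical (it is not a base R function); it follows `parent.env()` from a starting environment until it finds the name or reaches the empty environment:

```{r}
find_env = function(name, env = globalenv()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env = parent.env(env)
  }
  NULL  # reached the empty environment without finding the name
}

# mean() lives in the base environment, unless something earlier on the
# search path defines its own mean.
find_env("mean")
find_env("no_such_variable_")
```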
Use `PACKAGE::NAME` to access a name in a specific package:
```{r}
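dogs = readRDS("data/dogs/dogs_full.rds")
dplyr::filter(dogs, weight < 20)
# Two common reasons to use `::`:
# 1. To disambiguate which function you're calling, for example
#    stats::filter() versus dplyr::filter().
# 2. To use just one function from a package without loading the
#    whole package, as with dplyr::filter() above.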
```
You do not have to load the package first to do this (but the package
does have to be installed).
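A sketch that runs without any extra packages installed, using `stats` and `tools`, both of which ship with R (`tools` is normally not attached by default):

```{r}
# stats is attached by default, so both forms refer to the same function
# (assuming nothing has masked median).
identical(stats::median, median)
stats::median(c(3, 1, 2))

# tools is installed with R but normally not on the search path.
tools::file_ext("notes.Rmd")
# Using :: does not attach the package, so this is normally FALSE.
"package:tools" %in% search()
```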
Closures
--------
A "closure" is a function that keeps track of its environment.
The idea of a closure is used in many languages, not just R.
In R, every function is a closure.
Functions keep track of the environment where they were defined.
Use `environment()` to get the environment where a function was
defined:
```{r}
f = function() 42
environment(f)
```
You can use the closure property to create functions that remember
previous calls.
For example, suppose we want to make a counter function that keeps
track of how many times it's been called:
```{r}
make_counter = function() {
  count = 0
  function() {
    count <<- count + 1
    count
  }
}
counter = make_counter()
environment(counter)$count
counter()
counter()
```
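Each call to the outer function creates a separate environment, so counters made this way do not share state. A self-contained sketch (it repeats the counter definition under the hypothetical name `make_tally` so it can run on its own):

```{r}
make_tally = function() {
  count = 0
  function() {
    count <<- count + 1
    count
  }
}

tally_a = make_tally()
tally_b = make_tally()

tally_a()
tally_a()
tally_b()  # starts from its own count, unaffected by tally_a

# FALSE: the two closures keep their state in different environments.
identical(environment(tally_a), environment(tally_b))
```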
Use functions with "memory" sparingly. They make it harder for others
to understand your code.