```{r include=FALSE, cache=FALSE}
set.seed(42)
options(digits = 3)
library(tidyverse)
library(knitr)
knitr::opts_chunk$set(
comment = "#>",
message = FALSE,
collapse = TRUE,
out.width = "85%",
fig.align = 'center',
fig.width = 6,
fig.asp = 0.618, # 1 / phi
fig.show = "hold"
)
options(dplyr.print_min = 6, dplyr.print_max = 6)
# hook_output <- knit_hooks$get("output")
# knit_hooks$set(output = function(x, options) {
# lines <- options$output.lines
# if (is.null(lines)) {
# return(hook_output(x, options)) # pass to default hook
# }
# x <- unlist(strsplit(x, "\n"))
# more <- "etc ..."
# if (length(lines)==1) { # first n lines
# if (length(x) > lines) { # truncate the output, but add ....
# x <- c(head(x, lines), more)
# }
# } else {
# x <- c(more, x[lines], more)
# }
# # paste these lines together
# x <- paste(c(x, ""), collapse = "\n")
# hook_output(x, options)
# })
```
# Useful Tricks {#UsefulTricks}
Introduction {-#intro-UsefulTricks}
------------
The recipes in this chapter are neither obscure numerical calculations
nor deep statistical techniques. Yet they are useful functions and
idioms that you will likely need at one time or another.
Peeking at Your Data {#recipe-id278}
--------------------
### Problem {-#problem-id278}
You have a lot of data—too much to display at once. Nonetheless, you
want to see some of the data.
### Solution {-#solution-id278}
Use `head` to view the first few data values or rows:
``` {r, eval=FALSE}
head(x)
```
Use `tail` to view the last few data values or rows:
``` {r, eval=FALSE}
tail(x)
```
Or you can view the whole thing in an interactive viewer in RStudio:
``` {r, eval=FALSE}
View(x)
```
### Discussion {-#discussion-id278}
Printing a large dataset is pointless because everything just rolls off
your screen. Use `head` to see a little bit of the data (six rows by default):
``` {r}
load(file = './data/lab_df.rdata')
head(lab_df)
```
Use `tail` to see the last few rows and the number of rows:
``` {r}
tail(lab_df)
```
Both `head` and `tail` allow you to pass a number to the function to set the number of rows returned:
``` {r}
tail(lab_df, 2)
```
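Both functions also accept a negative count, which means "everything except that many elements." The sketch below uses a made-up vector, `demo_vec`, which is not part of the chapter's data:

```{r}
demo_vec <- 1:10      # hypothetical data, for illustration only

head(demo_vec, 3)     # first three elements
tail(demo_vec, 3)     # last three elements
head(demo_vec, -3)    # everything except the last three
tail(demo_vec, -3)    # everything except the first three
```

The same negative counts work on the rows of a data frame.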
RStudio comes with an interactive viewer built in. You can call the viewer from the console or a script:
``` {r, eval=FALSE}
View(lab_df)
```
Or you can pipe an object to the viewer:
``` {r, eval=FALSE}
lab_df %>%
View()
```
When piping to `View` you will notice that the viewer names the View tab simply `.` (just a dot). To get a more informative name, you can put a descriptive name in quotes:
``` {r, eval=FALSE}
lab_df %>%
View("lab_df test from pipe")
```
The resulting RStudio viewer is shown in Figure \@ref(fig:rstudioview).
```{r rstudioview, echo=FALSE, fig.cap='RStudio viewer'}
knitr::include_graphics("images_v2/View.png")
```
### See Also {-#see_also-id278}
See Recipe \@ref(recipe-id202), ["Revealing the Structure of an Object"](#recipe-id202) for seeing the
structure of your variable’s contents.
Printing the Result of an Assignment {#recipe-id271}
------------------------------------
### Problem {-#problem-id271}
You are assigning a value to a variable and you want to see its value.
### Solution {-#solution-id271}
Simply put parentheses around the assignment:
``` {r}
x <- 1/pi # Prints nothing
(x <- 1/pi) # Prints assigned value
```
### Discussion {-#discussion-id271}
Normally, R inhibits printing when it sees you enter a simple
assignment. When you surround the assignment with parentheses, however,
it is no longer a simple assignment and so R prints the value. This can be very handy for quick debugging in a script.
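One way to see the mechanism, as a small sketch: base R's `withVisible` reports whether the value of an expression would be printed automatically. The variable name `demo_val` is only an illustration:

```{r}
# An assignment returns its value invisibly, so nothing is auto-printed
withVisible(demo_val <- 1 / pi)$visible

# Wrapping the assignment in parentheses makes the same value visible
withVisible((demo_val <- 1 / pi))$visible
```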
### See Also {-#see_also-id271}
See Recipe \@ref(recipe-id017), ["Printing Something to the Screen"](#recipe-id017), for more ways to print things.
Summing Rows and Columns {#recipe-id138}
------------------------
### Problem {-#problem-id138}
You want to sum the rows or columns of a matrix or data frame.
### Solution {-#solution-id138}
Use `rowSums` to sum the rows:
``` {r, eval=FALSE}
rowSums(m)
```
Use `colSums` to sum the columns:
``` {r, eval=FALSE}
colSums(m)
```
### Discussion {-#discussion-id138}
This is a mundane recipe, but it’s so common that it deserves
mentioning. We use this recipe, for example, when producing reports that
include column totals. In this example, `daily.prod` is a record of this
week’s factory production and we want totals by product and by day:
``` {r}
load(file = './data/daily.prod.rdata')
daily.prod
colSums(daily.prod)
rowSums(daily.prod)
```
These functions return a vector. In the case of column sums, we can
append the vector to the matrix and thereby neatly print the data and
totals together:
``` {r}
rbind(daily.prod, Totals=colSums(daily.prod))
```
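Row totals can be appended the same way with `cbind`. The following sketch uses a small, made-up matrix (`demo_prod`, not the chapter's `daily.prod` data) and also shows the `na.rm` argument, which both functions accept for skipping missing values:

```{r}
# Hypothetical production counts: rows are days, columns are products
demo_prod <- matrix(c(10, 20, 30, 40, 50, NA),
                    nrow = 2,
                    dimnames = list(c("Mon", "Tue"),
                                    c("Widgets", "Gadgets", "Gizmos")))

# Without na.rm, a single NA makes the whole row total NA
rowSums(demo_prod)

# Append a column of row totals, skipping the missing value
demo_totals <- cbind(demo_prod, Total = rowSums(demo_prod, na.rm = TRUE))
demo_totals
```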
Printing Data in Columns {#recipe-id112}
------------------------
### Problem {-#problem-id112}
You have several parallel data vectors, and you want to print them in
columns.
### Solution {-#solution-id112}
Use `cbind` to form the data into columns, then print the result.
### Discussion {-#discussion-id112}
When you have parallel vectors, it’s difficult to see their relationship
if you print them separately:
``` {r}
load(file = './data/xy.rdata')
print(x)
print(y)
```
Use the `cbind` function to form them into columns that, when printed,
show the data’s structure:
``` {r}
print(cbind(x,y))
```
You can include expressions in the output, too. Use a tag to give them a
column heading:
``` {r}
print(cbind(x, y, Total = x + y))
```
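One caveat worth a quick sketch: `cbind` on vectors builds a matrix, and a matrix holds a single data type, so mixing numbers with strings converts everything to character. When the columns have different types, `data.frame` keeps each column's type. The vectors `demo_id` and `demo_name` are made up for this illustration:

```{r}
demo_id   <- 1:3                 # hypothetical numeric column
demo_name <- c("a", "b", "c")    # hypothetical character column

# cbind yields a character matrix: the numbers become strings
is.character(cbind(demo_id, demo_name))

# data.frame prints in columns too, and demo_id stays numeric
demo_cols <- data.frame(demo_id, demo_name)
demo_cols
is.numeric(demo_cols$demo_id)
```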
Binning Your Data {#recipe-id137}
-----------------
### Problem {-#problem-id137}
You have a vector, and you want to split the data into groups according
to intervals. Statisticians call this *binning* your data.
### Solution {-#solution-id137}
Use the `cut` function. You must define a vector, say `breaks`, that
gives the ranges of the intervals. The `cut` function will group your
data according to those intervals. It returns a factor whose levels
(elements) identify each datum’s group:
```{r, eval=FALSE}
f <- cut(x, breaks)
```
### Discussion {-#discussion-id137}
This example generates 1,000 random numbers that have a standard normal
distribution. It breaks them into six groups by defining intervals at
0, ±1, ±2, and ±3 standard deviations:
```{r, echo=FALSE}
## for reproducibility
set.seed(42)
```
```{r}
x <- rnorm(1000)
breaks <- c(-3, -2, -1, 0, 1, 2, 3)
f <- cut(x, breaks)
```
The result is a factor, `f`, that identifies the groups. The `summary`
function shows the number of elements by level. R creates names for each
level, using the mathematical notation for an interval:
```{r}
summary(f)
```
The results are bell-shaped, which is what we expect from the `rnorm` function. The `NA`
count reports how many values in `x` fell outside the defined
intervals.
We can use the `labels` parameter to give nice, predefined names to the
six groups instead of the funky synthesized names:
```{r}
f <- cut(x, breaks, labels = c("Bottom", "Low", "Neg", "Pos", "High", "Top"))
```
Now the `summary` function uses our names:
``` {r}
summary(f)
```
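If the `NA` values are unwanted, one option is to make the outermost breaks open-ended with `-Inf` and `Inf`, so every value lands in some bin. This is a sketch on a tiny, made-up vector (`demo_obs`), not the random data above:

```{r}
# Hypothetical values, two of which lie outside the range -3 to 3
demo_obs <- c(-3.5, -2.5, -0.5, 0.5, 2.5, 3.5)

# Finite outer breaks: the two out-of-range values become NA
narrow <- cut(demo_obs, breaks = c(-3, -2, -1, 0, 1, 2, 3))
sum(is.na(narrow))

# Open-ended outer breaks: every value is assigned to a bin
wide <- cut(demo_obs, breaks = c(-Inf, -2, -1, 0, 1, 2, Inf))
sum(is.na(wide))
```

The number of groups is unchanged; only the two outer intervals widen.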
Binning is useful for summaries such as histograms. But it results in
information loss, which can be harmful in modeling. Consider the extreme
case of binning a continuous variable into two values, `high` and `low`.
The binned data has only two possible values, so you have replaced a
rich source of information with *one bit* of information. Where the
continuous variable might be a powerful predictor, the binned variable
can distinguish at most two states and so will likely have only a
fraction of the original power. Before you bin, we suggest exploring
other transformations that are less lossy.
Finding the Position of a Particular Value {#recipe-id116}
------------------------------------------
### Problem {-#problem-id116}
You have a vector. You know a particular value occurs in the contents,
and you want to know its position.
### Solution {-#solution-id116}
The `match` function will search a vector for a particular value and
return the position:
``` {r}
vec <- c(100, 90, 80, 70, 60, 50, 40, 30, 20, 10)
match(80, vec)
```
Here `match` returns `3`, which is the position of `80` within `vec`.
### Discussion {-#discussion-id116}
There are special functions for finding the location of the minimum and
maximum values—`which.min` and `which.max`, respectively:
``` {r}
vec <- c(100,90,80,70,60,50,40,30,20,10)
which.min(vec) # Position of smallest element
which.max(vec) # Position of largest element
```
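Two details of `match` are easy to overlook: it reports only the first occurrence of a value, and it returns `NA` when the value is absent. When every position is wanted, `which` with a comparison is a common alternative. A short sketch with a made-up vector, `demo_vals`:

```{r}
demo_vals <- c(10, 20, 30, 20, 10)   # hypothetical data with repeats

match(20, demo_vals)        # position of the first 20 only
which(demo_vals == 20)      # positions of every 20
match(99, demo_vals)        # NA, because 99 does not occur
```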
### See Also {-#see_also-id116}
This technique is used in Recipe \@ref(recipe-id210), ["Finding the Best Power Transformation"](#recipe-id210).
Selecting Every nth Element of a Vector {#recipe-id103}
---------------------------------------
### Problem {-#problem-id103}
You want to select every *n*th element of a vector.
### Solution {-#solution-id103}
Create a logical indexing vector that is `TRUE` for every *n*th element.
One approach is to find all subscripts that equal zero when taken modulo
*n*:
``` {r, eval=FALSE}
v[seq_along(v) %% n == 0]
```
### Discussion {-#discussion-id103}
This problem arises in systematic sampling: we want to sample a dataset
by selecting every *n*th element. The `seq_along(v)` function generates
the sequence of integers that can index `v`; for a nonempty vector it is equivalent to
`1:length(v)`. We compute each index value modulo *n* by the expression:
``` {r}
v <- rnorm(10)
n <- 2
seq_along(v) %% n
```
Then we find those values that equal zero:
``` {r}
seq_along(v) %% n == 0
```
The result is a logical vector, the same length as `v` and with `TRUE`
at every *n*th element, that can index `v` to select the desired
elements:
``` {r}
v
v[ seq_along(v) %% n == 0 ]
```
If you just want something simple like every second element, you can use
the Recycling Rule in a clever way. Index `v` with a two-element logical
vector, like this:
``` {r}
v[c(FALSE, TRUE)]
```
If `v` has more than two elements, then the indexing vector is too short.
Hence, R will invoke the Recycling Rule and expand the index vector to
the length of `v`, recycling its contents. That gives an index vector
that is `FALSE`, `TRUE`, `FALSE`, `TRUE`, `FALSE`, `TRUE`, and so forth.
Voilà! The final result is every second element of `v`.
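Another way to get the same elements, sketched here as an alternative rather than the recipe's method, is to generate the wanted subscripts directly with `seq`. The names `demo_v` and `demo_n` are made up, and the `seq` form assumes the vector has at least `demo_n` elements:

```{r}
demo_v <- letters[1:10]   # hypothetical data
demo_n <- 3

# Logical indexing, as in the Solution
by_modulo <- demo_v[seq_along(demo_v) %% demo_n == 0]

# Integer indexing: subscripts 3, 6, 9
by_seq <- demo_v[seq(demo_n, length(demo_v), by = demo_n)]

by_modulo
identical(by_modulo, by_seq)
```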
### See Also {-#see_also-id103}
See Recipe \@ref(recipe-id050), ["Understanding the Recycling Rule"](#recipe-id050), for more about the Recycling Rule.
Finding Minimums or Maximums {#recipe-id107}
-------------------------------------
### Problem {-#problem-id107}
You have two vectors, *v* and *w*, and you want to find the minimums or
the maximums of pairwise elements. That is, you want to calculate:
> min(*v*~1~, *w*~1~), min(*v*~2~, *w*~2~), min(*v*~3~, *w*~3~), ...
or:
> max(*v*~1~, *w*~1~), max(*v*~2~, *w*~2~), max(*v*~3~, *w*~3~), ...
### Solution {-#solution-id107}
R calls these the *parallel minimum* and the *parallel maximum*. The
calculation is performed by `pmin(v,w)` and `pmax(v,w)`, respectively:
``` {r}
pmin(1:5, 5:1) # Find the element-by-element minimum
pmax(1:5, 5:1) # Find the element-by-element maximum
```
### Discussion {-#discussion-id107}
When an R beginner wants pairwise minimums or maximums, a common mistake
is to write `min(v,w)` or `max(v,w)`. Those are not pairwise operations:
`min(v,w)` returns a single value, the minimum over all `v` and `w`.
Likewise, `max(v,w)` returns a single value from all of `v` and `w`.
The `pmin` and `pmax` functions compare their arguments in parallel,
picking the minimum or maximum for each subscript. They return a vector
that matches the length of the inputs.
You can combine `pmin` and `pmax` with the Recycling Rule to perform
useful hacks. Suppose the vector `v` contains both positive and negative
values, and you want to reset the negative values to zero. This does the
trick:
``` {r}
v <- c(-3:3)
v
v <- pmax(v, 0)
v
```
By the Recycling Rule, R expands the zero-valued scalar into a vector of
zeros that is the same length as `v`. Then `pmax` does an
element-by-element comparison, taking the larger of zero and each
element of `v`.
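The same idea extends to limiting values on both sides. As a sketch, nesting the two functions clamps each element into a range; the vector `demo_range` and the limits of -1 and 1 are arbitrary choices for illustration:

```{r}
demo_range <- -3:3    # hypothetical data

# pmax raises anything below -1 up to -1; pmin then lowers anything above 1
clamped <- pmin(pmax(demo_range, -1), 1)
clamped
```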
Actually, `pmin` and `pmax` are more powerful than the Solution
indicates. They can take more than two vectors, comparing all vectors in
parallel.
It is not uncommon to use `pmin` or `pmax` to calculate a new variable in a data frame based on multiple fields. Let's look at a quick example:
```{r}
df <- data.frame(a = c(1,5,8),
b = c(2,3,7),
c = c(0,4,9))
df %>%
mutate(max_val = pmax(a,b,c))
```
We can see the new column, `max_val`, now contains the row-by-row max value from the three input columns.
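Missing values deserve a quick note. By default, an `NA` in any input yields `NA` at that position; both functions accept `na.rm = TRUE` to ignore missing values instead. A small sketch with made-up vectors:

```{r}
demo_a <- c(1, NA, 8)   # hypothetical data with a missing value
demo_b <- c(2, 3, 7)

pmax(demo_a, demo_b)                 # the NA propagates to the result
pmax(demo_a, demo_b, na.rm = TRUE)   # the NA is ignored at that position
```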
### See Also {-#see_also-id107}
See Recipe \@ref(recipe-id050), ["Understanding the Recycling Rule"](#recipe-id050), for more about the Recycling Rule.
Generating All Combinations of Several Variables {#recipe-id110}
----------------------------------------------
### Problem {-#problem-id110}
You have two or more variables. You want to generate all combinations of
their levels, also known as their *Cartesian product*.
### Solution {-#solution-id110}
Use the `expand.grid` function. Here, `f` and `g` are vectors:
``` {r, eval=FALSE}
expand.grid(f, g)
```
### Discussion {-#discussion-id110}
This code snippet creates two vectors—`sides` represents the two sides
of a coin, and `faces` represents the six faces of a die (those little
spots on a die are called *pips*):
``` {r}
sides <- c("Heads", "Tails")
faces <- c("1 pip", paste(2:6, "pips"))
```
We can use `expand.grid` to find all combinations of one roll of the die
and one coin toss:
``` {r}
expand.grid(faces, sides)
```
Similarly, we could find all combinations of two dice. But we won't print the output here because it's 36 lines long:
``` {r, eval=FALSE}
expand.grid(faces, faces)
```
The result of `expand.grid` is a data frame. R automatically provides the row names and
column names.
The Solution and the example show the Cartesian product of two vectors,
but `expand.grid` can handle three or more factors, too.
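Here is a sketch with three inputs. Naming the arguments sets the column names, and `stringsAsFactors = FALSE` keeps strings as character columns rather than factors. The variables `coin`, `die`, and `card` are made up for this illustration:

```{r}
demo_grid <- expand.grid(coin = c("Heads", "Tails"),
                         die = 1:3,
                         card = c("Red", "Black"),
                         stringsAsFactors = FALSE)

nrow(demo_grid)      # 2 * 3 * 2 combinations
head(demo_grid, 4)   # the first argument varies fastest
```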
### See Also {-#see_also-id110}
If you’re working with strings and want a bit more control over how you bring the combinations together, then you can also use Recipe \@ref(recipe-id109),
["Generating All Pairwise Combinations of Strings"](#recipe-id109), to generate combinations.
Flattening a Data Frame {#recipe-id153}
--------------------
### Problem {-#problem-id153}
You have a data frame of numeric values. You want to process all its
elements together, not as separate columns—for example, to find the mean
across all values.
### Solution {-#solution-id153}
Convert the data frame to a matrix and then process the matrix. This
example finds the mean of all elements in the data frame `dfrm`:
``` {r, eval=FALSE}
mean(as.matrix(dfrm))
```
It is sometimes necessary then to convert the matrix to a vector. In
that case, use `as.vector(as.matrix(dfrm))`.
### Discussion {-#discussion-id153}
Suppose we have a data frame, such as the factory production data from Recipe \@ref(recipe-id138), ["Summing Rows and Columns"](#recipe-id138):
``` {r}
load(file = './data/daily.prod.rdata')
daily.prod
```
Suppose also that we want the average daily production across all days
and products. This won’t work:
``` {r, error=TRUE}
mean(daily.prod)
```
The `mean` function doesn't really know what to do with a data frame, so it issues a warning and returns `NA`. When you want the average across all
values, first collapse the data frame down to a matrix:
``` {r}
mean(as.matrix(daily.prod))
```
This recipe works only on data frames with all-numeric data. Recall that
converting a data frame with mixed data (numeric columns mixed with
character columns or factors) into a matrix forces all columns to be
converted to characters.
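When a data frame does contain non-numeric columns, one workaround is to keep only the numeric columns before flattening. The sketch below uses two tiny, made-up data frames (`demo_num` and `demo_mixed`):

```{r}
demo_num   <- data.frame(a = c(1, 2), b = c(3, 4))          # all numeric
demo_mixed <- data.frame(a = c(1, 2), label = c("x", "y"))  # mixed types

mean(as.matrix(demo_num))             # works: mean of 1, 2, 3, 4

is.character(as.matrix(demo_mixed))   # TRUE: everything became text

# Keep only the numeric columns, then flatten
numeric_cols <- sapply(demo_mixed, is.numeric)
mean(as.matrix(demo_mixed[numeric_cols]))
```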
### See Also {-#see_also-id153}
See Recipe \@ref(recipe-id074), ["Converting One Structured Data Type into Another"](#recipe-id074), for more about converting between data types.
Sorting a Data Frame {#recipe-id247}
--------------------
### Problem {-#problem-id247}
You have a data frame. You want to sort the contents, using one column
as the sort key.
### Solution {-#solution-id247}
Use the `arrange` function from the `dplyr` package:
``` {r, eval=FALSE}
df <- arrange(df, key)
```
Here `df` is a data frame and `key` is the sort-key column.
### Discussion {-#discussion-id247}
The `sort` function is great for vectors but is ineffective for data
frames. Suppose we have the following data frame and we want to sort by month:
``` {r}
load(file = './data/outcome.rdata')
print(df)
```
The `arrange` function rearranges the months into ascending
order and returns the entire data frame:
``` {r}
library(dplyr)
arrange(df, month)
```
After rearranging the data frame, the month column is in ascending
order, just as we wanted. To sort in descending order, put a `-` in front of the column you want to sort by; this works for numeric columns, and dplyr's `desc(month)` is an equivalent that also handles other column types:
``` {r}
arrange(df,-month)
```
If you want to sort by multiple columns, you can add them to the `arrange` function. The following example sorts by month first, then by day:
``` {r}
arrange(df, month, day)
```
Within months 7 and 8, the days are now sorted into ascending order.
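If dplyr is not available, base R can do the same job by indexing the rows with `order`. This is a sketch of that alternative on a made-up data frame, `demo_df`, rather than the chapter's data:

```{r}
demo_df <- data.frame(month = c(8, 7, 8, 7),
                      day   = c(2, 30, 1, 11))   # hypothetical data

# order() returns the row positions in sorted order; ties in month
# are broken by day
sorted_rows <- order(demo_df$month, demo_df$day)
sorted_rows
demo_df[sorted_rows, ]

# Descending by month: negate the numeric key
demo_df[order(-demo_df$month), ]
```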
Stripping Attributes from a Variable {#recipe-id223}
------------------------------------
### Problem {-#problem-id223}
A variable is carrying around old attributes. You want to remove some or
all of them.
### Solution {-#solution-id223}
To remove all attributes, assign `NULL` to the variable’s `attributes`
property:
``` {r, eval=FALSE}
attributes(x) <- NULL
```
To remove a single attribute, select the attribute using the `attr`
function, and set it to `NULL`:
``` {r, eval=FALSE}
attr(x, "attributeName") <- NULL
```
### Discussion {-#discussion-id223}
Any variable in R can have attributes. An attribute is simply a
name/value pair, and the variable can have many of them. A common
example is the dimensions of a matrix variable, which are stored in an
attribute. The attribute name is `dim` and the attribute value is a
two-element vector giving the number of rows and columns.
You can view the attributes of `x` by printing `attributes(x)` or
`str(x)`.
Sometimes you want just a number and R insists on giving it attributes.
This can happen when you fit a simple linear model and extract the
slope, which is the second regression coefficient:
``` {r}
load(file = './data/conf.rdata')
m <- lm(y ~ x1)
slope <- coef(m)[2]
slope
```
When we print `slope`, R also prints `"x1"`. That is a name attribute
given by `lm` to the coefficient (because it’s the coefficient for the
`x1` variable). We can see that more clearly by printing the internals
of `slope`, which reveals a `"names"` attribute:
``` {r}
str(slope)
```
It's easy to strip out all the attributes, after which the slope value
becomes simply a number:
``` {r}
attributes(slope) <- NULL # Strip off all attributes
str(slope) # Now the "names" attribute is gone
slope # And the number prints cleanly without a label
```
Alternatively, we could have stripped out the single offending attribute
this way:
``` {r}
attr(slope, "names") <- NULL
```
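When the names are the only attribute in the way, base R's `unname` returns a copy without them and leaves the original untouched. The sketch below also shows what happens when all attributes are stripped from a matrix. Both `demo_coef` and `demo_mat` are made-up values:

```{r}
demo_coef <- c(x1 = 2.5)    # hypothetical named coefficient
unname(demo_coef)           # a copy with the name removed
names(demo_coef)            # the original still has its name

demo_mat <- matrix(1:6, nrow = 2)
attributes(demo_mat) <- NULL   # removes "dim" along with everything else
is.matrix(demo_mat)            # no longer a matrix
demo_mat                       # just the underlying vector
```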
> **Warning**
>
> Remember that a matrix is a vector (or list) with a `dim` attribute.
> If you strip out all the attributes from a matrix, that will strip away
> the dimensions and thereby turn it into a mere vector (or list).
> Furthermore, stripping the attributes from an object (specifically, an
> S3 object) can render it useless. So remove attributes with care.
### See Also {-#see_also-id223}
See Recipe \@ref(recipe-id202), ["Revealing the Structure of an Object"](#recipe-id202), for more about
seeing attributes.
Revealing the Structure of an Object {#recipe-id202}
------------------------------------
### Problem {-#problem-id202}
You called a function that returned something. Now you want to look
inside that something and learn more about it.
### Solution {-#solution-id202}
Use `class` to determine the thing’s object class:
``` {r, eval=FALSE}
class(x)
```
Use `mode` to strip away the object-oriented features and reveal the
underlying structure:
``` {r, eval=FALSE}
mode(x)
```
Use `str` to show the internal structure and contents:
``` {r, eval=FALSE}
str(x)
```
### Discussion {-#discussion-id202}
We are regularly amazed at how often we call a function, get something back, and wonder:
“What the heck is this thing?” Theoretically, the function documentation
should explain the returned value, but somehow we feel better when we can
see its structure and contents ourselves. This is especially true for
objects with a nested structure: objects within objects.
Let’s dissect the value returned by `lm` (the linear modeling function)
in the simplest linear regression recipe, Recipe \@ref(recipe-id203), ["Performing Simple Linear Regression"](#recipe-id203):
``` {r}
load(file = './data/conf.rdata')
m <- lm(y ~ x1)
print(m)
```
Always start by checking the thing’s class. The class indicates if it’s
a vector, matrix, list, data frame, or object:
``` {r}
class(m)
```
Hmmm. It seems that `m` is an object of class `lm`. That may not mean
anything to you yet. But we know that all object classes are built upon
the native data structures (vector, matrix, list, or data frame), so we
use `mode` to strip away the object facade and reveal the underlying
structure:
``` {r}
mode(m)
```
Ah-ha! It seems that `m` is built on a list structure. Now we can use
list functions and operators to dig into its contents. First, we want to
know the names of its list elements:
``` {r}
names(m)
```
The first list element is called `"coefficients"`. We could guess those are
the regression coefficients. Let’s have a look:
``` {r}
m$coefficients
```
Yes, that’s what they are. We recognize those values.
We could continue digging into the list structure of `m`, but that would
get tedious. The `str` function does a good job of revealing the
internal structure of any variable:
``` {r}
str(m)
```
Notice that `str` shows all the elements of `m` and then recursively
dumps each element’s contents and attributes. Long vectors and lists are
truncated to keep the output manageable.
There is an art to exploring an R object. Use `class`, `mode`, and `str`
to dig through the layers. We have found that often `str` tells you everything you want to know...and sometimes a lot more.
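When `str` prints more than you want, its `max.level` argument limits how deep it recurses, and `give.attr = FALSE` hides attributes. Here is a self-contained sketch that fits a model to the built-in `cars` dataset (a stand-in for the chapter's data) and shows only the top level:

```{r}
demo_fit <- lm(dist ~ speed, data = cars)   # cars ships with R

class(demo_fit)   # the object class
mode(demo_fit)    # the underlying structure

# Top-level elements only, without attributes
str(demo_fit, max.level = 1, give.attr = FALSE)
```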
Timing Your Code {#recipe-id224}
----------------
### Problem {-#problem-id224}
You want to know how much time is required to run your code. This is
useful, for example, when you are optimizing your code and need “before”
and “after” numbers to measure the improvement.
### Solution {-#solution-id224}
The `tictoc` package contains a very easy way to time and label chunks of code. The `tic` function starts a timer and the `toc` function stops the timer and reports the
execution time:
``` {r, eval=FALSE}
library(tictoc)
tic('Optional helpful name here')
aLongRunningExpression()
toc()
```
The output is the execution time in seconds.
### Discussion {-#discussion-id224}
Suppose we want to know the time required to generate 10,000,000 random
normal numbers and sum them together:
``` {r big_rnorm}
library(tictoc)
tic('making big numbers')
total_val <- sum(rnorm(1e7))
toc()
```
The `toc` function prints the message set in `tic` along with the runtime in seconds.
If you assign the result of `toc` to an object, you can have access to the underlying start time, finish time, and message:
```{r multi_rnorm}
tic('two sums')
sum(rnorm(10000000))
sum(rnorm(10000000))
toc_result <- toc()
print(toc_result)
```
If you want to report the results in minutes (or hours!), you can use the elements of the output to get at the underlying start and finish times:
```{r}
print(paste('the code ran in',
round((toc_result$toc - toc_result$tic) / 60, 4),
'minutes'))
```
You can accomplish the same thing using just `Sys.time` calls, but without the convenience of labeling and the clarity of syntax provided by `tictoc`:
```{r}
start <- Sys.time()
sum(rnorm(10000000))
sum(rnorm(10000000))
Sys.time() - start
```
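Base R also provides `system.time`, which evaluates a single expression and reports the user, system, and elapsed times in seconds. The following sketch times a simple, deterministic sum; the names `demo_timing` and `demo_total` are made up for this example:

```{r}
# Time one expression; the assignment inside still takes effect
demo_timing <- system.time(
  demo_total <- sum(as.numeric(1:1e6))
)

demo_timing[["elapsed"]]   # wall-clock seconds
demo_total                 # the computed result is available afterward
```

Because it wraps one expression, `system.time` suits quick one-off measurements, whereas labeled `tic` and `toc` pairs can bracket several statements.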
Suppressing Warnings and Error Messages {#recipe-id113}
---------------------------------------
### Problem {-#problem-id113}
A function is producing annoying error messages or warning messages. You
don’t want to see them.
### Solution {-#solution-id113}
Surround the function call with `suppressMessages(`...`)` or
`suppressWarnings(`...`)`:
``` {r, eval=FALSE}
suppressMessages(annoyingFunction())
suppressWarnings(annoyingFunction())
```
### Discussion {-#discussion-id113}
The Augmented Dickey–Fuller Test, `adf.test`, is a popular time series function. However, it produces an
annoying warning message, shown here at the bottom of the output, when
the *p*-value is below 0.01:
``` {r}
library(tseries)
load(file = './data/adf.rdata')
results <- adf.test(x)
```
Fortunately, we can muzzle the function by calling it inside
`suppressWarnings(`...`)`:
``` {r}
results <- suppressWarnings(adf.test(x))
```
Notice that the warning message disappeared. Be aware that `suppressWarnings`
discards the warning rather than saving it, so the `warnings` function reports
only warnings that reached the top level, such as those from an earlier, unsuppressed call:
``` {r}
warnings()
```
Some functions also produce “messages” (in R terminology), which are
even more benign than warnings. Typically, they are merely informative
and not signals of problems. If such a message is annoying you, call the
function inside `suppressMessages(...)`, and the message will
disappear.
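The two wrappers can be nested when a function emits both kinds of output. In the sketch below, `noisy_sqrt` is a made-up function, not part of any package, that produces a message and (for negative input) a warning:

```{r}
# Hypothetical function that is noisy in two ways
noisy_sqrt <- function(x) {
  message("Computing square roots...")   # an informational message
  sqrt(x)                                # warns when x contains negatives
}

# Both the message and the warning are silenced; the value still comes back
quiet_result <- suppressMessages(suppressWarnings(noisy_sqrt(c(4, -1))))
quiet_result
```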
### See Also {-#see_also-id113}
See the `options` function for other ways to control the reporting of
errors and warnings.
Taking Function Arguments from a List {#recipe-id118}
-------------------------------------
### Problem {-#problem-id118}
Your data is captured in a list structure. You want to pass the data to
a function, but the function does not accept a list.
### Solution {-#solution-id118}
In simple cases, convert the list to a vector. For more complex cases,
the `do.call` function can break the list into individual arguments and
call your function:
``` {r, eval=FALSE}
do.call(fun, arg_list)
```
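As a minimal sketch of what `do.call` does, each element of the list becomes one argument of the call, and named elements become named arguments. The list `demo_args` is made up for this illustration:

```{r}
demo_args <- list(1, 3, 5, 7, 9)   # hypothetical data

# Equivalent to sum(1, 3, 5, 7, 9)
do.call(sum, demo_args)

# Equivalent to paste("a", "b", sep = "-")
do.call(paste, list("a", "b", sep = "-"))
```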
### Discussion {-#discussion-id118}
If your data is in a vector, life is simple and most R
functions work as expected:
``` {r}
vec <- c(1, 3, 5, 7, 9)
mean(vec)
```
If your data is captured in a list, some functions complain and
return a useless result, like this:
``` {r, error=TRUE}
numbers <- list(1, 3, 5, 7, 9)
mean(numbers)
```
The `numbers` list is a simple, one-level list, so we can just convert
it to a vector and call the function:
``` {r}
mean(unlist(numbers))
```
The big headaches come when you have multilevel list structures: lists
within lists. These can occur within complex data structures. Here is a
list of lists in which each sublist is a column of data:
``` {r}
my_lists <-
list(col1 = list(7, 8),
col2 = list(70, 80),
col3 = list(700, 800))
my_lists
```
Suppose we want to form this data into a matrix. The `cbind` function is
supposed to create data columns, but it gets confused by the list
structure and returns something useless:
``` {r}
cbind(my_lists)
```
If we `unlist` the data then we just get one big, long column, which is not what we are after either:
``` {r}
cbind(unlist(my_lists))
```