-
Notifications
You must be signed in to change notification settings - Fork 7
/
session1_notes.qmd
325 lines (206 loc) · 15.7 KB
/
session1_notes.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
# Introduction to R and RStudio
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
```
## The RStudio interface
There are a number of software packages based on the R programming language aimed at making writing and running analyses easier for users. They all run R in the background but look different and contain different features.
**RStudio** has been chosen for this course as it allows users to create script files, allowing code to be re-run, edited, and shared easily. RStudio also provides tools to help easily identify errors in R code, integrates help documentation into the main console and uses colour-coding to help read code at a glance.
Before installing RStudio, we must ensure that R is downloaded onto the machine. R is available to download for free for Windows, Mac, or Linux via the [**CRAN**](https://cran.r-project.org/) website.
Rstudio is also free to download from the [**Posit**](https://posit.co/) website.
#### The RStudio console window
The screenshot below shows the RStudio interface which comprises of four windows:
![RStudio interface](images/rstudio_ide.png)
**Window A: R script files**
All analysis and actions in R are carried out using the R syntax language. R script files allow you to write and edit code before running it in the console window.
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Limit script files to 80 characters per line to ensure it is readable.
RStudio has an option to add a margin that makes this easier to adhere to. Under the *Tools* drop-down menu, select *Global options*. Select *Code* from the list on the right, then under the *Display* tab, tick the *Show margin* box.
::::
:::
If this window is not visible, create a new script file using *File -> New File -> R Script* from the drop-down menus or clicking the ![New file icon](images/new_file_shortcut.png) icon above the console and selecting *R Script*. This will open a new, blank script file. More than one script file can be open at the same time.
Code entered into the script file does not run automatically. To run commands from the script, highlight the code and click the ![Run icon](images/run_shortcut.png) icon above the top right corner of the script window (this can be carried out by pressing `Ctrl + Enter` in Windows or `Command + Enter` on a Mac computer). More than one command can be run at the same time by highlighting all of them.
The main advantage of using the script file rather than entering the code directly into the console is that it can be saved, edited and shared. To save a script file, use *File -> Save As…* from the drop down menu, or click the ![save icon](images/save_shortcut.png) icon at the top of the window. It is important to save the script files at regular intervals to avoid losing work. Once the script file has been saved, we can also use the keyboard shortcuts `Ctrl + s` on Windows and `Command + s` on Mac to save the latest script file.
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Script file names should be meaningful, lower case, and end in `.R`. Avoid using special characters in file names, including spaces. Use `_` instead of spaces.
Where files should be run in a specific order, prefix the file name with numbers.
::::
:::
Past script files can be opened using *File -> Open File…* from the drop-down menu or by clicking the ![open icon](images/open_shortcut.png) icon and selecting a `.R` file. The keyboard shortcut to open an existing script file is `Ctrl + o` on Windows, and `Command + o` on Macs.
**Window B: The R console**
The R console window is where all commands run from the script file, results (other than plots), and messages, such as errors, are displayed. Commands can be written directly into the R console after the `>` symbol and executed using `Enter` on the keyboard. It is not recommended to write code directly into the console as it is cannot be saved or replicated.
Every time a new R session is opened, details about version and citations of R will be given by default. To clear text from the console window, use the keyboard shortcut `control + l` (this is the same for both Windows and Mac users). Be aware that this clears all text from the console, including any results. Before running this command, check that any results can be replicated within the script file.
**Window C: Environment and history**
This window lists all data and objects currently loaded into R. More details on the types of objects and how to use the Environment window are given in later sections.
**Window D: Files, plots, packages and help**
This window has many potential uses: graphics are displayed and can be saved from here, and R help files will appear here. This window is only available in the RStudio interface and not in the basic R package.
### Interface appearance
To change the appearance of the interface window, including font size and background colour, select `Global options` under the `Tools` drop-down menu and select the Appearance tab.
![](images/appearance.png){fig-align="center" width=500 height=500}
The font size is 14 by default but can be changed using the associated drop-down menu.
There are also a list of editor themes that can be applied, including themes with dark backgrounds and light writing such as `cobalt` or `dracula`.
When you are happy with the preview on the right side of the window, select OK and RStudio will apply the changes.
### Exercise 1 {.unnumbered}
1. Open a new script file if you have not already done so.
2. Save this script file into an appropriate location.
## R syntax
All analyses within R are carried out using **syntax**, the R programming language. It is important to note that R is case-sensitive, so always ensure that you use the correct combination of upper and lower case letters when running functions or calling objects.
Any text written in the R console or script file can be treated the same as text from other documents or programmes: text can be highlighted, copied and pasted to make coding more efficient.
When creating script files, it is important to ensure they are clear and easy to read. Comments can be added to script files using the `#` symbol. R will ignore any text following the `#` on the same line.
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Combining `#` and `-` creates sections within a script file, making them easier to navigate and organise.
For example:
```{r section style}
# Load data ----------
# Tidy data ----------
```
::::
:::
:::{.callout-note}
### Helpful hint
To comment out chunks of code, highlight the rows and use the keyboard shortcut *ctrl + shift + c* on Windows, and *Command + shift + c* on Mac
:::
The choice of brackets in R coding is particularly important as they all have different functions:
- Round brackets `( )` are the most commonly used as they define arguments of functions. Any text followed by round brackets is assumed to be a function and R will attempt to run it. If the name of a function is not followed by round brackets, R will return the algorithm used to create the function within the console.
- Square brackets `[ ]` are used to set criteria or conditions within a function or object.
- Curly brackets `{ }` are used within loops, when creating a new function, and within `for` and `if` functions.
All standard notation for mathematical calculations (`+`, `-`, `*`, `/`, `^`, etc.) are compatible with R. At its simplest level, R is just a very powerful calculator!
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Although R will work whether a space is added before/after a mathematical operator, the [style guide](https://style.tidyverse.org/syntax.html) recommends to add them surrounding most mathematical operations (`+`, `-`, `*`, `/`), but not around `^`.
For example:
```{r space style, eval = FALSE}
# Stylish code
1959 - 683
(351 + 457)^2 - (213 + 169)^2
# Un-stylish code
1959-683
(351+457)^2 - (213 + 169) ^ 2
```
::::
:::
### Exercise 2 {.unnumbered}
1. Add your name and the date to the top of your script file (hint: comment this out so R does not try to run it)
2. Use R to calculate the following calculations. Add the result to the same line of the script file in a way that ensures there are no errors in the code.
a. $64^2$
b. $3432 \div 8$
c. $96 \times 72$
When you have finished this exercise, select the entire script file (using `ctrl + a` on windows or `Command + a` on Mac) and run it to ensure there are no errors in the code.
## R objects and functions
### Objects
One of the main advantages to using R over other software packages such as SPSS is that more than one dataset can be accessed at the same time. A collection of data stored in any format within the R session is known as an **object**. Objects can include single numbers, single variables, entire datasets, lists of datasets, or even tables and graphs.
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Object names should only contain lower case letters, numbers and `_` (instead of a space to separate words). The should be meaningful and concise.
::::
:::
Objects are defined in R using the `<-` or `=` symbols. For example,
```{r assign qone}
object_1 <- 81
```
Creates an object in the environment named `object_1`, which takes the value `81`. This will appear in the environment window of the console (window C from the interface shown earlier).
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Although both work, use `<-` for assignment, not `=`.
::::
:::
To retrieve an object, type its name into the script or console and run it. This object can then be included in functions or operations in place of the value assigned to it:
```{r qone, collapse = TRUE}
object_1
object_1 * 2
```
R has some mathematical objects stored by default such as pi that can be used in calculations.
```{r pi}
pi
```
The `[1]` that appears at the beginning of each output line indicates that this is the first element in the object. If there were two lines then the second line would start with the number of that element in square brackets.
For example, if we had an object with 6 elements and when called the first line contained the first 5 elements, each line would begin with `[1]` and `[6]` respectively.
### Functions
**Functions** are built-in commands that allow R users to run analyses. All functions require the definition of arguments within round brackets `()`. Each function requires different information and has different arguments that can be used to customise the analysis. A detailed list of these arguments and a description of the function can be found in the function's associated **help file**.
### Help files
Each function that exists within R has an associated help file. RStudio does not require an internet connection to access these help files if the function is available in the current session of R.
To retrieve help files, enter `?` followed by the function name into the console window, e.g `?mean`. The help file will appear in window D of the interface shown in the introduction.
Help files contain the following information:
- Description: what the function is used for
- Usage: how the function is used
- Arguments: required and optional arguments entered into round brackets necessary for the function to work
- Details: relevant details about the function in question
- References
- See also: links to other relevant functions
- Examples: example code with applications of the function
### Error and warning messages
Where a function or object has not been correctly specified, or their is some mistake in the syntax that has been sent to the console, R will return an error message. These messages are generally informative and include the location of the error.
The most common errors include misspelling functions or objects:
```{r spelling error, error = TRUE}
sqrt(ojbect_1)
Sqrt(object_1)
```
Or where an object has not yet been specified:
```{r x y error, error = TRUE}
plot(x, y)
```
When R returns an error message, this means that the operation has been completely halted. R may also return warning messages which look similar to errors but does not necessarily mean the operation has been stopped.
Warnings are included to indicate that R suspects something in the operation may be wrong and should be checked. There are occasions where warnings can be ignored but this is only after the operation has been checked.
When working within the R console, if an incomplete command is run, a `+` symbol will appear in the console, rather than the usual `>`. This indicates that R expects you to keep writing the previous code. To overcome this issue, either finish the command on the next line of the console, or press the `esc` button on your keyboard to start from scratch.
One of the benefits of using RStudio rather than the basic R package is that it will suggest object or function names after typing the first few letters. This avoids spelling mistakes and accidental errors when running code. To accept the suggestion, either click the correct choice with your mouse or use the `tab` button on your keyboard.
### Cleaning the environment
To remove objects from the RStudio environment, we can use the `rm` function. This can be combined with the `ls()` function, which lists all objects in the environment, to remove all objects currently loaded:
```{r remove everything, eval = FALSE}
rm(list = ls())
```
:::{.callout-warning}
There are no undo and redo buttons for R syntax. The `rm` function will permanently delete objects from the environment. The only way to reverse this is to re-run the code that created the objects originally from the script file.
:::
### R packages
R packages are a collection of functions and datasets developed by R users that expand existing R capabilities or add completely new ones. Packages allow users to apply the most up-to-date methods shortly after they are developed, unlike other statistical software packages that require an entirely new version.
#### Installing packages from CRAN
The quickest way to install a package in R is by using the `install.packages` function. This sends RStudio to the online repository of tested and verified R packages (known as [CRAN](https://cran.r-project.org/)) and downloads the package files onto the machine you are currently working from in temporary files. Ensure that the package you wish to install is spelled correctly and surrounded by `''`.
:::{.callout-warning}
The `install.packages` function requires an internet connection, and can take a long time if the package has a lot of dependent packages that also need downloading.
This process should only be carried out the first time a package is used on a machine, or when a substantial update has taken place, to download the latest version of the package.
:::
#### Loading packages to an R session
Every time a new session of RStudio is opened, packages must be reloaded. To load a package into R (and gain access to the associated functions and data), use the `library` function.
Loading a package does not require an internet connection, but will only work if the package has already been installed and saved onto the computer you are working from. If you are unsure, use the function `installed.packages` to return a list of all packages that are loaded onto the machine you are working from.
:::{.stylebox}
::::{.stylebox-header}
:::::{.stylebox-icon}
:::::
Style tip
::::
::::{.stylebox-body}
Add your `library` function at the beginning of your script file. This reminds you to re-load packages when opening a new R session, and reduces the chance of error messages from functions requiring these packages.
::::
:::