## Exercise 6: about Module 2...
Create the script "exercise6.R" and save it to the "Rcourse/Module2" directory: you will save all the commands of exercise 6 in that script.
<br>Remember you can comment the code using #.
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
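# Check the current working directory
getwd()
# Move to the course directory: relative path, if you are already in your home directory...
setwd("Rcourse/Module2")
# ...or path starting from the home directory (~), which works from anywhere
setwd("~/Rcourse/Module2")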
```
</details>
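If the directory does not exist yet, `setwd()` stops with an error. The following is a minimal sketch of creating the directory first and then moving into it. It uses a temporary directory as a stand-in for your home directory, and the names `base_dir`, `module_dir` and `old_dir` are hypothetical, chosen only for this illustration.

```r
# Stand-in for your home directory (an assumption made for this illustration)
base_dir <- tempdir()
module_dir <- file.path(base_dir, "Rcourse", "Module2")

# Create the directory (and its parent) only if it does not exist yet
if (!dir.exists(module_dir)) {
  dir.create(module_dir, recursive = TRUE)
}

# setwd() invisibly returns the previous working directory
old_dir <- setwd(module_dir)
getwd()

# Go back to where we started
setwd(old_dir)
```

`file.path()` builds the path with the right separator, so the same code can be used on Windows, macOS and Linux.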
**1- Install and load package ggplot2**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
# Install
install.packages(pkgs="ggplot2")
# Load in the environment
library("ggplot2")
```
Check with sessionInfo() that the package is indeed loaded. What version of ggplot2 did you get?
</details>
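Besides sessionInfo(), base R offers `requireNamespace()` to test whether a package is installed and `packageVersion()` to read its version. The sketch below uses the "stats" package, which ships with R, as a stand-in so that it runs even before ggplot2 is installed. The variable names are hypothetical.

```r
# "stats" ships with R; replace it with "ggplot2" once ggplot2 is installed
pkg <- "stats"

# TRUE if the package is installed (this does not attach it)
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Version of the installed package, converted to a character string
pkg_version <- as.character(packageVersion(pkg))

is_installed
pkg_version
```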
**2- ggplot2 automatically makes the diamonds dataset available: you can use it as an object once ggplot2 is loaded. You can see it in the "Environment" tab by changing "Global Environment" to "package:ggplot2".**
What are the dimensions of diamonds? What are the column names of diamonds?
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
# Dimensions of diamonds
dim(diamonds)
# Column names of diamonds
colnames(diamonds)
```
</details>
You can read the help page of the diamonds dataset to understand what it contains!<br>
Note: diamonds is a data frame: you can test it with **is.data.frame(diamonds)** (returns TRUE).
**3- How many diamonds have a "Premium" cut?**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
nrow(diamonds[diamonds$cut == "Premium",])
```
</details>
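The comparison `diamonds$cut == "Premium"` produces a logical vector, so the rows can also be counted without building a subset. Here is a minimal sketch on a small made-up data frame (`toy` and its values are invented for illustration and are not taken from diamonds):

```r
# Small made-up data frame that mimics two columns of diamonds
toy <- data.frame(
  cut = factor(c("Premium", "Ideal", "Premium", "Good", "Premium")),
  color = factor(c("E", "E", "D", "E", "E"))
)

# Logical vector: TRUE where the cut is "Premium"
is_premium <- toy$cut == "Premium"

# TRUE counts as 1, so sum() gives the number of matching rows
n_premium <- sum(is_premium)
n_premium

# table() counts every category at once
cut_counts <- table(toy$cut)
cut_counts
```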
**4- Select diamonds that have a "Premium" cut AND an "E" color. Save them in the new object diams1. How many rows does diams1 have?**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]
nrow(diams1)
```
</details>
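The `&` operator combines two logical vectors element by element, so a row is kept only when both conditions are TRUE. The sketch below shows this on a small made-up data frame, together with base R's `subset()` as an alternative way to write the same filter. The object names and values are invented for illustration.

```r
# Small made-up data frame that mimics three columns of diamonds
toy <- data.frame(
  cut = c("Premium", "Ideal", "Premium", "Good", "Premium"),
  color = c("E", "E", "D", "E", "E"),
  carat = c(0.3, 0.5, 0.7, 0.9, 1.1)
)

# Element-wise AND: TRUE only where both conditions hold
keep <- toy$cut == "Premium" & toy$color == "E"

# Keep the matching rows and all columns
toy1 <- toy[keep, ]

# Same filter written with subset(), which looks up column names directly
toy1_alt <- subset(toy, cut == "Premium" & color == "E")

nrow(toy1)
nrow(toy1_alt)
```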
**5- Install and load the package dplyr from the R Console.**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
# Install package
install.packages(pkgs="dplyr")
# Load package
library("dplyr")
```
</details>
**6- Use the function "sample_n" from the dplyr package (check the help page for sample_n) to randomly sample 200 rows of diams1: save them in the new object diams.**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
# Subset data frame
diams <- sample_n(tbl=diams1, size=200)
```
</details>
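Because the sampling is random, each run returns a different set of rows unless the random seed is fixed first with `set.seed()`. The sketch below illustrates this with base R's `sample()` on a made-up data frame, so that it runs without dplyr. The seed value 42 and the object names are arbitrary choices for illustration. Note also that recent dplyr documentation marks `sample_n()` as superseded by `slice_sample()`.

```r
# Made-up data frame with 1000 rows
toy <- data.frame(id = 1:1000,
                  carat = rep(c(0.3, 0.5, 0.7, 0.9), times = 250))

# Fix the seed so that the random draw can be repeated
set.seed(42)
rows <- sample(nrow(toy), size = 200)  # 200 row numbers, without replacement
toy_sample <- toy[rows, ]

# Same seed, same draw
set.seed(42)
rows_again <- sample(nrow(toy), size = 200)

nrow(toy_sample)
identical(rows, rows_again)
```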
**7- What are the minimum, average and median "carat" of the 200 diamonds in diams?**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
min(diams$carat)
mean(diams$carat)
median(diams$carat)
summary(diams$carat)
```
</details>
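`summary()` reports the minimum, mean and median in one call, along with the quartiles and the maximum. As a small sketch on a made-up vector (the values are invented for illustration), individual statistics can be picked out of the result by name:

```r
# Made-up carat values
carat <- c(0.3, 0.5, 0.7, 0.9, 1.6)

carat_min <- min(carat)
carat_mean <- mean(carat)
carat_median <- median(carat)

# summary() returns a named vector of six statistics
carat_summary <- summary(carat)
names(carat_summary)

# Extract one statistic by its name
carat_summary[["Median"]]
```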
**8- Save diams into a csv file (try the "write.csv" function).**
<details>
<summary>
*Answer*
</summary>
```{r, eval=F}
# Write a csv file with write.csv
write.csv(x=diams,
          file="diamonds_Premium_E_200.csv",
          row.names=FALSE,
          quote=FALSE)
```
</details>
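A quick way to check the exported file is to read it back with `read.csv()` and compare its dimensions with the original object. The sketch below does this on a small made-up data frame written to a temporary file. The object names, values and file location are assumptions for illustration only.

```r
# Small made-up data frame standing in for diams
toy <- data.frame(
  carat = c(0.3, 0.7, 1.1),
  cut = c("Premium", "Premium", "Premium"),
  color = c("E", "E", "E")
)

# Temporary file, so that nothing is written to the working directory
out_file <- tempfile(fileext = ".csv")

write.csv(x = toy, file = out_file, row.names = FALSE, quote = FALSE)

# Read the file back and compare with the original
toy_back <- read.csv(out_file)
dim(toy_back)
colnames(toy_back)

# Look at the raw header line of the file
header_line <- readLines(out_file, n = 1)
header_line
```

With `quote=FALSE`, text values are written without quotation marks. This is fine here, but a text value containing a comma would then be split across two columns when the file is read back.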