---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## moderndive R Package <img src="https://github.com/moderndive/moderndive/blob/master/images/hex_blue_text.png?raw=true" align="right" width=125 />
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/moderndive)](https://cran.r-project.org/package=moderndive) [![Travis-CI Build Status](https://travis-ci.org/moderndive/moderndive.svg?branch=master)](https://travis-ci.org/moderndive/moderndive) [![Coverage Status](https://img.shields.io/codecov/c/github/moderndive/moderndive/master.svg)](https://codecov.io/github/moderndive/moderndive?branch=master)[![CRAN RStudio mirror downloads](http://cranlogs.r-pkg.org/badges/moderndive)](http://www.r-pkg.org/pkg/moderndive)
An R package of datasets and wrapper functions for [tidyverse](https://www.tidyverse.org/)-friendly introductory linear regression, used in:
- ModernDive: An Introduction to Statistical and Data Sciences via R available at [ModernDive.com](https://moderndive.com/)
- DataCamp's [Modeling with Data in the Tidyverse](https://www.datacamp.com/courses/modeling-with-data-in-the-tidyverse)
## Installation
Get the released version from CRAN:
```{r, eval=FALSE}
install.packages("moderndive")
```
Or the development version from GitHub:
```{r, eval=FALSE}
# If you haven't installed remotes yet, do so:
# install.packages("remotes")
remotes::install_github("moderndive/moderndive")
```
## Demo
Let's fit a simple linear regression of teaching `score` (as evaluated by students) on instructor `age` for 463 instructors at UT Austin:
```{r}
library(moderndive)
score_model <- lm(score ~ age, data = evals)
```
Among the many useful features of the `moderndive` package outlined in our essay ["Why should you use the moderndive package for intro linear regression?"](https://moderndive.github.io/moderndive/articles/why-moderndive.html) we highlight three functions in particular:
#### 1. Get regression tables
Get a tidy regression table **with** confidence intervals:
```{r}
get_regression_table(score_model)
```
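To see where these numbers come from, here is a rough sketch that assembles a similar table using only base R. The column names are chosen to mirror the output above and the object name `by_hand` is made up for illustration; the package may also round values and label terms differently, so treat this as an approximation rather than the package's implementation.

```{r}
library(moderndive)

# Refit the demo model so this chunk runs on its own
score_model <- lm(score ~ age, data = evals)

# Estimates, standard errors, t statistics, and p-values, one row per term
coef_table <- coef(summary(score_model))

# 95% confidence intervals, one row per term
ci <- confint(score_model, level = 0.95)

# Combine both pieces into a single data frame
by_hand <- data.frame(
  term = rownames(coef_table),
  estimate = coef_table[, "Estimate"],
  std_error = coef_table[, "Std. Error"],
  statistic = coef_table[, "t value"],
  p_value = coef_table[, "Pr(>|t|)"],
  lower_ci = ci[, 1],
  upper_ci = ci[, 2],
  row.names = NULL
)
by_hand
```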
#### 2. Get fitted/predicted values and residuals
Get information on each point/observation in your regression, including fitted/predicted values & residuals, organized in a single data frame with intuitive variable names:
```{r}
get_regression_points(score_model)
```
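For comparison, the same quantities can be pulled out of the model with base R. This is only a sketch: the name `points_by_hand` is hypothetical, and the column names are our guess at mirroring the output above.

```{r}
library(moderndive)

# Refit the demo model so this chunk runs on its own
score_model <- lm(score ~ age, data = evals)

# The rows actually used in the fit (any rows with missing values are dropped)
used <- model.frame(score_model)

points_by_hand <- data.frame(
  ID = seq_len(nrow(used)),
  score = used$score,
  age = used$age,
  score_hat = unname(fitted(score_model)),    # fitted/predicted values
  residual = unname(residuals(score_model)),  # observed minus fitted
  row.names = NULL
)
head(points_by_hand)
```

Each residual is simply `score - score_hat`, which is a quick sanity check on either version of the table.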
#### 3. Get regression fit summaries
Get all the scalar summaries of a regression fit included in `summary(score_model)` along with the mean-squared error and root mean-squared error:
```{r}
get_regression_summaries(score_model)
```
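The two error measures are easy to compute by hand. The sketch below assumes the mean-squared error is the plain average of the squared residuals (dividing by the number of observations); if the package divides by the residual degrees of freedom instead, the values will differ slightly.

```{r}
library(moderndive)

# Refit the demo model so this chunk runs on its own
score_model <- lm(score ~ age, data = evals)

res <- residuals(score_model)

# Mean-squared error: average squared residual (assumed to divide by n)
mse <- mean(res^2)

# Root mean-squared error: back on the scale of the outcome variable
rmse <- sqrt(mse)

c(mse = mse, rmse = rmse)
```

Note that `rmse` computed this way is a little smaller than the residual standard error `sigma(score_model)`, which divides by the residual degrees of freedom rather than by the number of observations.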
## Other features
#### 1. Print Markdown-friendly tables
Want to output cleanly formatted tables in an R Markdown document? Just add `print = TRUE` to any of the three `get_regression_*()` functions.
```{r}
get_regression_table(score_model, print = TRUE)
```
#### 2. Predictions on new data
Want to apply your fitted model to new data to make predictions? No problem! Pass a data frame to the `newdata` argument of `get_regression_points()`.
For example, the Kaggle.com practice competition [House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques){target="_blank"} requires you to fit/train a model to the provided `train.csv` training set to make predictions of house prices in the provided `test.csv` test set. The following code performs these steps and outputs the predictions in `submission.csv`:
```{r, eval=FALSE}
library(tidyverse)
library(moderndive)
# Load in training and test set
train <- read_csv("https://github.com/moderndive/moderndive/raw/master/vignettes/train.csv")
test <- read_csv("https://github.com/moderndive/moderndive/raw/master/vignettes/test.csv")
# Fit model
house_model <- lm(SalePrice ~ YrSold, data = train)
# Make and submit predictions
submission <- get_regression_points(house_model, newdata = test, ID = "Id") %>%
  select(Id, SalePrice = SalePrice_hat)
write_csv(submission, "submission.csv")
```
The resulting `submission.csv` is formatted such that it can be submitted on Kaggle, resulting in a "root mean squared logarithmic error" leaderboard score of 0.42918.
```{r echo=FALSE}
knitr::include_graphics("https://github.com/moderndive/moderndive/raw/master/vignettes/leaderboard_orig.png")
```
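As a rough illustration of what this score measures, here is a minimal sketch of a root mean squared logarithmic error function. The helper name `rmsle()` and the toy prices are hypothetical. It assumes the error is the root mean squared difference between the logs of the predicted and observed prices; Kaggle does not publish the true test-set prices, so the leaderboard score itself cannot be reproduced locally.

```{r}
# Root mean squared logarithmic error between observed and predicted values.
# Assumes all values are strictly positive, as sale prices are.
rmsle <- function(actual, predicted) {
  stopifnot(
    length(actual) == length(predicted),
    all(actual > 0),
    all(predicted > 0)
  )
  sqrt(mean((log(predicted) - log(actual))^2))
}

# Toy example with made-up prices
rmsle(
  actual = c(100000, 200000, 300000),
  predicted = c(110000, 190000, 330000)
)
```

Because the error is computed on the log scale, over- or under-predicting by a given percentage is penalized about equally for cheap and expensive houses.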
## The Details
The three `get_regression_*()` functions are wrappers, with a few added tweaks, around functions from the [`broom`](https://CRAN.R-project.org/package=broom/vignettes/broom.html){target="_blank"} package, which converts statistical analysis objects into tidy tibbles:
1. `get_regression_table()` is a wrapper for `broom::tidy()`
1. `get_regression_points()` is a wrapper for `broom::augment()`
1. `get_regression_summaries()` is a wrapper for `broom::glance()`
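
To get a feel for the first and third pairings, you can call the `broom` functions directly and compare against the wrappers. This is only a sketch: we pass `conf.int = TRUE` on the assumption that this is how the confidence intervals in the regression table arise, and the object names `tidy_out` and `glance_out` are ours.

```{r}
library(moderndive)
library(broom)

# Refit the demo model so this chunk runs on its own
score_model <- lm(score ~ age, data = evals)

# Compare with get_regression_table(score_model)
tidy_out <- tidy(score_model, conf.int = TRUE)
tidy_out

# Compare with get_regression_summaries(score_model)
glance_out <- glance(score_model)
glance_out
```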
Why did we create these wrappers?
* The `broom` package function names `tidy()`, `augment()`, and `glance()` don't mean anything to intro stats students, whereas the `moderndive` package function names `get_regression_table()`, `get_regression_points()`, and `get_regression_summaries()` are more intuitive.
* The default column/variable names in the outputs of the three `broom` functions are a little daunting for intro stats students to interpret. We cut out some of them and renamed many of them with more intuitive names. For example, compare the outputs of the `get_regression_points()` wrapper function and the parent `broom::augment()` function.
```{r}
get_regression_points(score_model)
library(broom)
broom::augment(score_model)
```
***
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.