GitHub - ScPoEcon/moderndive: R package for tidyverse-friendly introductory linear regression

moderndive R Package

An R package of datasets and wrapper functions for tidyverse-friendly introductory linear regression used in

ModernDive: An Introduction to Statistical and Data Sciences via R available at ModernDive.com
DataCamp’s Modeling with Data in the Tidyverse

Installation

Get the released version from CRAN:

install.packages("moderndive")

Or the development version from GitHub:

# If you haven't installed remotes yet, do so:
# install.packages("remotes")
remotes::install_github("moderndive/moderndive")

Demo

Let’s fit a simple linear regression of teaching score (as evaluated by students) over instructor age for 463 instructors at the UT Austin:

library(moderndive)
score_model <- lm(score ~ age, data = evals)

Among the many useful features of the moderndive package outlined in our essay “Why should you use the moderndive package for intro linear regression?” we highlight three functions in particular:

1. Get regression tables

Get a tidy regression table with confidence intervals:

get_regression_table(score_model)

## # A tibble: 2 x 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept    4.46      0.127     35.2    0        4.21     4.71 
## 2 age         -0.006     0.003     -2.31   0.021   -0.011   -0.001

2. Get fitted/predicted values and residuals

Get information on each point/observation in your regression, including fitted/predicted values & residuals, organized in a single data frame with intuitive variable names:

get_regression_points(score_model)

## # A tibble: 463 x 5
##       ID score   age score_hat residual
##    <int> <dbl> <int>     <dbl>    <dbl>
##  1     1   4.7    36      4.25    0.452
##  2     2   4.1    36      4.25   -0.148
##  3     3   3.9    36      4.25   -0.348
##  4     4   4.8    36      4.25    0.552
##  5     5   4.6    59      4.11    0.488
##  6     6   4.3    59      4.11    0.188
##  7     7   2.8    59      4.11   -1.31 
##  8     8   4.1    51      4.16   -0.059
##  9     9   3.4    51      4.16   -0.759
## 10    10   4.5    40      4.22    0.276
## # ... with 453 more rows

3. Get regression fit summaries

Get all the scalar summaries of a regression fit included in summary(score_model) along with the mean-squared error and root mean-squared error:

get_regression_summaries(score_model)

## # A tibble: 1 x 8
##   r_squared adj_r_squared   mse  rmse sigma statistic p_value    df
##       <dbl>         <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl> <dbl>
## 1     0.011         0.009 0.292 0.540 0.541      5.34   0.021     2

Other features

1. Print markdown friendly tables

Want to output cleanly formatted tables in an R Markdown document? Just add print = TRUE to any of the three get_regression_ functions.

get_regression_table(score_model, print = TRUE)

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	4.462	0.127	35.195	0.000	4.213	4.711
age	-0.006	0.003	-2.311	0.021	-0.011	-0.001

2. Predictions on new data

Want to apply your fitted model on new data to make predictions? No problem! Include a newdata data frame argument to get_regression_points().

For example, the Kaggle.com practice competition House Prices: Advanced Regression Techniques requires you to fit/train a model to the provided train.csv training set to make predictions of house prices in the provided test.csv test set. The following code performs these steps and outputs the predictions in submission.csv:

library(tidyverse)
library(moderndive)

# Load in training and test set
train <- read_csv("https://github.com/moderndive/moderndive/raw/master/vignettes/train.csv")
test <- read_csv("https://github.com/moderndive/moderndive/raw/master/vignettes/test.csv")

# Fit model
house_model <- lm(SalePrice ~ YrSold, data = train)

# Make and submit predictions
submission <- get_regression_points(house_model, newdata = test, ID = "Id") %>% 
  select(Id, SalePrice = SalePrice_hat)
write_csv(submission, "submission.csv")

The resulting submission.csv is formatted such that it can be submitted on Kaggle, resulting in a “root mean squared logarithmic error” leaderboard score of 0.42918.

The Details

The three get_regression functions are wrappers of functions from the broom package for converting statistical analysis objects into tidy tibbles along with a few added tweaks:

get_regression_table() is a wrapper for broom::tidy()
get_regression_points() is a wrapper for broom::augment()
get_regression_summaries is a wrapper for broom::glance()

Why did we create these wrappers?

The broom package function names tidy(), augment(), and glance() don’t mean anything to intro stats students, where as the moderndive package function names get_regression_table(), get_regression_points(), and get_regression_summaries() are more intuitive.
The default column/variable names in the outputs of the above 3 functions are a little daunting for intro stats students to interpret. We cut out some of them and renamed many of them with more intuitive names. For example, compare the outputs of the get_regression_points() wrapper function and the parent broom::augment() function.

get_regression_points(score_model)

## # A tibble: 463 x 5
##       ID score   age score_hat residual
##    <int> <dbl> <int>     <dbl>    <dbl>
##  1     1   4.7    36      4.25    0.452
##  2     2   4.1    36      4.25   -0.148
##  3     3   3.9    36      4.25   -0.348
##  4     4   4.8    36      4.25    0.552
##  5     5   4.6    59      4.11    0.488
##  6     6   4.3    59      4.11    0.188
##  7     7   2.8    59      4.11   -1.31 
##  8     8   4.1    51      4.16   -0.059
##  9     9   3.4    51      4.16   -0.759
## 10    10   4.5    40      4.22    0.276
## # ... with 453 more rows

library(broom)
broom::augment(score_model)

## # A tibble: 463 x 9
##    score   age .fitted .se.fit  .resid    .hat .sigma   .cooksd .std.resid
##  * <dbl> <int>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>     <dbl>      <dbl>
##  1   4.7    36    4.25  0.0405  0.452  0.00560  0.542 0.00197        0.837
##  2   4.1    36    4.25  0.0405 -0.148  0.00560  0.542 0.000212      -0.274
##  3   3.9    36    4.25  0.0405 -0.348  0.00560  0.542 0.00117       -0.645
##  4   4.8    36    4.25  0.0405  0.552  0.00560  0.541 0.00294        1.02 
##  5   4.6    59    4.11  0.0371  0.488  0.00471  0.541 0.00193        0.904
##  6   4.3    59    4.11  0.0371  0.188  0.00471  0.542 0.000288       0.349
##  7   2.8    59    4.11  0.0371 -1.31   0.00471  0.538 0.0139        -2.43 
##  8   4.1    51    4.16  0.0261 -0.0591 0.00232  0.542 0.0000139     -0.109
##  9   3.4    51    4.16  0.0261 -0.759  0.00232  0.541 0.00229       -1.40 
## 10   4.5    40    4.22  0.0331  0.276  0.00374  0.542 0.000488       0.510
## # ... with 453 more rows

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Name		Name	Last commit message	Last commit date
Latest commit History 231 Commits
R		R
data-raw		data-raw
data		data
docs		docs
images		images
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
.travis.yml		.travis.yml
CONDUCT.md		CONDUCT.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
codecov.yml		codecov.yml
cran-comments.md		cran-comments.md
moderndive.Rproj		moderndive.Rproj
scratchpad.R		scratchpad.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

moderndive R Package

Installation

Demo

1. Get regression tables

2. Get fitted/predicted values and residuals

3. Get regression fit summaries

Other features

1. Print markdown friendly tables

2. Predictions on new data

The Details

About

Releases

Packages

Languages

License

ScPoEcon/moderndive

Folders and files

Latest commit

History

Repository files navigation

moderndive R Package

Installation

Demo

1. Get regression tables

2. Get fitted/predicted values and residuals

3. Get regression fit summaries

Other features

1. Print markdown friendly tables

2. Predictions on new data

The Details

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages