Exploratory data analysis (EDA) is a must-have step in the data science workflow. Working on data, wrangling and transforming it, is time-consuming, and it determines the degree of success of the next steps (preprocessing, modelling, communicating outputs and decision making). This repo shows how to perform EDA in R using the tidyverse ecosystem, and compares the main R packages that let you perform automated EDA and generate automated EDA reports in HTML or PDF, ready to be communicated.
We compare here the best-known R packages (to my knowledge) dedicated to EDA and automated EDA. Note that the other packages in this list depend on tidyverse packages such as dplyr and ggplot2.
Here is a non-exhaustive list:
The tidyverse: the best-known, revolutionary ecosystem (collection of packages) in R, covering the whole data science workflow. # see https://www.tidyverse.org/
SmartEDA # see https://github.com/daya6489/SmartEDA
dlookr # see https://github.com/choonghyunryu/dlookr
DataExplorer # see https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html
Hmisc # see https://hbiostat.org/R/Hmisc/
exploreR # see https://cran.r-project.org/web/packages/exploreR/index.html
RtutoR # see https://cran.r-project.org/web/packages/RtutoR/index.html
summarytools # see https://cran.r-project.org/web/packages/summarytools/vignettes/Introduction.html
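To give a feel for what these packages offer, here is a minimal sketch that takes a first look at a data frame with some of the packages listed above. It is an illustration, not the comparison carried out in this repo: the built-in `iris` dataset and the `first_look` list are assumptions chosen for the example, and each call is guarded so that packages that are not installed are skipped.

```r
# Example data: the built-in iris dataset (an assumption for illustration).
df <- iris

# Collect one overview per package; 'first_look' is a hypothetical name.
first_look <- list()

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Manual, tidyverse-style EDA: summary statistics per group.
  first_look$dplyr <- dplyr::summarise(
    dplyr::group_by(df, Species),
    mean_sepal_length = mean(Sepal.Length),
    n = dplyr::n()
  )
}

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  # Basic metadata: rows, columns, missing values, memory usage.
  first_look$DataExplorer <- DataExplorer::introduce(df)
}

if (requireNamespace("SmartEDA", quietly = TRUE)) {
  # type = 1 returns an overview of the whole data set.
  first_look$SmartEDA <- SmartEDA::ExpData(data = df, type = 1)
}

if (requireNamespace("dlookr", quietly = TRUE)) {
  # Per-variable diagnosis: types, missing and unique counts.
  first_look$dlookr <- dlookr::diagnose(df)
}

if (requireNamespace("summarytools", quietly = TRUE)) {
  # Data frame summary with per-variable statistics.
  first_look$summarytools <- summarytools::dfSummary(df)
}

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Concise statistical description of every variable.
  first_look$Hmisc <- Hmisc::describe(df)
}

# Names of the packages that produced an overview on this machine.
names(first_look)
```

For a full automated report, `DataExplorer::create_report(df, output_file = "eda_report.html")` and `SmartEDA::ExpReport(df, op_file = "smarteda_report.html")` each write an HTML file; the file names here are placeholders, and both calls need a working rmarkdown setup.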
To install SmartEDA, dlookr, etc. successfully on Windows, you must first install Rtools version 4.0 from https://cran.r-project.org/bin/windows/Rtools/
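The installation step could be sketched as follows. This is a minimal example under a few assumptions: the package names are taken from the list above, each one is assumed to still be available on CRAN for your R version (some may have been archived), and the variable names `eda_packages` and `to_install` are made up for this illustration. The check for `make` is only a rough hint that build tools such as Rtools are on the PATH.

```r
# Packages compared in this repo (names taken from the list above).
eda_packages <- c("tidyverse", "SmartEDA", "dlookr", "DataExplorer",
                  "Hmisc", "exploreR", "RtutoR", "summarytools")

# Keep only the packages that are not installed yet.
to_install <- setdiff(eda_packages, rownames(installed.packages()))

# Rough check on Windows: Rtools ships 'make', so an empty result
# suggests the build tools are missing or not on the PATH.
if (.Platform$OS.type == "windows" && !nzchar(Sys.which("make"))) {
  message("Build tools not found: install Rtools before building from source.")
}

# Install the missing packages from CRAN. The interactive() guard keeps
# this sketch from downloading anything when run as a batch script;
# remove it if you want unattended installation.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If a package is no longer on CRAN, `install.packages()` will warn that it is not available; in that case, check the package's own page linked in the list above.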