
EDA_R_Packages

EDA is a must-have step in the data science workflow. Working on data, wrangling and transforming it, is time-consuming, and it determines how successful the next steps will be (pre-processing, modelling, communicating outputs and decision making). This repo shows you how to perform EDA in R using the tidyverse ecosystem, and introduces a comparative approach between the main R packages that let you perform automated EDA and generate automated EDA reports in HTML or PDF, ready to be communicated.
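As a rough illustration of a manual tidyverse EDA pass, here is a minimal sketch that uses the built-in `iris` dataset as a stand-in for your own data. The four steps (structure, missing values, grouped summaries, a distribution plot) are an illustrative choice, not necessarily the exact workflow followed in this repo. It assumes dplyr (1.0.0 or later, for `across()` and `.groups`) and ggplot2 are installed.

```r
# Minimal manual EDA sketch with dplyr and ggplot2 on the built-in iris data.
library(dplyr)
library(ggplot2)

df <- as_tibble(iris)

# 1. Structure: dimensions and column types
glimpse(df)

# 2. Missing values per column
missing_counts <- df %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
print(missing_counts)

# 3. Grouped numeric summaries
species_summary <- df %>%
  group_by(Species) %>%
  summarise(
    n = n(),
    mean_sepal_length = mean(Sepal.Length),
    sd_sepal_length = sd(Sepal.Length),
    .groups = "drop"
  )
print(species_summary)

# 4. Distribution of one variable, split by group
p <- ggplot(df, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 20, alpha = 0.6, position = "identity") +
  labs(title = "Sepal length by species", x = "Sepal length (cm)", y = "Count")
print(p)
```

Writing each step by hand like this gives full control over what is checked; the automated EDA packages compared below trade that control for speed by producing similar summaries in a single call.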

Scope of work

Here we compare the best-known R packages (to my knowledge) dedicated to EDA and automated EDA.

Here is a non-exhaustive list:

tidyverse: the best-known and most revolutionary ecosystem (collection) of packages in R, covering the whole data science workflow (see https://www.tidyverse.org/). Note that the other packages in this list depend on tidyverse packages such as dplyr, ggplot2, etc.

SmartEDA: see https://github.com/daya6489/SmartEDA

dlookr: see https://github.com/choonghyunryu/dlookr

DataExplorer: see https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html

Hmisc: see https://hbiostat.org/R/Hmisc/

exploreR: see https://cran.r-project.org/web/packages/exploreR/index.html

RtutoR: see https://cran.r-project.org/web/packages/RtutoR/index.html

summarytools: see https://cran.r-project.org/web/packages/summarytools/vignettes/Introduction.html
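To give a feel for what "automated EDA" looks like with several of the packages above, here is a sketch that calls one overview function per package on the built-in `iris` data. The calls are the packages' commonly documented entry points, but defaults and output formats can differ between versions, so treat this as an assumption to check against each package's documentation. exploreR and RtutoR are left out of the sketch. Both dlookr and Hmisc export a `describe()` function, which is why every call uses an explicit `pkg::` prefix instead of `library()`.

```r
# Sketch: one overview call per automated-EDA package, on the built-in iris data.
# Assumes SmartEDA, dlookr, DataExplorer, Hmisc and summarytools are installed.

# SmartEDA: dataset-level overview (type = 1) and variable-level overview (type = 2)
smarteda_overview <- SmartEDA::ExpData(data = iris, type = 1)
smarteda_vars <- SmartEDA::ExpData(data = iris, type = 2)
print(smarteda_overview)
print(smarteda_vars)

# dlookr: data-quality diagnosis, one row per variable (type, missing, unique values)
dlookr_diag <- dlookr::diagnose(iris)
print(dlookr_diag)

# DataExplorer: one-row introduction table plus a missing-value profile plot
intro <- DataExplorer::introduce(iris)
print(intro)
DataExplorer::plot_missing(iris)

# Hmisc: per-variable descriptive statistics (n, missing, distinct, quantiles)
hmisc_desc <- Hmisc::describe(iris)
print(hmisc_desc)

# summarytools: compact data-frame summary with frequencies and text histograms
st_summary <- summarytools::dfSummary(iris)
print(st_summary)

# Full HTML reports write files to disk, so the call is left commented out here:
# DataExplorer::create_report(iris, output_file = "iris_eda_report.html")
```

Each call replaces several of the manual tidyverse steps; the full report functions (such as `create_report()` in DataExplorer) go one step further and bundle these summaries into a shareable HTML document.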

Packages installation

To install SmartEDA, dlookr, etc. successfully on Windows, you must first install Rtools version 4.0 from https://cran.r-project.org/bin/windows/Rtools/
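Once Rtools is in place, a minimal installation sketch could look like the following. It installs only the packages from the list above that are not already present, using the cloud CRAN mirror as an assumed default. The optional Rtools check relies on the pkgbuild package, which is an extra assumption not required by this repo. If a package is reported as unavailable for your R version, check its CRAN page linked above.

```r
# Install the packages compared in this repo, skipping those already present.
pkgs <- c("tidyverse", "SmartEDA", "dlookr", "DataExplorer",
          "Hmisc", "exploreR", "RtutoR", "summarytools")

missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# On Windows, optionally confirm that Rtools is visible before compiling from source
# (only runs if the pkgbuild package is available; prints TRUE or FALSE).
if (.Platform$OS.type == "windows" && requireNamespace("pkgbuild", quietly = TRUE)) {
  print(pkgbuild::has_build_tools(debug = TRUE))
}

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```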