Author: Daniel Valencia
Email: dafevara[at]gmail[dot]com
Date: 2015-08-22
This file describes how to run the run_analysis.R script and reproduce the steps that produce the tidy data.
This work needs only one script, run_analysis.R. Its main function is Serket.run. Serket is the code name for the project and the name of the GitHub repo.
The run_analysis.R script uses a DATA_DIR global variable, which stores the path of the data directory where the input data set is ready to be used.
Serket.run calls several auxiliary methods:
LoadData: Validates and loads the raw data. It returns a list of raw tables that includes the train and test sets.
ClipData: Puts together the subject, activity and measurement raw data. Subject and activity data are added to keep the data consistent and ready for the extraction phase.
ExtractMeanAndStd: Extracts the mean and standard deviation measurements. Subject and activity data are carried along in this step to keep the data consistent through the whole process.
ActivityLabels: Prepares the activity labels before they are merged with the mean and standard deviation data.
MergeData: Merges the mean, standard deviation, subject and activity data.
AddSelfExplainNames: Adds human-readable, self-explanatory variable names to MergedData. This method calls another helper method, GenerateHumanReadableName, which builds an easy-to-read variable name from the keywords in the original variable name.
CreateSummarized: Creates the final tidy data set. This method groups the data by the SubjectId, ActivityLabelId and ActivityLabel variables and summarizes each measurement by its mean.
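The extraction, renaming and summarizing steps listed above can be sketched on a toy table. This is only an illustration under stated assumptions, not the code in run_analysis.R: the toy rows, the ReadableName helper and its renaming scheme are hypothetical, ActivityLabelId is left out for brevity, and the sketch assumes dplyr 1.0.0 or later (for across).

```r
library(dplyr)

# Toy stand-in for the merged data. The measurement column names follow the
# style of the raw data set; the values are made up for illustration.
merged <- data.frame(
  SubjectId = c(1, 1, 1, 2, 2, 2),
  ActivityLabel = c("WALKING", "WALKING", "SITTING",
                    "WALKING", "SITTING", "SITTING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7),
  `tBodyAcc-std()-X` = c(1, 3, 2, 4, 1, 3),
  `tBodyAcc-max()-X` = c(9, 9, 9, 9, 9, 9),
  check.names = FALSE
)

id_cols <- c("SubjectId", "ActivityLabel")

# Keep the identifier columns plus the mean() and std() measurements.
is_mean_or_std <- grepl("mean\\(\\)|std\\(\\)", names(merged))
extracted <- merged[, names(merged) %in% id_cols | is_mean_or_std]

# Hypothetical renaming scheme: expand the "t" prefix and drop punctuation.
ReadableName <- function(name) {
  name <- sub("^t", "Time", name)
  name <- sub("-mean\\(\\)", "Mean", name)
  name <- sub("-std\\(\\)", "Std", name)
  gsub("-", "", name)
}

is_measure <- !(names(extracted) %in% id_cols)
names(extracted)[is_measure] <- ReadableName(names(extracted)[is_measure])

# One row per subject and activity, each measurement replaced by its mean.
tidy <- extracted %>%
  group_by(SubjectId, ActivityLabel) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(tidy)
```

The result has one row for each subject and activity pair, which is the shape the final tidy data set is described as having.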
To execute the analysis, you only need to supply the DATA_DIR value, which is a file system path to the UCI HAR Dataset data.
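A minimal sketch of setting DATA_DIR and checking it before running the analysis. The CheckDataDir helper and the example path are hypothetical and not part of run_analysis.R. The check assumes the unpacked data set contains train and test subdirectories; the script itself may validate its input differently.

```r
# Hypothetical helper, not part of run_analysis.R: returns TRUE when the
# directory contains the subdirectories this sketch assumes are required.
CheckDataDir <- function(data_dir) {
  expected <- c("train", "test")  # assumed subdirectories
  all(dir.exists(file.path(data_dir, expected)))
}

# Assumed location; replace it with the real path on your machine.
DATA_DIR <- file.path(getwd(), "UCI HAR Dataset")

if (!CheckDataDir(DATA_DIR)) {
  message("DATA_DIR does not look like the UCI HAR Dataset: ", DATA_DIR)
}
```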
After that, run the run_analysis.R script with Rscript or within an R console.
The tidy data will be printed as output.
This analysis depends on the following packages:
dplyr
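One possible way to confirm the dependency is installed before running the script. MissingPackages is a hypothetical helper written for this example and is not part of run_analysis.R.

```r
# Hypothetical helper: returns the names of the packages that are not installed.
MissingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- MissingPackages("dplyr")
if (length(not_installed) > 0) {
  # A missing package can be installed with install.packages("dplyr").
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}
```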