GettingAndCleaningDataAssignement

This git contains the required documents acc. to the Coursera Course Getting and Cleaning Data Assignement

You may find below a description of run_analysis.R script The R script contains detailed comments inside a code, but a short description is included here as well in order to fulfill the requirements of the Assignement

run_analysis.R readme descritpion

The script is divided into 4 main parts, which are

Setting the working directory
Loading the tables
Manipulating the data
Creating a new independent tidy data of averages

Each part in finer detail:

Setting the working directory

The script automatically sets the working directory to "D:/Balazs/Data Analysis/Assignement". I have saved the SAMSUNG data in this directory. This directory contains the following files and folders: "test", "train", "activity_labels.txt", "features.txt", "features_info.txt", "README.txt" If you wish to run my script, please ensure that you change this command to the directory where the above files and folders are located.

Loading the tables

General Tables The script reads in with the help of read.delim() function the "activity_labes.txt" and the "features.txt" The activity_labels table is set to factor variables and some basic format is concluded / numbers are removed

Training Dataset The script reads in with the help of read.delim() and read.table() the "y_train.txt", "x_train.txt" and "subject_train.txt" files

Testing Dataset The script reads in with the help of read.delim() and read.table() the "y_test.txt", "x_test.txt" and "subject_test.txt" files

All the data contained in the directory "Inertial Signals" are omited and not read in. The assignement clearly states to only use Mean and Standard Deviation data later on and these files do not contain similar variables

After all the data is read in, the script starts to merge them into one. The script processes the subject data as numeric variable and the activity data as factor variable. All other variables are numeric. First the activity and subject vectors are bind by row (merging the training and the testing data). The resulting dataframes are given names. ("Activity" and "Subject) In result we acquire "subject_final" and "activity_final" data frames. In a similar manner we rbind() the "x_train" and "x_test" data to get "x_final". We ensure the coloumns are named acc. to the features. The "subject_final", "activity_final" and "x_final" data frames are coloumn binded with cbind() resulting the "NewDataset" containing the merged (test-train) data.

Manipulating the data

Searching through the variable names for "mean", "Mean" and "std" in order to select the variables containing information about various means and standard deviations. All other variables are dropped expect the "Subject" and "Activity" variables Afterwards the script ensures that descriptive names are generated from the existing one for each variable

Creating a new, independent tidy Data of Averages

The "AveragesData" data frame is created by assigning the "NewDataSet" to it. This new data frame is manipulated using the aggregate() function grouping the variables by Subject and by Activities and applying the mean() function for every variable in the data frame. Some additional formating is applied to ensure the names are descriptive and the Data frame does not contain any unnecessary or duplicate information, thus keeping the data frame tidy. The script uses then the write.table() function in order to create a "Submit.txt" with the new tidy data set.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
features_info_submit.txt		features_info_submit.txt
features_submit.txt		features_submit.txt
run_analysis.R		run_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GettingAndCleaningDataAssignement

run_analysis.R readme descritpion

About

Releases

Packages

Languages

balintba/GettingAndCleaningDataAssignement

Folders and files

Latest commit

History

Repository files navigation

GettingAndCleaningDataAssignement

run_analysis.R readme descritpion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages