## Overview
This is a repository for project 2 of the Getting and Cleaning Data Coursera course (Johns Hopkins University).
- Download the following file and save it to your computer (e.g. your Desktop):
- Extract it using a decompression program such as 7-Zip. Remember this directory; you'll need it later. The extraction creates a folder named "./UCI HAR Dataset". Some of its contents are for information only, but the script will fail without the following 8 essential files:
  - ./test/X_test.txt
  - ./test/y_test.txt
  - ./test/subject_test.txt
  - ./train/X_train.txt
  - ./train/subject_train.txt
  - ./train/y_train.txt
  - ./features.txt
  - ./activity_labels.txt
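
Before running the script, it can help to confirm that all 8 files are in place. The following is a minimal sketch, not part of run_analysis.R: the helper name `missing_har_files` is hypothetical, and the file names use the capitalisation found in the UCI HAR Dataset archive (for example `X_test.txt`).

```r
# Hypothetical helper (not part of run_analysis.R): list which of the
# required files are missing under a given dataset directory.
required_files <- c(
  "test/X_test.txt", "test/y_test.txt", "test/subject_test.txt",
  "train/X_train.txt", "train/subject_train.txt", "train/y_train.txt",
  "features.txt", "activity_labels.txt"
)

missing_har_files <- function(dataset_dir = "./UCI HAR Dataset") {
  paths <- file.path(dataset_dir, required_files)
  required_files[!file.exists(paths)]
}

# Check the default location relative to the current working directory.
missing <- missing_har_files()
if (length(missing) > 0) {
  message("Missing files: ", paste(missing, collapse = ", "))
} else {
  message("All 8 required files found.")
}
```

If you extracted the archive somewhere else, pass that folder instead, e.g. `missing_har_files("~/Desktop/UCI HAR Dataset")`.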
- Download and save run_analysis.R to your computer and load it with the source() command (type ?source in the RStudio console if you don't know how to do this).
- In the R console in RStudio, run the script in one of the following ways:
  1. Use the setwd() command to navigate to the "./UCI HAR Dataset" folder you created when extracting the archive, then call analyze() (for help with setwd(), type ?setwd in the R console).
  2. Pass the path as a parameter to the analyze() function (e.g. analyze("./UCI HAR Dataset")).
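
Put together, a session might look roughly like the sketch below. It assumes run_analysis.R and the extracted "UCI HAR Dataset" folder both sit in the current working directory; the variable names `script_path` and `dataset_dir` are only illustrative, so adjust the paths to match where you saved things.

```r
script_path <- "run_analysis.R"     # assumed location of the downloaded script
dataset_dir <- "./UCI HAR Dataset"  # folder created during extraction

if (file.exists(script_path) && dir.exists(dataset_dir)) {
  source(script_path)   # loads analyze() into the session
  analyze(dataset_dir)  # pass the dataset path directly
  # Alternatively: setwd(dataset_dir) and then analyze() with no arguments
} else {
  message("Could not find the script or the dataset folder in ", getwd())
}
```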
- The script will now run, telling you what it's doing as it goes. In a nutshell, it performs the following actions:
  1. Merges the training and the test sets to create one data set.
  2. Extracts only the measurements on the mean and standard deviation for each measurement.
  3. Uses descriptive activity names to name the activities in the data set.
  4. Appropriately labels the data set with descriptive variable names.
  5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
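
To make these five actions concrete, here is a sketch in base R on a tiny made-up data set. It is not the code in run_analysis.R: the toy values, the three feature names, and the renaming rules are assumptions chosen only to show the shape of each step.

```r
# Toy stand-ins for features.txt, activity_labels.txt and the train/test
# tables; every value here is invented for illustration.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

train <- data.frame(subject = c(1, 1, 2), activity = c(1, 2, 1),
                    matrix(1:9, nrow = 3))
test <- data.frame(subject = c(1, 2), activity = c(1, 2),
                   matrix(c(3, 5, 6, 8, 9, 2), nrow = 2))

# 1. Merge the training and test sets into one data set.
merged <- rbind(train, test)
names(merged) <- c("subject", "activity", features)

# 2. Keep only the mean() and std() measurements (plus the two id columns).
keep <- grepl("mean\\(\\)|std\\(\\)", features)
merged <- merged[, c(TRUE, TRUE, keep)]

# 3. Replace activity codes with descriptive activity names.
merged$activity <- factor(merged$activity,
                          levels = activity_labels$id,
                          labels = as.character(activity_labels$name))

# 4. Tidy the variable names (one possible convention, assumed here).
names(merged) <- gsub("()", "", names(merged), fixed = TRUE)
names(merged) <- gsub("-", ".", names(merged), fixed = TRUE)
names(merged) <- sub("^t", "time", names(merged))

# 5. Average every variable for each subject and activity.
averages <- aggregate(merged[, -(1:2)],
                      by = list(subject = merged$subject,
                                activity = merged$activity),
                      FUN = mean)
print(averages)
```

With 2 subjects and 2 activities, the toy result has 2 * 2 = 4 rows, which mirrors the 30 * 6 = 180 rows expected from the real data.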
- After the script completes successfully, 2 files will be created:
  - ./data-cleaned.txt (contains the merged data from the training and test sets, with the corresponding tidy labels)
  - ./data-averages.txt (contains the averages of the data in ./data-cleaned.txt. It should have 180 rows and 68 columns: there are 30 subjects and 6 activities, so averaging per subject and activity gives 30 * 6 = 180 rows; the 68 columns are the subject, the activity, and 66 mean and standard deviation measurements)
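
A quick way to confirm the result is to read the averages file back in and compare its dimensions with the expected 180 by 68. This sketch assumes the file was written with a header row and whitespace-separated values, which may not match how run_analysis.R actually writes it; change the `read.table()` arguments if needed.

```r
n_subjects <- 30
n_activities <- 6
expected_rows <- n_subjects * n_activities  # one row per subject and activity
expected_cols <- 68

averages_path <- "./data-averages.txt"
if (file.exists(averages_path)) {
  # Assumption: header row present, whitespace-separated columns
  averages_check <- read.table(averages_path, header = TRUE)
  stopifnot(nrow(averages_check) == expected_rows,
            ncol(averages_check) == expected_cols)
  message("data-averages.txt has the expected dimensions.")
} else {
  message("data-averages.txt not found in ", getwd())
}
```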