Statistical analysis and linear model to predict the mpg (miles-per-gallon) of MechaCar prototypes (to identify ideal vehicle performance) using R, T-Test.
The purpose of the MechaCAr project is to review the production data for insights that may help the manufacturing teams:
- Predict MPG using Linear Regression. Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes.
- Summary Statistics on Suspension Coils. Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots
- T-Test on Suspension Coils by runing T-Tests to determine if the manufacturing lots are statistically different from the mean population
- Design a Study Comparing the MechaCar to the Competition. Design a statistical study to compare vehicle performance of the MechaCar vehicles against vehicles from other manufacturers.
The analysis is created using next software: R-Studio 2.14.1, R version 4.1.1 -- "Kick Things".
The RScript result of MechaCar Statistical Analysis can be found in the MechaCarChallenge file
According to our results the variables: Intercept (5.08e-08), vehicle_length (2.60e-12), ground_clearance (5.21e-08) provide a non-random amount of variance to the mpg values in the dataset because of their pr(>|t|)< 0.05. So, they have significant impact on the mpg values.
In addition, the pr(>|t|) of our linear regression analysis is 5.08 x 10-8, which is much smaller than our assumed significance level of 0.05%. Therefore, we can state that there is sufficient evidence to reject our null hypothesis, which means that the slope of our linear model is not zero.
According to the calculated r-squared value as 0.7149 which means that roughly 71%. So, the linear model can be predicted.
- The total summary table shows below:
- The lot_summary dataframe is following:
Additionally, the design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. So, as the variance is around 62 that means the current manufacturing data meet this design specification for all manufacturing lots in total.
But the variance of lot 3 is about 170 which means it does not meet the design specification.
T-tests allows to determine if all manufacturing lots and each lot individually are statistically different from the population mean of 1,500 PSI.
According to the calculated p-value of 1 higher than 0.05 for all manufacturing lot. So, it is not right to reject the null hypothesis. The mean is 1498 which is close from the population mean.
For the manufacturing lot1, p-value of 1.58 x 10-11 is lower than 0.05. So, it is right to reject the null hypothesis. The mean is 1500 which is the same as the population mean.
For the manufacturing lot2, p-value of 0.006 is lower than 0.05. So, it is right to reject the null hypothesis. The mean is 1500.02 which is close from the population mean.
For the manufacturing lot3, p-value of 0.1589 is higher than 0.05. So, it is not right to reject the null hypothesis. The mean is 1496.14 which is close from the population mean.
A statistical study to compare performance of the MechaCar vehicles against performance of vehicles from other manufacturers is bellow. Consumer would be interest in several metrics as cost, city or highway fuel efficiency, horse power, maintenance cost, or safety rating.
Null hypothesis as there is no difference between the metrics of MechaCar vs competition. Alternate hypothesis as there is a difference between the metrics of MechaCar vs competition.
To test the hypothesis should use T-test as if the p-value is less than the significance level, then the null hypothesis is rejected.
To run the statistical test needs to use data as cost, city or highway fuel efficiency, horse power, maintenance cost, safety rating.