This is a model for clustering universities in America using the variables provided.
You are a Data Scientist assigned to cluster all universities in America. The hope is that the clustering results can help the local government tailor how it treats each university.
The variables provided in this model are:
- Apps : Number of applications received
- Accept : Number of applications accepted
- Enroll : Number of new students enrolled
- Top10perc : Pct. new students from top 10% of H.S. class
- Top25perc : Pct. new students from top 25% of H.S. class
- F.Undergrad : Number of full-time undergraduates
- P.Undergrad : Number of part-time undergraduates
- Outstate : Out-of-state tuition
- Room.Board : Room and board costs
- Books : Estimated book costs
- Personal : Estimated personal spending
- PhD : Pct. of faculty with Ph.D.’s
- Terminal : Pct. of faculty with terminal degree
- S.F.Ratio : Student/faculty ratio
- perc.alumni : Pct. alumni who donate
- Expend : Instructional expenditure per student
- Grad.Rate : Graduation rate
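As part of Data Understanding, it helps to check that the values agree with the definitions above before running PCA. The following is a minimal sketch, assuming the data has already been loaded into a plain dict that maps each column name to a list of values (the loading step and the helper name `find_data_issues` are hypothetical, not taken from the original code file). It flags percentages outside [0, 100], more acceptances than applications, more enrolments than acceptances, and a Top10perc larger than Top25perc.

```python
from typing import Dict, List, Tuple

# Variables documented as percentages or rates; valid values lie in [0, 100].
PERCENT_COLUMNS = ["Top10perc", "Top25perc", "PhD", "Terminal", "perc.alumni", "Grad.Rate"]


def find_data_issues(data: Dict[str, List[float]]) -> List[Tuple[int, str]]:
    """Return (row index, message) pairs for rows that contradict the variable definitions.

    `data` is assumed to map each column name to a list with one value per university.
    """
    issues: List[Tuple[int, str]] = []
    for i in range(len(data["Apps"])):
        # Percentages must fall in [0, 100].
        for col in PERCENT_COLUMNS:
            value = data[col][i]
            if not 0 <= value <= 100:
                issues.append((i, f"{col}={value} is outside [0, 100]"))
        # Admission funnel: accepted <= received, and enrolled should not exceed accepted.
        if data["Accept"][i] > data["Apps"][i]:
            issues.append((i, "Accept exceeds Apps"))
        if data["Enroll"][i] > data["Accept"][i]:
            issues.append((i, "Enroll exceeds Accept"))
        # Students in the top 10% of their class are also in the top 25%.
        if data["Top10perc"][i] > data["Top25perc"][i]:
            issues.append((i, "Top10perc exceeds Top25perc"))
    return issues
```

Flagged rows are worth inspecting (and possibly correcting or capping) before standardization, since a single impossible value can pull a principal component toward it.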
This model contains:
- Data Understanding
- PCA (Principal Component Analysis)
- Modelling
- Interpretation
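The PCA step listed above can be sketched as follows, assuming the 17 numeric variables are already in a NumPy array `X` with one row per university. This is an illustrative SVD-based version, not necessarily how the original code file computes it, and the function names are hypothetical. Standardizing first matters here because the variables sit on very different scales (application counts, costs, percentages), and unscaled PCA would be dominated by the largest-valued columns.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Scale every column to mean 0 and sample standard deviation 1.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(X: np.ndarray):
    """PCA via singular value decomposition of the standardized data.

    Returns (scores, loadings, explained_variance_ratio), with components
    ordered from most to least variance explained.
    """
    Z = standardize(X)
    # Z = U @ diag(S) @ Vt; the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S  # same as Z @ Vt.T: each university's coordinates on the PCs
    explained_ratio = S**2 / np.sum(S**2)  # share of total variance per component
    return scores, Vt.T, explained_ratio
```

The loadings (one column per component) show which original variables drive each component, which is what the Interpretation step relies on.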
This model keeps 4 principal components, which it treats as the optimal number: together these 4 components explain more than 75% of the variance in the dataset (you can choose up to 4 if you want). The model then produces 3 clusters, and you can elaborate on their interpretation in the code file. I suggest creating an EDA (Exploratory Data Analysis) for this model to deepen your understanding of the data.
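The component selection and clustering described above can be sketched as below. Two assumptions are made: the 75% figure is applied as a cumulative explained-variance threshold, and the clustering algorithm is k-means (Lloyd's algorithm with k-means++ seeding and several restarts), since the text does not name the algorithm the original code uses. All function names are hypothetical.

```python
import numpy as np


def n_components_for(explained_ratio: np.ndarray, threshold: float = 0.75) -> int:
    """Smallest number of leading components whose cumulative share of variance
    reaches `threshold` (capped at the number of components available)."""
    cumulative = np.cumsum(explained_ratio)
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(explained_ratio))


def _assign(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def _init_plus_plus(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to its squared distance from the nearest chosen centroid.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        # If every point coincides with a chosen centroid, fall back to a uniform draw.
        probs = d2 / total if total > 0 else None
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)


def kmeans(X: np.ndarray, k: int = 3, n_init: int = 10, max_iter: int = 100, seed: int = 0):
    """Run k-means `n_init` times and keep the run with the lowest
    within-cluster sum of squared distances. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    best_inertia, best_labels, best_centroids = np.inf, None, None
    for _run in range(n_init):
        centroids = _init_plus_plus(X, k, rng)
        for _step in range(max_iter):
            labels = _assign(X, centroids)
            # Move each centroid to the mean of its points (keep it if its cluster is empty).
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            converged = np.allclose(new_centroids, centroids)
            centroids = new_centroids
            if converged:
                break
        labels = _assign(X, centroids)
        inertia = float(((X - centroids[labels]) ** 2).sum())
        if inertia < best_inertia:
            best_inertia, best_labels, best_centroids = inertia, labels, centroids
    return best_labels, best_centroids
```

To mirror the setup described here, pass the first `n_components_for(ratio)` columns of the principal-component scores to `kmeans(..., k=3)`; the resulting labels can then be joined back to the original variables to profile and interpret each cluster. Clustering on a few components rather than all 17 variables drops the low-variance directions, which are often mostly noise.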