This project is an internship task assigned by Meriskill: train a model to predict diabetes in women, since the dataset contains data on women only. The steps involved in this project are as follows:
- 1) Reading CSV files
- 2) Checking for null values and filling them (data cleaning)
- 3) Visualizations: to understand the relationship between the independent variables and the dependent variable
- 4) Splitting the data and scaling it
- 5) Identifying input columns and target columns
- 6) Logistic Regression
- 7) Random Forest
- 8) Gradient Boosting
- 9) Comparing the different models and selecting the best one
- 10) Using the best model on custom data
- 11) Conclusion
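
The splitting, scaling, logistic regression, and model comparison steps can be sketched without external dependencies as shown here. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, all function names are hypothetical, and a majority-class baseline stands in for Random Forest and Gradient Boosting to keep the sketch short. In practice, scikit-learn's `train_test_split`, `StandardScaler`, `LogisticRegression`, `RandomForestClassifier`, and `GradientBoostingClassifier` cover the same steps.

```python
import math
import random

def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle row indices, then hold out a fraction of them as the test set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(rows) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

def fit_scaler(rows):
    """Per-column mean and standard deviation of the given rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(rows, means, stds):
    """Standardize each column with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression trained with plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (NOT the project's dataset): two numeric features
# and a binary target that depends linearly on both of them.
rng = random.Random(0)
X = [[rng.gauss(120, 30), rng.gauss(30, 6)] for _ in range(200)]
y = [1 if (a - 120) / 30 + (b - 30) / 6 > 0 else 0 for a, b in X]

# Split first, then fit the scaler on the training rows only, so that no
# information from the test rows leaks into the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y)
means, stds = fit_scaler(X_train)
X_train_s, X_test_s = scale(X_train, means, stds), scale(X_test, means, stds)

w, b = fit_logistic(X_train_s, y_train)

# Baseline that always predicts the most common training label.
majority = 1 if sum(y_train) * 2 >= len(y_train) else 0

# Compare candidates on the held-out test set and keep the best one.
scores = {
    "logistic_regression": accuracy(y_test, predict(X_test_s, w, b)),
    "majority_baseline": accuracy(y_test, [majority] * len(y_test)),
}
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Fitting the scaler on the training split alone is a deliberate choice: the test set then behaves like unseen data, which is what the comparison step relies on.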

The conclusions are as follows:
- 1) The dataset is small, which is why the model overfit on the first attempt.
- 2) The dataset could include additional data such as diet, difficulty in walking, education about diabetes, etc.
- 3) Our model gives 81% accuracy, which is good for this kind of problem because biological laws are not exact.
- 4) With a larger dataset, our model could reach an accuracy of 90%.
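
One common way to spot the overfitting mentioned in the first conclusion is to compare accuracy on the training data with accuracy on held-out test data. The sketch below is illustrative only: the function names and the `max_gap` threshold are assumptions, and the numbers are made up for the example, not results from this project.

```python
def overfitting_gap(train_accuracy, test_accuracy):
    """Return how much better the model scores on data it has already seen."""
    return train_accuracy - test_accuracy

def looks_overfit(train_accuracy, test_accuracy, max_gap=0.10):
    # max_gap is an assumed rule-of-thumb threshold, not a value from the project.
    return overfitting_gap(train_accuracy, test_accuracy) > max_gap

# Illustrative numbers only, not results from this project.
print(looks_overfit(1.00, 0.75))  # True: large gap between train and test
print(looks_overfit(0.83, 0.81))  # False: train and test scores are close
```

A large gap suggests the model has memorized the training rows, which is more likely when the dataset is small.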