According to findings published by many diabetes institutes around the world, about one-third of people with diabetes do not know that they have it. Detecting and treating diabetes at an early stage is critical to keeping patients healthy and protecting their quality of life. Early detection also helps to mitigate the risk of serious complications of diabetes, such as heart disease, stroke, blindness, limb amputation, and kidney failure.

The data set consists of the signs and symptoms of 516 newly diabetic or would-be diabetic patients who presented at Sylhet Diabetes Hospital in Sylhet, Bangladesh. The data were collected through direct questionnaires administered at the hospital under the supervision of doctors. The data set comes from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset) and has 16 descriptive features and one target feature.

This study intends to build a logistic regression model that predicts the likelihood of having diabetes from the common signs and symptoms presented by patients. A successful model would enable early detection of diabetes from those signs and symptoms. The study consists of two phases:

1) Phase I - preprocess and explore the data set so that it is ready for model development.
2) Phase II - build a logistic regression model to predict the likelihood of having diabetes based on signs and symptoms.

Phase I was completed in a previous submission; this report covers the work carried out for Phase II. All work was performed in R, and the report was compiled using R Markdown.
Descriptive_features.csv
Data set: Phase1_Data.csv
R Code: Phase1_Code.RMD
Report: Phase1_Report.pdf
Building a logistic regression model to predict the likelihood of having diabetes based on signs and symptoms
Data set: Phase2_Data.csv
R Code: Phase2_Code.RMD
Report: Phase2_Report.pdf
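
The Phase II modelling step described above can be illustrated with a minimal R sketch. It is not the code from Phase2_Code.RMD: the data frame `toy`, its column names (`age`, `polyuria`, `polydipsia`, `class`), the simulated effect sizes, and the 0.5 cut-off are all assumptions made for illustration, and the data are synthetic rather than read from Phase2_Data.csv. Only base R is used.

```r
# Synthetic stand-in data. Column names and effect sizes are hypothetical
# and are NOT taken from Phase2_Data.csv.
set.seed(42)
n <- 200
toy <- data.frame(
  age        = round(runif(n, 20, 80)),
  polyuria   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  polydipsia = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Simulate a binary target from an assumed linear predictor on the log-odds scale.
log_odds <- -2 + 0.02 * toy$age +
  1.5 * (toy$polyuria == "Yes") +
  1.2 * (toy$polydipsia == "Yes")
toy$class <- factor(
  ifelse(runif(n) < plogis(log_odds), "Positive", "Negative"),
  levels = c("Negative", "Positive")
)

# Logistic regression: with a factor response, glm() treats the first level
# ("Negative") as failure and models the probability of "Positive".
fit <- glm(class ~ age + polyuria + polydipsia, data = toy, family = binomial)

# Predicted probability of the "Positive" class for each row.
probs <- predict(fit, newdata = toy, type = "response")

# Classify at an assumed 0.5 cut-off and compare with the simulated labels.
pred <- factor(ifelse(probs >= 0.5, "Positive", "Negative"),
               levels = levels(toy$class))
conf_mat <- table(actual = toy$class, predicted = pred)

print(summary(fit)$coefficients)
print(conf_mat)
```

With the real data, the same pattern would apply after replacing `toy` with the preprocessed Phase I output and listing the actual symptom columns in the formula. The cut-off would normally be chosen on held-out data rather than fixed at 0.5.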