Author Contributions Statement

Bruce:

I completed the EDA for the binary diabetes dataset: I cleaned the data, dropped duplicate rows, and created several visualizations to give the audience a better understanding of the dataset. I also wrote the section that builds different classification models on the binary dataset, compared their accuracies, and optimized the random forest model. Finally, I reorganized our repo structure to separate data, figures, notebooks, etc.

Duy:

I started main.ipynb and ran an OLS regression on the variable of interest, diabetes. I explained the OLS regression and its significance for the dataset and model.

Sam Tan:

I made our work visible by publishing it online as a Jupyter Book with the help of a GitHub workflow; composed the README file with detailed descriptions of the project and the structure of the repository; created the package; improved the Makefile commands, building on Donghoon's work, so that the environment can be installed in one line; and updated the Environment.yml with the correct version of numpy.

Donghoon Shin:

I made our codebase reproducible by adding an environment.yml along with a Makefile that creates the conda environment and ipykernel. I also added a scientific analysis, using logistic regression, of which features predict diabetes.
