A collection of Machine Learning datasets for health care and beyond.
To install the development version from GitHub, use the remotes package:
# install.packages("remotes") # if not already installed
remotes::install_github("https://github.com/StatsGary/MLDataR")
library(MLDataR)
To install the released version from CRAN, use the command below:
install.packages("MLDataR")
Once the package is installed, load it with:
library(MLDataR)
The package currently has eight example datasets, and more are being added every week. The datasets contained in the package are:
- Counter-Strike: Global Offensive - supervised machine learning regression and classification dataset to predict score or match outcome.
- Diabetes disease prediction - supervised machine learning classification dataset to enable the prediction of diabetic patients.
- Diabetes onset prediction - supervised machine learning regression dataset to enable prediction of the age at which a pre-diabetic patient will develop diabetes.
- Failing care home classification - supervised machine learning classification dataset to predict a failing care home from selected Datix incidents. Source: UK Datix service.
- Heart disease prediction - supervised machine learning classification dataset to enable the prediction of heart disease using a number of key outcome features. Anonymised from the British Heart Foundation example records.
- Long stayers prediction - supervised machine learning classification dataset to enable the prediction of a patient staying in hospital longer than 7 days. Taken from a stranded patients extract and anonymised for training and research purposes. Source: Nottingham University Hospitals.
- Stroke Classification - supervised machine learning classification dataset to enable the prediction of a stroke in an unseen patient, using past observations in the training set.
- Thyroid disease prediction - supervised machine learning classification dataset to allow for the prediction of thyroid disease utilising historic patient records. Source: Garvan Institute - see the references in the markdown files supporting the package.
More datasets are being added, so look out for the next version of this package.
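Because the set of datasets changes between versions, it can help to ask the installed package what it actually ships. The snippet below is a minimal sketch using base R's `data()` function; it assumes MLDataR is already installed and that each listed item is a plain object name. No specific dataset object names are assumed here, since they are not listed above.

```r
library(MLDataR)

# List every dataset shipped with the installed version of MLDataR.
# data() returns an object whose $results matrix has the columns
# Package, LibPath, Item and Title.
available <- data(package = "MLDataR")$results[, c("Item", "Title"), drop = FALSE]
print(available)

# Load the first listed dataset by name and inspect its structure.
# Assumption: the Item entry is a plain object name.
first_name <- available[1, "Item"]
data(list = first_name, package = "MLDataR")
str(get(first_name))
```

Running `help(package = "MLDataR")` also opens the package index, which links to the documentation page for each dataset.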
It has been fun putting this package together and I hope you find it useful. If you find any issues using the package, please raise a GitHub issue and I will address it as soon as possible. Thanks, and I hope you enjoy using it.