This project aims to explore and build machine learning models for predicting heart disease conditions based on various health-related features. The dataset used in this project is the Cleveland Heart Disease dataset, obtained from the UCI Machine Learning Repository.
Heart disease is a prevalent and severe health issue worldwide. The primary objective of this project is to develop models that accurately classify individuals as having or not having heart disease based on their health features. The work covers data preprocessing, exploratory data analysis, feature engineering, and model building. Several classification algorithms are implemented and evaluated: logistic regression, linear discriminant analysis, k-nearest neighbors, decision trees, naive Bayes, random forests, and support vector machines.
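As a rough illustration of how the classifier families named above can be compared, here is a minimal sketch using scikit-learn and 5-fold cross-validation. It is not the notebook's code: the data is synthetic (generated with `make_classification`), and the sample count, feature count, hyperparameters, and the choice to scale features for some models are all assumptions made for the example.

```python
# Sketch: compare several classifier families with 5-fold cross-validation.
# Synthetic data stands in for the real dataset; all settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder binary classification problem (sizes chosen arbitrarily).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# Distance- and margin-based models are wrapped in a scaling pipeline so the
# scaler is refit inside each cross-validation fold.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Scores on synthetic data say nothing about performance on the Cleveland dataset; the sketch only shows the comparison loop.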
To run this project, you'll need the following dependencies:
Python (version 3.6 or later)
pandas
numpy
matplotlib
seaborn
scikit-learn
Models used (all provided by scikit-learn):
Logistic Regression
K-Nearest Neighbors
Gaussian Naive Bayes
Random Forest and Decision Tree
Linear Discriminant Analysis
You can install the required packages using pip:
pip install pandas numpy matplotlib seaborn scikit-learn
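To confirm the installation worked, a small check like the following may help. It is a sketch that uses only the standard library; the helper name `find_missing` is hypothetical. Note that scikit-learn is imported as `sklearn`, which is why the pip name and the import name are listed separately.

```python
import importlib.util

# Maps each pip package name to its import name.
# scikit-learn is the only one whose import name differs from its pip name.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def find_missing(required=REQUIRED):
    """Return the pip names of packages that cannot be located for import."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```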
The project repository contains the following files:
Heart_Disease_Prediction.ipynb:
This Jupyter Notebook contains the main code for the project, including data preprocessing, exploratory data analysis, feature engineering, model building, and evaluation.
heart_cleveland_upload.csv:
The Cleveland Heart Disease dataset used in the project.
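Loading the CSV and separating features from the label might look like the sketch below. The function name `load_heart_data` is hypothetical, and the target column name `condition` is an assumption that should be checked against the actual file. A tiny made-up sample is used so the example runs without the dataset.

```python
import io

import pandas as pd


def load_heart_data(source, target_col="condition"):
    """Read the CSV and split it into a feature table X and a label vector y.

    target_col is an assumed column name; check df.columns in your copy.
    """
    df = pd.read_csv(source)
    if target_col not in df.columns:
        raise KeyError(
            f"Column {target_col!r} not found; available: {list(df.columns)}"
        )
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y


# Tiny made-up sample so the sketch runs without the real file.
sample_csv = io.StringIO(
    "age,sex,chol,condition\n"
    "63,1,233,0\n"
    "67,1,286,1\n"
    "41,0,204,0\n"
)
X, y = load_heart_data(sample_csv)
print(X.shape, y.tolist())  # (3, 3) [0, 1, 0]
```

With the real file, the call would be `load_heart_data("heart_cleveland_upload.csv")`, adjusting `target_col` if the label column is named differently.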
- Clone the project repository to your local machine.
- Open the Heart_Disease_Prediction.ipynb file in a Jupyter Notebook environment or a compatible Python IDE.
- Follow the instructions and code cells in the notebook to preprocess the data, perform exploratory data analysis, and build and evaluate the machine learning models.
- Modify the code as needed to experiment with different techniques or models.
The project evaluates the performance of various machine learning models using cross-validation techniques. The results, including accuracy scores, confusion matrices, and classification reports, are provided within the notebook.
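For readers who want to reproduce this kind of report outside the notebook, the sketch below shows one way to obtain an accuracy score, a confusion matrix, and a classification report from cross-validated predictions. It is not the notebook's code: the data is synthetic, logistic regression is picked only as an example, and the class names (0 = "no disease", 1 = "disease") are assumed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Placeholder binary classification problem (sizes chosen arbitrarily).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each sample is predicted by a model that did not see it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)

acc = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)

print(f"Accuracy: {acc:.3f}")
print("Confusion matrix (rows = true class, columns = predicted class):")
print(cm)
# Class names are assumed labels for 0 and 1.
print(classification_report(y, y_pred, target_names=["no disease", "disease"]))
```

Using `cross_val_predict` gives one out-of-fold prediction per sample, so the confusion matrix covers the whole dataset rather than a single held-out split.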