The Iris flower dataset is a built-in dataset in scikit-learn. It contains sepal length, sepal width, petal length, and petal width measurements for three types of iris (Setosa, Versicolour, and Virginica). In this project, five machine learning methods (Decision Tree, Support Vector Machine, Random Forest, Naive Bayes, and K-nearest neighbour) are compared using scikit-learn's built-in implementations. In addition, a Gaussian Mixture Model and the K-means algorithm are implemented from scratch. This project was created as a homework assignment for a Machine Learning course at the National University of Singapore (NUS).
ModelComparison.py uses scikit-learn's built-in implementations of five machine learning methods and compares them:
- Decision Tree
- Support Vector Machine
- Random Forest
- Naive Bayes
- K-nearest neighbour
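
A comparison of this kind could be sketched as follows. This is a minimal illustration and not the contents of ModelComparison.py: the choice of 5-fold cross-validation, the default hyperparameters, the fixed `random_state`, and the `models` and `scores` names are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the four features and the three class labels.
X, y = load_iris(return_X_y=True)

# Assumed setup: default hyperparameters, fixed seeds where the model is randomized.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "K-nearest neighbour": KNeighborsClassifier(),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single train/test split because the dataset has only 150 samples, so a single split can give noisy accuracy estimates.
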
In addition to the five scikit-learn algorithms, two further algorithms are built from scratch:
- Gaussian Mixture Model
- K-means clustering
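
To give an idea of what a from-scratch implementation involves, here is a minimal K-means sketch using only NumPy. It is not the project's code: the `kmeans` function name, its parameters, the random initialization from data points, and the convergence check are assumptions chosen for illustration.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment so the labels match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centroids, labels
```

On the Iris data this could be called as `kmeans(X, k=3)`, where `X` is the 150 x 4 feature array. A Gaussian Mixture Model fitted with expectation-maximization follows the same alternating pattern, but replaces the hard assignments with per-cluster probabilities and also estimates covariances and mixing weights.
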