Skip to content

Implemented a Decision Tree from Scratch using binary univariate split, entropy, and information gain. Used Gini index and Pruning for performance improvement.

Notifications You must be signed in to change notification settings

AyanPahari/Decision-Tree-from-Scratch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Decision Tree from Scratch

Implemented a Decision Tree from Scratch using binary univariate split, entropy and information gain.

  • Evaluated it's performance using 10 folds cross validation

  • Improvement strategies:

      - Use Gini index instead of entropy
      - Prune the tree after splitting for better generalization
    

Dataset

Wine-Dataset

Accuracies

  • Accuracy of Initial Implementation using Entropy and Information Gain

      1. K-Fold(K=10) Cross Validation without Random Sampling Accuracy: 74.90%
      2. K-Fold(K=10) Cross Validation with Random Sampling Accuracy: 84.06%
    

It can be seen by the accuracies that Cross Validation with random sampling works much better than without random sampling, as it randomly splits the data into training and testing this will ensure that the model will not be affected by any inbuilt bias in the dataset.

  • Accuracy of Improved Implementation

      1. Gini Index instead of Entropy(K-Fold Cross Validation with Random Sampling) Accuracy: 83.32%
      2. Pruning the Tree to a certain depth and using min sampling threshold(K-Fold Cross Validation with Random Sampling) Accuracy : 84.22%
    

Using Gini index instead of Entropy the accuracy doesn’t seem to change that much, but the structure of the tree and the split values definitely have changed.

After Pruning the tree the accuracy came out to be slightly better than the previous approach but still not a huge improvement.

Authors

About

Implemented a Decision Tree from Scratch using binary univariate split, entropy, and information gain. Used Gini index and Pruning for performance improvement.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published