nickgillian edited this page Aug 14, 2016 · 1 revision

#Bootstrap Aggregating (BAG)

##Description

The BAG classifier implements bootstrap aggregating (bagging), a machine learning ensemble meta-algorithm that is designed to improve the stability and accuracy of other machine learning algorithms.

Bagging works by creating an ensemble of classifiers, each of which is trained on a random subsample of the main training dataset, and then giving each classifier in the ensemble a weighted vote for which class it thinks a new datum belongs to. These votes are then combined to form the overall prediction. The bagging algorithm assigns each classifier an equal weight; however, you can update these weights in real-time if you have access to other information that might inform the vote of each classifier in the ensemble. Bagging can help to improve the performance of other machine learning algorithms and can also help reduce variance and avoid overfitting. Although bagging is usually applied to decision tree methods, the BAG class can be used with any type of GRT classifier (you can even use multiple classifiers at the same time).

The BAG algorithm is part of the GRT classification modules.

##Advantages

The BAG classifier is a powerful meta-algorithm that can be used to classify complex problems. The main advantage of the BAG algorithm is that you can use it to create an ensemble classifier (a classifier that is made up of several classifiers, each of which gets a vote as to what class a datum might belong to). Ensemble classifiers use multiple models to obtain better predictive performance than could be obtained from any of the constituent models alone.

##Disadvantages

The main limitation of the BAG algorithm is that it can be slower to make a real-time prediction than other classifiers (as each classifier in the BAG ensemble needs to make a prediction). If you require a very fast ensemble classifier then you should either reduce the number of classifiers in your ensemble, or make sure that the classifiers you add to your ensemble are fast (for example, MinDist is a very fast classifier, whereas KNN is not).

##Training Data Format

You should use the ClassificationData data structure to train the BAG classifier.

##Example Code

BAG Example