Authors: Andrea Somma, Lorenzo Paggetta, Pietro Marelli
This work compares several methods for building a classifier on the Cover Type dataset, with the goal of an accurate and balanced model that predicts the forest cover type from the cartographic variables in the dataset. The Cover Type dataset contains tree observations from four wilderness areas of the Roosevelt National Forest in Colorado. The data consists of cartographic variables only, with no remotely sensed data. It is a rather large dataset: 7 forest cover types, more than half a million instances, and 54 features, including elevation, aspect, slope, distance to hydrology, and soil type, among others.
Class | Forest Cover Type |
---|---|
1 | Spruce/Fir |
2 | Lodgepole Pine |
3 | Ponderosa Pine |
4 | Cottonwood/Willow |
5 | Aspen |
6 | Douglas-fir |
7 | Krummholz |
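
The source does not say how "balanced" is measured. One common reading is that no cover type should be predicted much worse than the others, which can be checked with per-class recall and its mean (balanced accuracy). The sketch below assumes that reading; the function names and the toy labels are hypothetical and are not results from the dataset.

```python
from collections import Counter

# Class codes and names as listed in the table above.
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def per_class_recall(y_true, y_pred):
    """Return {class_code: recall} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {code: hits[code] / totals[code] for code in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (assumed meaning of 'balanced')."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 1, 2, 2, 4, 4]
y_pred = [1, 1, 1, 2, 2, 2, 4, 1]

for code, recall in sorted(per_class_recall(y_true, y_pred).items()):
    print(f"{COVER_TYPES[code]}: recall = {recall:.2f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

Unlike plain accuracy, this metric weights every cover type equally, so a model that ignores a rare class is penalized even when the overall accuracy looks high.
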
Column | Feature | Data Type |
---|---|---|
1 | Elevation | Integer |
2 | Aspect | Integer |
3 | Slope | Integer |
4 | Horizontal Distance To Hydrology | Integer |
5 | Vertical Distance To Hydrology | Integer |
6 | Horizontal Distance To Roadways | Integer |
7 | Hillshade 9am | Integer |
8 | Hillshade Noon | Integer |
9 | Hillshade 3pm | Integer |
10 | Horizontal Distance To Fire Points | Integer |
11-14 | Wilderness Area | Binary |
15-54 | Soil Type | Binary |
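
The column layout in the table above can be sketched as a list of 54 feature names plus a helper that collapses a binary block back into a single category index. The underscore-style names and the helper `decode_one_hot` are hypothetical. The sketch also assumes that each observation has exactly one active wilderness column and one active soil column.

```python
# Ten integer-valued columns, in the order given by the feature table.
QUANTITATIVE = [
    "Elevation",
    "Aspect",
    "Slope",
    "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
N_WILDERNESS = 4  # columns 11-14 in the table
N_SOIL = 40       # columns 15-54 in the table


def feature_names():
    """Build the 54 column names (the naming scheme is an assumption)."""
    names = list(QUANTITATIVE)
    names += [f"Wilderness_Area_{i}" for i in range(1, N_WILDERNESS + 1)]
    names += [f"Soil_Type_{i}" for i in range(1, N_SOIL + 1)]
    return names


def decode_one_hot(row, start, count):
    """Return the 1-based index of the single active column in a binary block.

    `start` is the 0-based offset of the block within `row`.
    """
    block = row[start:start + count]
    if sum(block) != 1:
        raise ValueError("expected exactly one active column in the block")
    return block.index(1) + 1


# Hypothetical observation: third wilderness area, 29th soil type.
row = [0] * 54
row[10 + 2] = 1
row[14 + 28] = 1

names = feature_names()
print(len(names))
print(decode_one_hot(row, 10, N_WILDERNESS))
print(decode_one_hot(row, 14, N_SOIL))
```

Collapsing the 44 binary columns into two categorical indices is only one possible preprocessing choice; the source does not state whether the models were trained on the binary columns directly.
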
Model | Accuracy [%] | Parameters | Size [MB] | Training Time |
---|---|---|---|---|
Bagging-based - Rescaled | 97 | 3.9M | 24 | 5 min |
DecisionTree-based - Rescaled | 92 | 6k | 3 | 2 min |
DecisionTree-based opt - Rescaled | 90 | 3k | 0.72 | ~20 s |
Model | Accuracy [%] | Parameters | Size [kB] | Training Time |
---|---|---|---|---|
NN - Rescaled | 93.3 | 233.9k | 2850 | 9 min |
NN opt - non-quantized - Rescaled | 90.3 | 10.6k | 172 | 4 min |
NN opt - quantized - Rescaled | 90 | 10.6k | 19.5 | 4 min |
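
As a rough sanity check on the sizes in the table above, the raw storage needed for the weights alone can be estimated as parameters times bits per parameter. The sketch below assumes 32-bit floats for the non-quantized models, 8-bit integers for the quantized one, and 1 kB = 1000 bytes; the source states none of these, and the helper name is hypothetical.

```python
def raw_weight_size_kb(n_params, bits_per_param):
    """Storage for the weights alone, in kB (1 kB = 1000 bytes assumed)."""
    return n_params * bits_per_param / 8 / 1000


# Parameter counts from the table above.
PARAM_COUNTS = {"NN": 233_900, "NN opt": 10_600}

for name, n_params in PARAM_COUNTS.items():
    fp32_kb = raw_weight_size_kb(n_params, 32)  # assumed float32 weights
    int8_kb = raw_weight_size_kb(n_params, 8)   # assumed int8 quantization
    print(f"{name}: float32 ~ {fp32_kb:.1f} kB, int8 ~ {int8_kb:.1f} kB")
```

The sizes reported in the table are larger than these estimates, which is expected if the saved files also contain metadata, the model graph, or other state; the source does not specify the file format.
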