
Feature Selection #63

Open
redshiftzero opened this issue Oct 12, 2016 · 2 comments

@redshiftzero
Contributor

Many of our features are not very useful. We should add a feature selection step before passing the feature matrix to the classifier. This could be something simple, e.g. a variance threshold, or something more complex. See the scikit-learn feature selection documentation for how we can do this (no wheel invention necessary).
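
A minimal sketch of the simple variance-threshold option, using scikit-learn's `VarianceThreshold` (the toy matrix and the 0.05 cutoff here are placeholders for illustration, not values from our pipeline):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy feature matrix: the first and last columns are constant,
# so they carry no information for the classifier.
X = np.array([[0, 2.0, 0, 3],
              [0, 1.0, 4, 3],
              [0, 1.1, 1, 3]])

# Drop any feature whose variance falls below the threshold.
selector = VarianceThreshold(threshold=0.05)
X_reduced = selector.fit_transform(X)

print(selector.get_support())  # boolean mask of retained features
print(X_reduced)               # matrix with low-variance columns removed
```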

@psivesely
Contributor

The goal of this is to reduce computational time, not increase accuracy, correct? I would assume removing any features would have a negative effect on accuracy (although for features with extremely low variance, to use one metric of usefulness, the effect is presumably negligible).

@redshiftzero
Contributor Author

You're certainly right that it will make things faster, but feature selection is primarily about improving our classifier's results (on our metrics: AUC, etc.). If we add a lot of not-very-useful features, we are adding a lot of noise, which makes the learning problem significantly harder: we'll need more data to explore a much larger feature space, and we also run the risk of the classifier picking up on noise and overfitting.
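
To make that concrete, here is a hedged sketch of selection wired into a scikit-learn `Pipeline` and scored by cross-validated AUC; the synthetic data, `SelectKBest`, and `LogisticRegression` are stand-ins for illustration, not our actual features or classifier:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data: only 5 of 50 features are informative; the rest are noise.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection happens inside the pipeline, so each CV fold fits the
# selector on its own training split only (no leakage into the test split).
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("mean AUC: %.3f" % scores.mean())
```

Keeping the selector inside the pipeline matters: fitting it on the full dataset before cross-validation would leak information and inflate the AUC estimate.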
