Many of our features are not very useful. We should include a first step of feature selection before passing the feature matrix to the classifier. This could be something simple, e.g. a variance threshold, or something more complex. See a reference here in scikit-learn for how we can do this (no wheel invention necessary).
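A minimal sketch of the simple option, the variance threshold, using scikit-learn's `VarianceThreshold`. The feature matrix `X` here is a random stand-in for our real features, and the `0.01` cutoff is an arbitrary placeholder, not a recommendation:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.random((100, 20))   # stand-in for our real feature matrix
X[:, 3] = 0.5               # a constant (zero-variance) feature, for illustration

# Drop any feature whose variance falls below the cutoff.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(X.shape, "->", X_reduced.shape)                 # (100, 20) -> (100, 19)
print("kept columns:", selector.get_support(indices=True))
```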
The goal of this is to reduce computational time, not increase accuracy, correct? I would assume removing any features would hurt accuracy, although for features with extremely low variance (to use one metric of usefulness) I assume the effect would be negligible.
You're certainly right that it will make things faster, but feature selection is primarily for improving our classifier's results (on our metrics: AUC, etc.). If we add a lot of not-very-useful features, we are adding a lot of noise, which makes the learning problem significantly harder: we'll need more data to explore a much larger feature space, and we run the risk of the classifier picking up on noise and overfitting.
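To make the AUC point concrete, here is a hedged sketch of how the selection step could be wired into the classifier; the dataset is synthetic and the logistic-regression estimator is illustrative, not a decision about our actual model. Putting the selector inside a `Pipeline` keeps selection fitted within each cross-validation fold, so the reported AUC is not optimistically biased by leakage:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data: 50 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),  # keep 10 best features
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection is re-fit inside each fold, so the AUC estimate is honest.
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("mean AUC with selection:", scores.mean())
```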