Feature importance #12
Are the feature scores the same as the feature weights described in http://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf? Namely, is it the L2/l2 norm of each feature's (or feature pair's) function in the $GA^2M$ framework?
Hi pjk645, thanks for the question! Just to make sure we understand correctly, are you asking about the overall feature importance ranking we assign in the EBM summary? Currently, we calculate this ranking as the average absolute predicted value of each feature over the training dataset. In other words, per feature, we calculate the score that feature assigns to each training data point. We then take the absolute value of all of these scores and average the result. Sorting these averages gives the final ranking of features for the model. Because EBM is an additive model, this ranking reflects which features have the largest impact on predictions in the training set. There are many ways to calculate overall feature importance, and we are considering including alternative methods (e.g., AUROC per feature) in future releases!
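A minimal sketch of that ranking, assuming the per-sample additive contributions of each feature are already available as a matrix (the function and variable names here are hypothetical, not the interpret library's API):

```python
import numpy as np

def rank_features_by_mean_abs_score(per_feature_scores, feature_names):
    """Rank features by the mean absolute per-feature score.

    per_feature_scores: array of shape (n_samples, n_features), where
    entry [i, j] is the additive contribution of feature j to the
    model's prediction for training sample i.
    """
    importances = np.mean(np.abs(per_feature_scores), axis=0)
    order = np.argsort(importances)[::-1]  # sort descending by importance
    return [(feature_names[j], float(importances[j])) for j in order]
```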
Firstly, thanks for responding and for making this module. The EBM fit works really well right out of the box on several toy and work-related datasets. I'm quite intrigued by both the high performance and the feature importance aspects of the model. Yes, I am inquiring about the overall importance scores. From your response, it sounds like you are using the L1/l1 norm rather than the L2/l2 norm I had in mind, but I'm not sure. By L1/l1 norm, I mean the average value of the integral of the absolute value of a function, which in the discrete case simplifies to the average absolute value at the discrete points. By L2/l2 norm, I mean the square root of the average value of the integral of the function squared, which in the discrete case simplifies to the square root of the average of the squared values of the function. In http://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf, the authors describe using the L2/l2 norm of each feature's component as its "weight" and then ranking each feature's importance to the model by that weight. Am I correct in interpreting that you are doing essentially the same thing, but with the L1/l1 norm instead of the L2/l2 norm?
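Written out (the notation here is mine, following the definitions above): for the shape function $f_j$ of feature $j$, evaluated at the training values $x_{1j}, \dots, x_{nj}$, the two candidate importance measures are

$$
\|f_j\|_1 = \frac{1}{n}\sum_{i=1}^{n}\bigl|f_j(x_{ij})\bigr|,
\qquad
\|f_j\|_2 = \sqrt{\frac{1}{n}\sum_{i=1}^{n} f_j(x_{ij})^2}.
$$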
This is correct, except we compute a weighted average absolute value across the function (weighted by the density of the training dataset). So if 90% of the training data took a value of "0" for a feature, and 10% took the value "1", the value of "0" has 9x the weight of the value of "1" before we compute the average absolute value. The main idea is to prevent functions that take an extreme value in sparse regions from getting high scores. This is the key difference between the "weights" described in the paper and our current methodology (alongside the L1/L2 difference). Of course, if you want to appropriately highlight different cases, there are other choices of feature importance (like the weights described in the paper, average ROC per feature on a validation set, etc.), so we plan to make more options available. Thanks for the insightful question!
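A sketch of that density weighting, stated over a feature's bins (a hypothetical helper, not the library's internal code): given the shape function's score per bin and the number of training samples falling in each bin, the importance is the density-weighted mean absolute score.

```python
import numpy as np

def weighted_mean_abs(bin_scores, bin_counts):
    """Density-weighted mean absolute value of a shape function.

    bin_scores: score the shape function assigns to each bin.
    bin_counts: number of training samples falling in each bin.
    """
    weights = np.asarray(bin_counts) / np.sum(bin_counts)
    return float(np.sum(weights * np.abs(bin_scores)))

# Example from the comment above: 90% of samples take value "0", 10% take "1",
# so the score at "0" gets 9x the weight of the score at "1".
print(weighted_mean_abs(bin_scores=[0.2, 5.0], bin_counts=[900, 100]))
# -> 0.9 * 0.2 + 0.1 * 5.0 = 0.68, versus the unweighted (0.2 + 5.0) / 2 = 2.6
```

The example shows the intended effect: an extreme score reached only in a sparse region (the rare "1" bin) is damped rather than dominating the feature's importance.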
Great, thanks for the feedback.