
Feature importance #12

Closed
pjk645-zz opened this issue May 17, 2019 · 4 comments
Labels
question Further information is requested

Comments

@pjk645-zz

Are the feature scores the same as the feature weights described in http://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf? Namely, is each score the L2/l2 norm of the feature's (or feature pair's) function in the $GA^2M$ framework?

@interpret-ml
Collaborator

Hi pjk645,

Thanks for the question! Just to make sure we understand correctly: are you asking about the overall feature importance ranking we assign in the EBM summary?

[Screenshot: EBM summary plot showing the overall feature importance ranking]

Currently, we calculate this ranking from the average absolute predicted value of each feature over the training dataset. In other words, for each feature, we compute the score that feature would contribute to each training data point, take the absolute value of all of these scores, and average the result. Sorting these averages gives the final ranking of features for the model.

Because EBM is an additive model, this method corresponds with which features have the largest impact on predictions in the training set.
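The ranking described above can be sketched roughly as follows. This is a hypothetical toy example, not interpret's actual implementation; the `feature_scores` array, feature names, and the `rank_features` helper are all made up for illustration:

```python
import numpy as np

def rank_features(feature_scores, feature_names):
    """Rank features by mean absolute contribution over the training set.

    feature_scores[i][j] is the additive term that feature j contributes
    to the prediction for training sample i.
    """
    # Mean absolute score per feature (column-wise).
    importances = np.mean(np.abs(feature_scores), axis=0)
    # Sort features from most to least important.
    order = np.argsort(importances)[::-1]
    return [(feature_names[k], importances[k]) for k in order]

# Made-up per-sample contributions for two features.
scores = np.array([
    [ 0.5, -0.1],
    [-0.7,  0.2],
    [ 0.6, -0.1],
])
print(rank_features(scores, ["age", "income"]))
```

Here "age" ranks first because its contributions are larger in magnitude on average, regardless of sign.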

There are many ways to calculate overall feature importance, and we are considering including alternative methods (ex: AUROC per feature) in future releases!

@pjk645-zz
Author

First, thanks for responding and for making this module. The EBM fit works really well right out of the box on several toy and work-related data sets.

I'm quite intrigued by the model's high performance and its feature importance aspects.

Yes, I am inquiring about the overall importance scores.

From your response, it sounds like you are using the L1/l1 norm rather than the L2/l2 norm I was thinking of, but I'm not sure.

By the L1/l1 norm, I mean the average value of the integral of the absolute value of a function, which in the discrete case simplifies to the average absolute value over the points.

By the L2/l2 norm, I mean the square root of the average value of the integral of the function squared, which in the discrete case simplifies to the square root of the average of the squared values of the function.

In http://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf, the authors talk about using the L2/l2 norm of each feature's component as its "weight" and then using this to rank each feature's importance to the model. Am I correct in interpreting that you are doing essentially the same thing, but with the L1/l1 norm instead of the L2/l2?
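For reference, a minimal sketch of the two discrete norms being compared, applied to a hypothetical vector of a single feature's per-point scores (the values are made up):

```python
import numpy as np

# Hypothetical scores of one feature's function at the training points.
f = np.array([0.5, -0.7, 0.6, 0.0])

l1 = np.mean(np.abs(f))        # average absolute value (discrete L1)
l2 = np.sqrt(np.mean(f ** 2))  # root-mean-square (discrete L2)

print(l1, l2)
```

The L2 norm weights large deviations more heavily than the L1 norm, so the two can rank features differently when a feature's function has a few extreme values.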

@interpret-ml
Collaborator

> By L1/l1 norm, I mean the average value of the integral of the absolute value of a function, which simplifies to the average absolute value at discrete points for the discrete case.

This is correct, except we compute a weighted average absolute value across the function (weighted by the density of the training dataset). So if 90% of the training data took a value of "0" for a feature, and 10% took the value "1", the value of "0" has 9x the weight of the value of "1" before we compute the average absolute value.

The main idea is to prevent functions that take an extreme value in sparse regions from getting high scores. This is the key difference between the "weights" described in the paper, and our current methodology (alongside the L1/L2 difference).
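A quick sketch of this density weighting, using the 90%/10% example above. The bin scores are made-up illustrative values, not real EBM output:

```python
import numpy as np

# A binary feature whose additive term is small (0.05) where 90% of the
# training data lives, and extreme (2.0) in the sparse 10% region.
bin_scores  = np.array([0.05, 2.0])  # feature's score at values 0 and 1
bin_density = np.array([0.9, 0.1])   # fraction of training data in each bin

# Density-weighted mean absolute value, as described above.
weighted_importance = np.sum(bin_density * np.abs(bin_scores))

# Unweighted mean absolute value, for comparison.
unweighted = np.mean(np.abs(bin_scores))

print(weighted_importance)  # 0.9 * 0.05 + 0.1 * 2.0 = 0.245
print(unweighted)           # (0.05 + 2.0) / 2     = 1.025
```

The weighted score is much smaller, showing how density weighting keeps an extreme value in a sparse region from dominating the importance ranking.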

Of course, if you want to appropriately highlight different cases, there are other choices of feature importance (like the weights described in the paper, average ROC per feature on a validation set, etc.), so we plan to make more options available. Thanks for the insightful question!

@pjk645-zz
Author

pjk645-zz commented May 19, 2019

Great, thanks for the feedback.
