Adding SHAP predict values as new output file #982

Merged


mattahrens
Collaborator

This PR adds a new output file, shap_values.csv, to the xgboost_predictions folder. The file contains SHAP feature importance values for the prediction set.

Signed-off-by: mattahrens <matthewahrens@gmail.com>
@mattahrens mattahrens requested a review from leewyang May 1, 2024 13:30
@mattahrens mattahrens self-assigned this May 1, 2024
@mattahrens mattahrens added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label May 1, 2024
Signed-off-by: mattahrens <matthewahrens@gmail.com>
Collaborator

@amahussein amahussein left a comment

Thanks @mattahrens
The CSV file shap_values.csv is missing from the PR.

@mattahrens
Collaborator Author

The shap_values.csv file is generated at runtime in the xgboost_predictions folder, so I didn't think it needed to be checked in.

Collaborator

@amahussein amahussein left a comment

@cindyyuanjiang Since you were adding the CLI, please take a look at this PR to see whether anything in it affects the CLI feature.

@leewyang
Collaborator

leewyang commented May 1, 2024

@mattahrens LGTM. Keep in mind that this prints the SHAP values across the entire dataset. If you want to see per-sample values, you'd need to modify this line, but you'd then end up with a dataframe of shape (num_samples, num_features) instead of the current (num_features, 1). However, I'm not sure that would be very usable compared with debugging via individual SHAP waterfall plots offline.
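
To make the shape difference concrete, here is a minimal standard-library sketch, not the code from this PR. It assumes the dataset-level value is the mean absolute SHAP value per feature (a common convention, but an assumption here). The feature names, numbers, and CSV column headers are all hypothetical.

```python
import csv
import io

# Hypothetical per-sample SHAP values: one row per sample, one column per feature.
feature_names = ["feature_a", "feature_b", "feature_c"]
per_sample_shap = [
    [0.5, -1.0, 0.0],
    [-1.5, 2.0, 0.5],
]


def dataset_level_importance(shap_rows):
    """Collapse (num_samples, num_features) into one mean |SHAP| value per feature.

    Mean absolute value is an assumed aggregation, not necessarily the PR's.
    """
    num_samples = len(shap_rows)
    return [sum(abs(v) for v in column) / num_samples for column in zip(*shap_rows)]


def to_csv(header, rows):
    """Render a header and rows as CSV text (in memory, for illustration)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


importance = dataset_level_importance(per_sample_shap)

# Dataset-level output: one row per feature, i.e. shape (num_features, 1).
dataset_csv = to_csv(["feature", "shap_value"], zip(feature_names, importance))

# Per-sample output: one row per sample, i.e. shape (num_samples, num_features).
per_sample_csv = to_csv(feature_names, per_sample_shap)
```

The dataset-level table stays the same size no matter how many samples are predicted, while the per-sample table grows with the prediction set, which is the usability concern raised above.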

@mattahrens
Collaborator Author

I wanted to start with the values across the entire dataset and get feedback. If we then want per-sample values, we can add them in a later enhancement.

@mattahrens mattahrens merged commit fcfda2e into NVIDIA:dev May 1, 2024
15 checks passed
@mattahrens mattahrens deleted the mahrens-adding-shap-predict-values branch May 1, 2024 21:09
Labels
user_tools Scope the wrapper module running CSP, QualX, and reports (python)
4 participants