remove catboost support
krzyzinskim committed Sep 28, 2023
1 parent cd54efd commit fff85eb
Showing 31 changed files with 60 additions and 567 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -14,7 +14,7 @@ Authors@R: c(
Description: An efficient implementation of the TreeSHAP algorithm
introduced by Lundberg et al., (2020) <doi:10.1038/s42256-019-0138-9>.
It is capable of calculating SHAP values for tree-based models in
polynomial time. Currently supported models include 'catboost',
polynomial time. Currently supported models include
'gbm', 'randomForest', 'ranger', 'xgboost', 'lightgbm'.
License: GPL-3
URL: https://modeloriented.github.io/treeshap/,
2 changes: 0 additions & 2 deletions NAMESPACE
@@ -3,14 +3,12 @@
S3method(predict,model_unified)
S3method(print,model_unified)
S3method(print,treeshap)
S3method(unify,catboost.Model)
S3method(unify,default)
S3method(unify,gbm)
S3method(unify,lgb.Booster)
S3method(unify,randomForest)
S3method(unify,ranger)
S3method(unify,xgb.Booster)
export(catboost.unify)
export(colors_breakdown_drwhy)
export(colors_discrete_drwhy)
export(gbm.unify)
6 changes: 5 additions & 1 deletion NEWS.md
@@ -1,6 +1,10 @@
treeshap 0.2.1
treeshap 0.2.2
----------------------------------------------------------------
* Initial CRAN submission.
* Removed `catboost.unify` function (as the `catboost` package is not on CRAN)

treeshap 0.2.1
----------------------------------------------------------------
* Fixed `randomForest.unify` for classifiers ([#12](https://github.com/ModelOriented/treeshap/issues/12), [#23](https://github.com/ModelOriented/treeshap/issues/23))
* Implemented consolidated (generic) `unify` function ([#18](https://github.com/ModelOriented/treeshap/issues/18))
* An error is thrown when the data passed to the `unify` or `treeshap` functions contain variables that are not used by the model ([#14](https://github.com/ModelOriented/treeshap/issues/14))
2 changes: 0 additions & 2 deletions R/model_unified.R
@@ -31,8 +31,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{Catboost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
2 changes: 0 additions & 2 deletions R/set_reference_dataset.R
@@ -19,8 +19,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{Catboost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
1 change: 0 additions & 1 deletion R/treeshap.R
@@ -21,7 +21,6 @@
#' \code{\link{xgboost.unify}} for \code{XGBoost models}
#' \code{\link{lightgbm.unify}} for \code{LightGBM models}
#' \code{\link{gbm.unify}} for \code{GBM models}
#' \code{\link{catboost.unify}} for \code{catboost models}
#' \code{\link{randomForest.unify}} for \code{randomForest models}
#' \code{\link{ranger.unify}} for \code{ranger models}
#' \code{\link{ranger_surv.unify}} for \code{ranger survival models}
9 changes: 1 addition & 8 deletions R/unify.R
@@ -3,7 +3,7 @@
#' Convert your tree-based model into a standardized representation.
#' The returned representation is easy to be interpreted by the user and ready to be used as an argument in \code{treeshap()} function.
#'
#' @param model A tree-based model object of any supported class (\code{gbm}, \code{lgb.Booster}, \code{randomForest}, \code{ranger}, \code{xgb.Booster}, or \code{catboost.Model}).
#' @param model A tree-based model object of any supported class (\code{gbm}, \code{lgb.Booster}, \code{randomForest}, \code{ranger}, or \code{xgb.Booster}).
#' @param data Reference dataset. A \code{data.frame} or \code{matrix} with the same columns as in the training set of the model. Usually dataset used to train model.
#' @param ... Additional parameters passed to the model-specific unification functions.
#'
@@ -14,8 +14,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
@@ -73,11 +71,6 @@ unify.xgb.Booster <- function(model, data, recalculate = FALSE, ...){
  xgboost.unify(model, data, recalculate)
}

#' @export
unify.catboost.Model <- function(model, data, recalculate = FALSE, ...){
  catboost.unify(model, data, recalculate)
}

#' @export
unify.default <- function(model, data, ...){
  stop("Provided model is not of type supported by treeshap.")
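
Note on the `R/unify.R` change: once the `unify.catboost.Model` method is gone, a CatBoost model passed to `unify()` should fall through to `unify.default` and fail with the message shown above. The sketch below illustrates that dispatch under stated assumptions: it uses a stand-in generic rather than the real package, and `fake_catboost` is a hypothetical object that only carries the class attribute, not a trained model.

```r
# Stand-in for the package's S3 generic (illustration only, not treeshap's code).
unify <- function(model, data, ...) UseMethod("unify")

# Fallback method, mirroring the one kept in R/unify.R.
unify.default <- function(model, data, ...) {
  stop("Provided model is not of type supported by treeshap.")
}

# Hypothetical object that only has the class of a CatBoost model.
fake_catboost <- structure(list(), class = "catboost.Model")

# No unify.catboost.Model method exists, so dispatch reaches unify.default;
# tryCatch captures the error message instead of aborting the script.
result <- tryCatch(
  unify(fake_catboost, data = data.frame()),
  error = function(e) conditionMessage(e)
)
print(result)
```

Users who still need CatBoost support would install the `catboost` branch described in the README change in this commit.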
173 changes: 0 additions & 173 deletions R/unify_catboost.R

This file was deleted.

2 changes: 0 additions & 2 deletions R/unify_gbm.R
@@ -14,8 +14,6 @@
#' @seealso
#' \code{\link{lightgbm.unify}} for \code{\link[lightgbm:lightgbm]{LightGBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
2 changes: 0 additions & 2 deletions R/unify_lightgbm.R
@@ -18,8 +18,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
2 changes: 0 additions & 2 deletions R/unify_randomForest.R
@@ -20,8 +20,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
2 changes: 0 additions & 2 deletions R/unify_ranger.R
@@ -18,8 +18,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{randomForest.unify}} for \code{\link[randomForest:randomForest]{randomForest models}}
2 changes: 0 additions & 2 deletions R/unify_ranger_surv.R
@@ -26,8 +26,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{xgboost.unify}} for \code{\link[xgboost:xgboost]{XGBoost models}}
#'
#' \code{\link{randomForest.unify}} for \code{\link[randomForest:randomForest]{randomForest models}}
2 changes: 0 additions & 2 deletions R/unify_xgboost.R
@@ -16,8 +16,6 @@
#'
#' \code{\link{gbm.unify}} for \code{\link[gbm:gbm]{GBM models}}
#'
#' \code{\link{catboost.unify}} for \code{\link[catboost:catboost.train]{CatBoost models}}
#'
#' \code{\link{ranger.unify}} for \code{\link[ranger:ranger]{ranger models}}
#'
#' \code{\link{randomForest.unify}} for \code{\link[randomForest:randomForest]{randomForest models}}
31 changes: 20 additions & 11 deletions README.Rmd
@@ -21,7 +21,7 @@ set.seed(21)

<!-- badges: end -->

In the era of complicated classifiers conquering their market, sometimes even the authors of algorithms do not know the exact manner of building a tree ensemble model. The difficulties in models' structures are one of the reasons why most users use them simply like black-boxes. But, how can they know whether the prediction made by the model is reasonable? `treeshap` is an efficient answer for this question. Due to implementing an optimized algorithm for tree ensemble models (called TreeSHAP), it calculates the SHAP values in polynomial (instead of exponential) time. Currently, `treeshap` supports models produced with `xgboost`, `lightgbm`, `gbm`, `catboost`, `ranger`, and `randomForest` packages.
In the era of complicated classifiers conquering their market, sometimes even the authors of algorithms do not know the exact manner of building a tree ensemble model. The difficulties in models' structures are one of the reasons why most users use them simply like black-boxes. But, how can they know whether the prediction made by the model is reasonable? `treeshap` is an efficient answer for this question. Due to implementing an optimized algorithm for tree ensemble models (called TreeSHAP), it calculates the SHAP values in polynomial (instead of exponential) time. Currently, `treeshap` supports models produced with `xgboost`, `lightgbm`, `gbm`, `ranger`, and `randomForest` packages. Support for `catboost` is available only in the [`catboost` branch](https://github.com/ModelOriented/treeshap/tree/catboost) (see why [here](#catboost)).

## Installation

@@ -134,16 +134,15 @@ Dataset used as a reference for calculating SHAP values is stored in unified mod

```{r set_reference_dataset, eval=FALSE}
library(treeshap)
library(catboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
label <- fifa20$target
dt.pool <- catboost::catboost.load_pool(data = as.data.frame(lapply(data, as.numeric)), label = label)
cat_model <- catboost::catboost.train(
  dt.pool,
  params = list(loss_function = 'RMSE', iterations = 100,
                logging_level = 'Silent', allow_writing_files = FALSE))
unified_catboost <- unify(cat_model, dt.pool, data)
unified_catboost2 <- set_reference_dataset(unified_catboost, data[c(1000:2000), ])
library(ranger)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, target = fifa20$target))
rf <- ranger::ranger(target~., data = data, max.depth = 10, num.trees = 10)
unified_ranger_model <- unify(rf, data)
unified_ranger_model2 <- set_reference_dataset(unified_ranger_model, data[c(1000:2000), ])
```

## Other functionalities
@@ -158,6 +157,16 @@ Our implementation works at a speed comparable to the original Lundberg's Python

The complexity of SHAP interaction values computation is $\mathcal{O}(MTLD^2)$, where $M$ is the number of explanatory variables used by the explained model, $T$ is the number of trees, $L$ is the number of leaves in a tree, and $D$ is the depth of a tree.

## CatBoost
Originally, `treeshap` also supported CatBoost models from the `catboost` package, but because this package is not available on CRAN or R-universe (see `catboost` issues [#439](https://github.com/catboost/catboost/issues/439), [#1846](https://github.com/catboost/catboost/issues/1846)), we decided to remove support from the main version of our package.

However, you can still use the `treeshap` implementation for `catboost` by installing our package from the [`catboost` branch](https://github.com/ModelOriented/treeshap/tree/catboost).

This branch can be installed with:

``` r
devtools::install_github('ModelOriented/treeshap@catboost')
```

## References
- Lundberg, S.M., Erion, G., Chen, H. et al. "From local explanations to global understanding with explainable AI for trees", Nature Machine Intelligence 2, 56–67 (2020).