CatBoost converter #392

Merged
merged 10 commits into from
Jun 8, 2020
4 changes: 3 additions & 1 deletion README.md
@@ -16,6 +16,7 @@ ONNXMLTools enables you to convert models from different machine learning toolki
* libsvm
* XGBoost
* H2O
* CatBoost
<p>Pytorch has its builtin ONNX exporter check <a href="https://pytorch.org/docs/stable/onnx.html">here</a> for details</p>

## Install
@@ -31,7 +32,7 @@ pip install git+https://github.com/onnx/onnxmltools
If you choose to install `onnxmltools` from its source code, you must set the environment variable `ONNX_ML=1` before installing the `onnx` package.

## Dependencies
-This package relies on ONNX, NumPy, and ProtoBuf. If you are converting a model from scikit-learn, Core ML, Keras, LightGBM, SparkML, XGBoost, H2O or LibSVM, you will need an environment with the respective package installed from the list below:
+This package relies on ONNX, NumPy, and ProtoBuf. If you are converting a model from scikit-learn, Core ML, Keras, LightGBM, SparkML, XGBoost, H2O, CatBoost or LibSVM, you will need an environment with the respective package installed from the list below:
1. scikit-learn
2. CoreMLTools
3. Keras (version 2.0.8 or higher) with the corresponding Tensorflow version
@@ -40,6 +41,7 @@ This package relies on ONNX, NumPy, and ProtoBuf. If you are converting a model
6. XGBoost (scikit-learn interface)
7. libsvm
8. H2O
9. CatBoost

ONNXMLTools has been tested with Python **3.5**, **3.6**, and **3.7**.
Version 1.6.1 is the latest version supporting Python 2.7.
1 change: 1 addition & 0 deletions onnxmltools/__init__.py
@@ -25,6 +25,7 @@
from .convert import convert_tensorflow
from .convert import convert_xgboost
from .convert import convert_h2o
from .convert import convert_catboost

from .utils import load_model
from .utils import save_model
1 change: 1 addition & 0 deletions onnxmltools/convert/__init__.py
@@ -13,3 +13,4 @@
from .main import convert_tensorflow
from .main import convert_xgboost
from .main import convert_h2o
from .main import convert_catboost
23 changes: 23 additions & 0 deletions onnxmltools/convert/main.py
@@ -44,6 +44,29 @@ def convert_libsvm(model, name=None, initial_types=None, doc_string='', target_o
                   custom_conversion_functions, custom_shape_calculators)


def convert_catboost(model, name=None, initial_types=None, doc_string='', target_opset=None,
                     targeted_onnx=onnx.__version__, custom_conversion_functions=None, custom_shape_calculators=None):
Member

The other converters keep arguments like `targeted_onnx=onnx.__version__`, `custom_conversion_functions=None`, and `custom_shape_calculators=None` for backward compatibility. For a brand-new converter, these arguments can be dropped.

Contributor Author

Thanks, done.

    try:
        from catboost.utils import convert_to_onnx_object
    except ImportError:
        raise RuntimeError('CatBoost is not installed or need to be updated. '
Member

nit: "needs to be updated."

                           'Please install/upgrade CatBoost to use this feature.')

    if custom_conversion_functions:
        warnings.warn('custom_conversion_functions is not supported. Please set it to None.')
Member

Why include these converter arguments if they are not supported? It might be better to remove them entirely. In the Keras converter code above, these arguments were deprecated, which is why the warning messages were necessary.

Contributor Author

I thought all converters had much the same interface, so I added the arguments. I have discussed this with a member of the CatBoost team. I will create a PR to change the CatBoost converter interface so that those arguments are passed to the CatBoost side; the CatBoost team may implement the functionality in the future. I will update my PR when the change is released, if that is OK.

Collaborator

The signature in onnxmltools is not always the same, only in sklearn-onnx. So I would either remove the parameter or raise an exception if the parameter is not None.

Contributor Author

Thanks, I removed the arguments that are not supported.
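
The second option the collaborator mentions (raise an exception when an unsupported parameter is not None) could look roughly like the sketch below. The names `reject_unsupported` and `convert_example` are hypothetical and are not part of onnxmltools or of this PR; they only illustrate the pattern.

```python
def reject_unsupported(**kwargs):
    """Raise if any unsupported keyword argument was given a non-None value.

    Hypothetical helper for illustration; not part of onnxmltools.
    """
    given = sorted(name for name, value in kwargs.items() if value is not None)
    if given:
        raise NotImplementedError(
            "Unsupported argument(s): {0}. Please leave them as None.".format(", ".join(given))
        )


def convert_example(model, custom_conversion_functions=None, custom_shape_calculators=None):
    # Fail loudly instead of warning and then silently ignoring the arguments.
    reject_unsupported(
        custom_conversion_functions=custom_conversion_functions,
        custom_shape_calculators=custom_shape_calculators,
    )
    return model  # placeholder standing in for the real conversion
```

Compared with `warnings.warn`, an exception makes it impossible for a caller to believe a custom function was applied when it was in fact ignored.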

    if custom_shape_calculators:
        warnings.warn('custom_shape_calculators is not supported. Please set it to None.')

    export_parameters = {
        'onnx_domain': 'ai.catboost',
Collaborator

Why not use an existing domain?

Contributor Author

Thanks! You are right, I will change it to ai.onnx.

Contributor Author

Removed.

        'onnx_model_version': 0,
        'onnx_doc_string': doc_string,
        'onnx_graph_name': name
    }

    return convert_to_onnx_object(model, export_parameters=export_parameters)
Member

The target_opset argument needs to be handled; it specifies which opset version is used in the generated ONNX model.
If you plan to support only one target_opset for now, you need to check target_opset and report an error if the user's target_opset differs from the one supported by CatBoost.

Contributor Author

Thanks, I now pass target_opset to CatBoost, and it is checked there.
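
The check the reviewer asks for can be sketched as follows. This is only an illustration of the idea: `check_target_opset` and `supported_opset` are hypothetical names, and the actual check in this PR is delegated to CatBoost, whose implementation is not shown here.

```python
def check_target_opset(target_opset, supported_opset):
    """Return the opset to use, or raise if the request cannot be honoured.

    Hypothetical helper: `supported_opset` stands for the single opset
    version the exporter is assumed to be able to emit.
    """
    if target_opset is None:
        # No explicit request: fall back to the only supported version.
        return supported_opset
    if target_opset != supported_opset:
        raise ValueError(
            "target_opset={0} is not supported; only opset {1} can be generated.".format(
                target_opset, supported_opset))
    return target_opset
```

Treating `None` as "use the supported version" keeps the default call working while still rejecting an explicit request that cannot be met.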



def convert_lightgbm(model, name=None, initial_types=None, doc_string='', target_opset=None,
                     targeted_onnx=onnx.__version__, custom_conversion_functions=None, custom_shape_calculators=None):
    if not utils.lightgbm_installed():
3 changes: 3 additions & 0 deletions onnxmltools/utils/tests_helper.py
@@ -212,6 +212,9 @@ def convert_model(model, name, input_types):
            model, prefix = convert_lightgbm(model, name, input_types), "LightGbm"
        else:
            raise RuntimeError("Unable to convert model of type '{0}'.".format(type(model)))
    elif model.__class__.__name__.startswith("Cat"):
Member

Is there a better fingerprint to identify the original model?

        from onnxmltools.convert import convert_catboost
        model, prefix = convert_catboost(model, name, input_types), "Cat"
    elif isinstance(model, BaseEstimator):
        from onnxmltools.convert import convert_sklearn
        model, prefix = convert_sklearn(model, name, input_types), "Sklearn"
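
On the reviewer's question about a better fingerprint than the `"Cat"` class-name prefix: one possible approach, sketched here under the assumption that CatBoost model classes are defined inside the `catboost` package, is to inspect the module of each class in the model's MRO. The helper name `is_catboost_model` is hypothetical and is not part of this PR.

```python
def is_catboost_model(model):
    """Return True if any class in the model's MRO is defined in the catboost package.

    Hypothetical helper; it does not import catboost, so it also works
    when the package is not installed.
    """
    return any(
        cls.__module__.split(".")[0] == "catboost"
        for cls in type(model).__mro__
    )
```

Unlike the prefix test, this does not match unrelated classes whose names merely start with "Cat". An `isinstance` check against the library's own base class would be an alternative when the package is importable.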
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -15,3 +15,4 @@ scipy
svm
wheel
xgboost<=1.0.2
catboost
53 changes: 53 additions & 0 deletions tests/catboost/test_CatBoost_converter.py
@@ -0,0 +1,53 @@
"""
Tests for CatBoostRegressor and CatBoostClassifier converter.
"""
import unittest
import numpy
import catboost
from sklearn.datasets import make_regression, make_classification
from onnxmltools.convert import convert_catboost
from onnxmltools.utils import dump_data_and_model, dump_single_regression, dump_multiple_classification


class TestCatBoost(unittest.TestCase):
    def test_catboost_regressor(self):
        X, y = make_regression(n_samples=100, n_features=4, random_state=0)
        catboost_model = catboost.CatBoostRegressor(task_type='CPU', loss_function='RMSE',
                                                    n_estimators=10, verbose=0)
        dump_single_regression(catboost_model)

        catboost_model.fit(X.astype(numpy.float32), y)
        catboost_onnx = convert_catboost(catboost_model, name='CatBoostRegression',
                                         doc_string='test regression')
        self.assertTrue(catboost_onnx is not None)
        dump_data_and_model(X.astype(numpy.float32), catboost_model, catboost_onnx, basename="CatBoostReg-Dec4")

    def test_catboost_bin_classifier(self):
        X, y = make_classification(n_samples=100, n_features=4, random_state=0)
        catboost_model = catboost.CatBoostClassifier(task_type='CPU', loss_function='CrossEntropy',
                                                     n_estimators=10, verbose=0)

        catboost_model.fit(X.astype(numpy.float32), y)

        catboost_onnx = convert_catboost(catboost_model, name='CatBoostBinClassification',
                                         doc_string='test binary classification')
        self.assertTrue(catboost_onnx is not None)
        # onnx runtime returns zeros as class labels
        # dump_data_and_model(X.astype(numpy.float32), catboost_model, catboost_onnx, basename="CatBoostBinClass")
Member

Should this line be uncommented?

Contributor Author

Well, this part has a problem :(
The comparison works properly with probabilities, but not with labels: a converted model returns only zeros as labels. I consulted the CatBoost team, and they consider it an onnxruntime bug.

Collaborator

This must be fixed; it is probably an error somewhere in the ONNX graph.

Contributor Author

Thank you for the information. I passed your reply on to the CatBoost team members and will update my PR after they fix the bug.

Contributor Author

It works now with the new onnxruntime version.
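
The symptom discussed in this thread (correct probabilities but all-zero labels) can be detected with a small consistency check such as the sketch below. It assumes that class labels are the integer indices 0..n-1 of the probability columns; both function names are hypothetical and are not part of the onnxmltools test helpers.

```python
def labels_from_probabilities(probabilities):
    """Pick, for each row, the index of the largest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def labels_consistent(labels, probabilities):
    """Return True when the reported labels match the argmax of the probabilities."""
    return list(labels) == labels_from_probabilities(probabilities)
```

A model that outputs only zeros as labels fails this check whenever at least one row has its largest probability in a column other than the first.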


    def test_catboost_multi_classifier(self):
        X, y = make_classification(n_samples=10, n_informative=8, n_classes=3, random_state=0)
        catboost_model = catboost.CatBoostClassifier(task_type='CPU', loss_function='MultiClass', n_estimators=100,
                                                     verbose=0)

        dump_multiple_classification(catboost_model)

        catboost_model.fit(X.astype(numpy.float32), y)
        catboost_onnx = convert_catboost(catboost_model, name='CatBoostMultiClassification',
                                         doc_string='test multiclass classification')
        self.assertTrue(catboost_onnx is not None)
        dump_data_and_model(X.astype(numpy.float32), catboost_model, catboost_onnx, basename="CatBoostMultiClass")


if __name__ == "__main__":
    unittest.main()