Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CatBoost converter #392

Merged
merged 10 commits into from
Jun 8, 2020
Merged

CatBoost converter #392

merged 10 commits into from
Jun 8, 2020

Conversation

monkey0head
Copy link
Contributor

Hi! I want to add CatBoost models conversion functionality.

CatBoost has its own converter to ONNX-ML and I added an interface to convert CatBoost models with onnxmltools as it would be more convenient for users to be able to convert CatBoost models in the same way as models created with other popular ml toolkits.

@vinitra-zz vinitra-zz requested review from wenbingl and xadupre and removed request for wenbingl May 15, 2020 20:36
@vinitra-zz
Copy link
Member

Thanks for the contribution, @monkey0head!

doc_string='test binary classification')
self.assertTrue(catboost_onnx is not None)
# onnx runtime returns zeros as class labels
# dump_data_and_model(X.astype(numpy.float32), catboost_model, catboost_onnx, basename="CatBoostBinClass")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be uncommented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this part has a problem :(
The comparison works properly with probabilities, not with labels. A converted model returns only zeros as labels. I consulted the Catboost team and they consider it as onnxruntime bug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be fixed and it is probably an error somewhere in the onnx graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the information, I reported your reply to the Catboost team members and I will update my pr after they fix the bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works now with the new onnxruntime version

'Please install/upgrade CatBoost to use this feature.')

if custom_conversion_functions:
warnings.warn('custom_conversion_functions is not supported. Please set it to None.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why include these converter arguments if they are not supported? It might be better to remove the arguments entirely. In the code above for the keras converter, these arguments were deprecated, which is why the warning messages were necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought all convertors have pretty the same interface and thus added the args). I have discussed the matter with the member of CatBoost team. I will create a pr to change CatBoost converter interface to pass those args to the CatBoost's side. CatBoost team may implement the functionality in the future. I will update my pr when the change is released if it is ok.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature in onnxmltools is not always the same. Only in sklearn-onnx. So I would either remove the parameter either raise an exception if the parameter is not None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I removed arguments that are not supported

try:
from catboost.utils import convert_to_onnx_object
except ImportError:
raise RuntimeError('CatBoost is not installed or need to be updated. '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "needs to be updated."

warnings.warn('custom_shape_calculators is not supported. Please set it to None.')

export_parameters = {
'onnx_domain': 'ai.catboost',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using existing domains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! You are right, I will change it to the ai.onnx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -44,6 +44,29 @@ def convert_libsvm(model, name=None, initial_types=None, doc_string='', target_o
custom_conversion_functions, custom_shape_calculators)


def convert_catboost(model, name=None, initial_types=None, doc_string='', target_opset=None,
targeted_onnx=onnx.__version__, custom_conversion_functions=None, custom_shape_calculators=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other converters keeps arguments like "targeted_onnx=onnx.version, custom_conversion_functions=None, custom_shape_calculators=None" for the backward compatibility, if there is a brand new one, these arguments could be dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, did so

@@ -212,6 +212,9 @@ def convert_model(model, name, input_types):
model, prefix = convert_lightgbm(model, name, input_types), "LightGbm"
else:
raise RuntimeError("Unable to convert model of type '{0}'.".format(type(model)))
elif model.__class__.__name__.startswith("Cat"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any better fingerprint to identify the original model?

'onnx_graph_name': name
}

return convert_to_onnx_object(model, export_parameters=export_parameters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need take care of the target_opset argument, which specify what's the opset version will be used in the generated ONNX model.
If you plan to only support one target_opset currently, you need check target_opset and report an issue if the user target_opset is not as same as the one that is support in Catboost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I pass the target_opset to Catboost and check it there

@monkey0head
Copy link
Contributor Author

There was a problem with binary classification: #392 (comment)

The converter for binary classification works well now with the new version of onnxruntime (tested locally and tests for rt130 are passed here) but fails for the older versions.

@monkey0head
Copy link
Contributor Author

Hi! Could you please have a look at the updated pull request?

Regarding the binary classification issue: The converter for binary classification works well with the new version of onnxruntime. The checks fail on the older versions.

@wenbingl
Copy link
Member

wenbingl commented Jun 2, 2020

I saw some failure in CI pipeline which blocks your PR merging. Can you fix them before merge the PR?

@monkey0head
Copy link
Contributor Author

I saw some failure in CI pipeline which blocks your PR merging. Can you fix them before merge the PR?

@wenbingl, sure. Fixed!

@wenbingl wenbingl merged commit 6b46e34 into onnx:master Jun 8, 2020
@nikitxskv
Copy link

nikitxskv commented Jun 10, 2020

@vinitra @wenbingl @xadupre
Hi!
Thank you for code review!
Can you tell us (CatBoost team) what should be done to be added to the list of supported tools here?

@xadupre
Copy link
Collaborator

xadupre commented Jun 10, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants