TfidfVectorizer with sublinear_tf fails, despite opset version set to greater than 11 #996

Closed
RLeek opened this issue May 29, 2023 · 1 comment

Comments

RLeek commented May 29, 2023

Hi, when attempting to convert a TfidfVectorizer as part of an sklearn pipeline, the following error is raised:

Traceback (most recent call last):
  File "/app/test.py", line 47, in <module>
    onnxModelPipeline = convert_sklearn(modelPipeline, "tfidf", initial_types=[("input", StringTensorType([None, 1]))], target_opset=12)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/convert.py", line 190, in convert_sklearn
    remove_identity=model_optim and not intermediate, verbose=verbose)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/common/_topology.py", line 1420, in convert_topology
    topology.convert_operators(container=container, verbose=verbose)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/common/_topology.py", line 1255, in convert_operators
    self.call_converter(operator, container, verbose=verbose)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/common/_topology.py", line 1061, in call_converter
    conv(self.scopes[0], operator, container)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/common/_registration.py", line 26, in __call__
    return self._fct(*args)
  File "/usr/local/lib/python3.7/site-packages/skl2onnx/operator_converters/tfidf_transformer.py", line 48, in convert_sklearn_tfidf_transformer
    "ONNX does not support sparse tensors before opset < 11, "
RuntimeError: ONNX does not support sparse tensors before opset < 11, sublinear_tf must be False.

The TfidfVectorizer is a pickled object created with sklearn 0.20.2, and it does have sublinear_tf set to True. However, I've explicitly set target_opset to 12 in the code that raises the error above:

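import gzip
import pickle

from sklearn.pipeline import Pipeline
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# gzip.decompress returns bytes, so pickle.loads (not pickle.load) is needed.
with open("app/tidf_supervised_lemmatized_model/model.pkl.gzip", "rb") as f:
    model = pickle.loads(gzip.decompress(f.read()))

with open("app/tidf_supervised_lemmatized_model/tfidf_vector.pkl.gzip", "rb") as f:
    tfidf = pickle.loads(gzip.decompress(f.read()))

modelPipeline = Pipeline([('tfidfVectorizer', tfidf), ('model', model)])
# Input shape [None, 1] matches the call shown in the traceback.
onnxModelPipeline = convert_sklearn(modelPipeline, "tfidf", initial_types=[("input", StringTensorType([None, 1]))], target_opset=12)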

Looking at the relevant source code (below), it seems sublinear_tf is only supported when the opset is below 11, not when it is 11 or above. If that is the intended behaviour, could the RuntimeError message please be changed from "ONNX does not support sparse tensors before opset < 11, sublinear_tf must be False." to "ONNX does not support sparse tensors after opset 11, sublinear_tf must be False."?

        if operator.target_opset < 11:
            plus1 = scope.get_unique_variable_name("plus1")
            C = operator.inputs[0].type.shape[1]
            ones = scope.get_unique_variable_name("ones")
            cst = np.ones((C,), dtype=float_type)
            container.add_initializer(ones, proto_dtype, [C], cst.flatten())
            apply_add(scope, data + [ones], plus1, container, broadcast=1)
            plus1logged = scope.get_unique_variable_name("plus1logged")
            apply_log(scope, plus1, plus1logged, container)
            data = [plus1logged]
        else:
            # sparse containers have not yet been implemented.
            raise RuntimeError(
                "ONNX does not support sparse tensors before opset < 11, "
                "sublinear_tf must be False.")
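
To make the mismatch concrete, the control flow of that branch can be reproduced in isolation. This is only a minimal stand-alone sketch in plain Python, not skl2onnx code: the function name `check_sublinear_tf`, its return strings, and the early return for `sublinear_tf=False` are hypothetical and exist purely for illustration. Only the `target_opset < 11` test and the error message are taken from the excerpt above.

```python
def check_sublinear_tf(target_opset: int, sublinear_tf: bool) -> str:
    """Mimic only the control flow of the converter branch quoted above.

    Hypothetical helper: the real converter builds ONNX nodes here.
    """
    if not sublinear_tf:
        # Assumption: the quoted branch is only reached when sublinear_tf is True.
        return "no log scaling needed"
    if target_opset < 11:
        # Corresponds to the Add + Log nodes built in the excerpt.
        return "dense log path"
    # Same message as in the excerpt; it fires for opset >= 11, not "before" 11.
    raise RuntimeError(
        "ONNX does not support sparse tensors before opset < 11, "
        "sublinear_tf must be False.")


if __name__ == "__main__":
    for opset in (10, 11, 12):
        try:
            print(opset, check_sublinear_tf(opset, sublinear_tf=True))
        except RuntimeError as exc:
            print(opset, "RuntimeError:", exc)
```

With `sublinear_tf=True`, opset 10 takes the dense path, while opsets 11 and 12 both raise, which is the opposite of what the message wording suggests.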

Thanks

Collaborator

xadupre commented Jan 23, 2024

This will be fixed in #1058.

xadupre closed this as completed Jan 23, 2024