Modify tfidf_transformer to enable custom vocabulary and approximate sublinear-tf scaling without sparse containers #777

Merged: 16 commits, Nov 15, 2021

adam444555 (Contributor) commented Nov 12, 2021

The stop_words_ attribute does not exist when a custom vocabulary is provided, which results in an AttributeError. Fix this with a hasattr check.
Approximate the sublinear_tf scaling by adding 1 to every coefficient (null coefficients included) and then taking the log, i.e. replace tf with log(1 + tf). Null coefficients remain 0 after the operation because log(1 + 0) = 0, so no sparse container is needed.
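
For illustration only, here is a minimal sketch of how the approximation compares with the scaling scikit-learn documents for sublinear_tf (1 + log(tf), applied to non-zero entries). The function names are hypothetical and this is not the converter code from this PR; it only shows that log(1 + tf) keeps zeros at zero while slightly underestimating the non-zero values.

```python
import math


def exact_sublinear_tf(tf):
    """Documented scikit-learn scaling: 1 + log(tf) for tf > 0, zeros untouched."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def approx_sublinear_tf(tf):
    """Approximation described above: log(1 + tf), applied to every entry."""
    return math.log1p(tf)


# Hypothetical term counts, including a null coefficient.
counts = [0, 1, 2, 5, 10]
exact = [exact_sublinear_tf(c) for c in counts]
approx = [approx_sublinear_tf(c) for c in counts]

for c, e, a in zip(counts, exact, approx):
    print(f"tf={c:2d}  1+log(tf)={e:.4f}  log(1+tf)={a:.4f}")
```

The gap between the two is 1 - log(1 + 1/tf), so the approximation is always below the exact value for tf >= 1 and the results will not match scikit-learn exactly.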

xadupre and others added 12 commits November 12, 2021 13:22
Signed-off-by: adam444555 <a473489548@gmail.com>
* Make opset 15 default
* fix missing target opset in polynomial features

…onnx#773)

* Update a training value in a failing pipeline used in a unit test
* upgrade version

onnx#772)

* Enable RandomForestClassifier in converter for CalibrationClassifierCV
* remove unused variable

* Implements option zipmap for MultiOutputClassifier

fix code length

xadupre (Collaborator) commented Nov 12, 2021

#777 (comment): Sorry, I was not clear enough. I just meant adding something like assert not hasattr(vect, 'stop_words_') to the unit test.
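
As a rough sketch of what that assertion and the hasattr guard look like together: the FakeVectorizer class below is a hypothetical stand-in (not scikit-learn and not the test from this PR) that sets stop_words_ only when it learns its vocabulary from the data, which is the behaviour described in the PR description. The helper get_stop_words is likewise an assumed name.

```python
class FakeVectorizer:
    """Hypothetical stand-in, not scikit-learn: sets stop_words_ only when
    the vocabulary is learned from the documents."""

    def __init__(self, vocabulary=None):
        self.vocabulary = vocabulary

    def fit(self, docs):
        if self.vocabulary is None:
            tokens = sorted({tok for doc in docs for tok in doc.split()})
            self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
            self.stop_words_ = set()  # only present for a learned vocabulary
        else:
            self.vocabulary_ = {tok: i for i, tok in enumerate(self.vocabulary)}
        return self


def get_stop_words(vect):
    """Guarded read: fall back to an empty set when stop_words_ is missing."""
    return vect.stop_words_ if hasattr(vect, "stop_words_") else set()


docs = ["apple banana", "banana cherry"]
learned = FakeVectorizer().fit(docs)
custom = FakeVectorizer(vocabulary=["apple", "banana"]).fit(docs)

assert hasattr(learned, "stop_words_")
assert not hasattr(custom, "stop_words_")  # the check suggested above
assert get_stop_words(custom) == set()     # the guard avoids AttributeError
```

With a real vectorizer, the same assertion would go in the unit test right after fitting with a custom vocabulary, so the test documents the case the guard is meant to cover.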

adam444555 and others added 2 commits November 15, 2021 14:26
adam444555 (Contributor, Author) commented

One specific commit could not be signed off (I tried many approaches, but none of them worked, and I have no idea why). I therefore rebased the branch to drop the problematic commit and then recommitted the same change.

xadupre (Collaborator) commented Nov 15, 2021

Sorry for the trouble. Let me know if it is ready to be merged.

adam444555 (Contributor, Author) commented

> Sorry for the trouble. Let me know if it is ready to be merged.

Yes, it is ready right now!

@xadupre xadupre merged commit 09be7da into onnx:master Nov 15, 2021