-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document that preprocessing.strip_punctuation
is limited to ASCII punctuation characters
#2964
Conversation
Add ASCII as qualification on `strip_punctuation` doc string. This is "option 1" fix for issue piskvorky#2962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Can you also add a code comment (not docstring – a Python comment near the beginning of this function) that links to your ticket with the "three options"? So it's easier to find, and future contributors know it exists. Thanks!
Code comment added linking to issue piskvorky#2962 as a reminder of enhancement possibilities.
Yes. I've done so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution!
Congrats on your first PR to gensim @sciatro 🥇 ! |
preprocessing.strip_punctuation
is limited to ASCII punctuation characters
Reminder to self to update the CHANGELOG entry before a release. The current entry makes it sound like |
This PR mitigates issue #2962 by documenting that the function's current behavior is limited to ASCII punctuation.
Issue #2962 includes discussion of approaches to expanding strip_punctuation's behavior to cover unicode punctuation. This PR includes no new functionality / does not implement any of the possibilities considered in the issue. This PR is option 1 of the 3 considered in #2962