
Document that preprocessing.strip_punctuation is limited to ASCII punctuation characters #2964

Merged
mpenkov merged 4 commits into piskvorky:develop on Jun 29, 2021


sciatro (Contributor) commented Sep 29, 2020

This PR mitigates issue #2962 by documenting that the function's current behavior is limited to ASCII punctuation.
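
The ASCII limit comes from the character set being matched. Assuming gensim builds its punctuation regex from Python's `string.punctuation` (a reading of `gensim.parsing.preprocessing` that is worth checking against the installed version), this snippet shows what that set does and does not contain:

```python
import string

# Python's string.punctuation: the ASCII punctuation characters.
print(string.punctuation)
print(len(string.punctuation))  # 32

# Every character in the set is 7-bit ASCII (code point below 128).
print(all(ord(ch) < 128 for ch in string.punctuation))  # True

# Common non-ASCII punctuation is therefore not in the set.
for ch in "¡«»“”…":
    print(repr(ch), ch in string.punctuation)  # prints False for each character
```

So a function matching only this set leaves characters such as "¡" or "«" in place, even though it happily accepts text containing them.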

Issue #2962 discusses approaches to extending strip_punctuation's behavior to cover Unicode punctuation. This PR adds no new functionality and implements none of the possibilities considered in the issue; it is option 1 of the three considered in #2962.
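
For illustration only (not gensim code, and not necessarily one of the three options in #2962), a Unicode-aware variant could test each character's general category with `unicodedata.category` and collapse runs of punctuation into one space. The function name `strip_unicode_punctuation` is hypothetical:

```python
import unicodedata


def is_unicode_punct(ch: str) -> bool:
    # Unicode punctuation categories all start with "P":
    # Pc, Pd, Ps, Pe, Pi, Pf, Po.
    return unicodedata.category(ch).startswith("P")


def strip_unicode_punctuation(s: str) -> str:
    """Hypothetical sketch: replace each run of Unicode punctuation with one space."""
    out = []
    prev_punct = False
    for ch in s:
        if is_unicode_punct(ch):
            if not prev_punct:
                out.append(" ")  # emit one space per run of punctuation
            prev_punct = True
        else:
            out.append(ch)
            prev_punct = False
    return "".join(out)


print(repr(strip_unicode_punctuation("¡Hola! «amigo»")))  # ' Hola   amigo '
print(repr(strip_unicode_punctuation("$5+3")))  # '$5+3'
```

One design trade-off shows up in the second call: "$" and "+" belong to the symbol categories Sc and Sm, not to P*, so a purely category-based version keeps them, whereas anything built on `string.punctuation` removes them. An extension would have to decide which of those behaviors it wants.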

Add ASCII qualification to the `strip_punctuation` docstring.
This is the "option 1" fix for issue piskvorky#2962.
piskvorky (Owner) left a comment

Looks good.

Can you also add a code comment (not a docstring, but a Python comment near the beginning of this function) that links to your ticket with the "three options"? That way it's easier to find, and future contributors will know it exists. Thanks!

Code comment added linking to issue piskvorky#2962 as a reminder of enhancement possibilities.
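
A sketch of the docstring-plus-comment pattern requested above: the docstring states the ASCII limit (with doctest examples), and a plain comment at the top of the body points to the issue. The function name `strip_punctuation_documented` and all wording are hypothetical, not the merged gensim text:

```python
import re
import string

# Runs of ASCII punctuation, i.e. the characters in string.punctuation.
_ASCII_PUNCT_RE = re.compile(r"([%s])+" % re.escape(string.punctuation))


def strip_punctuation_documented(s: str) -> str:
    """Replace each run of ASCII punctuation characters in `s` with a single space.

    Only the characters in :data:`string.punctuation` are affected; non-ASCII
    punctuation such as "¡", "«" or "»" is left untouched.

    >>> strip_punctuation_documented("Hello, world!")
    'Hello  world '
    >>> strip_punctuation_documented("¡Hola! «amigo»")
    '¡Hola  «amigo»'
    """
    # Unicode punctuation is out of scope here; see issue piskvorky#2962
    # for the options discussed for extending this behavior.
    return _ASCII_PUNCT_RE.sub(" ", s)
```

The docstring is what users see in the API reference, while the comment is aimed at contributors reading the source, which is why the issue link lives in the comment.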
sciatro (Contributor, PR author) commented Sep 30, 2020

> Can you also add a code comment

Yes. I've done so.

mpenkov (Collaborator) left a comment

Thank you for your contribution!

mpenkov added the documentation label ("Current issue related to documentation") on Jun 29, 2021
mpenkov changed the title from "Add that preprocessing.strip_punctuation is limited to ASCII to doc" to "Document that preprocessing.strip_punctuation is limited to ASCII" on Jun 29, 2021
mpenkov merged commit dab0369 into piskvorky:develop on Jun 29, 2021
mpenkov (Collaborator) commented Jun 29, 2021

Congrats on your first PR to gensim @sciatro 🥇 !

piskvorky changed the title from "Document that preprocessing.strip_punctuation is limited to ASCII" to "Document that preprocessing.strip_punctuation is limited to ASCII punctuation characters" on Jun 29, 2021
piskvorky (Owner) commented Jun 29, 2021

Reminder to self to update the CHANGELOG entry before a release. The current entry makes it sound like strip_punctuation accepts only ASCII strings, which is not true: it accepts any string, but only ASCII punctuation characters are stripped from it.

Labels
documentation Current issue related to documentation
3 participants