-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Phrases __getitem__() method does not respect chosen scoring function #1533
Comments
You're right, looks like a bug, thanks for reporting. Since you already have the solution, can you open a PR with the fix? CC @michaelwsherman re. #1464. |
Sorry, I don't have the fix yet. I just use Phraser. |
Good find--i totally forgot about getitem when I implemented this. I can get a fix together, should be fairly straightforward. Will be a few weeks though, and I assume this isn't a showstopper since Phraser still works. @sergeididenko -- are you using npmi? That makes me happy. |
@michaelwsherman Yes, I prefer MPI scores over alternatives. Can't say yet if it's better for my current project though. |
Fix in PR #1573 . |
…iskvorky#1573) * initial commit of fixes in comments of piskvorky#1423 * removed unnecessary space in logger * added support for custom Phrases scorers * fixed Phrases.__getitem__ to support pluggable scoring piskvorky#1533 * travisCI style fixes * fixed __next__() to next() for python 3 compatibilyt * misc fixes * spacing fixes for style * custom scorer support in sklearn api * Phrases scikit interface tests for pluggable scoring * missing line breaks * style, clarity, and robustness fixes requested by @piskvorky * check in Phrases init to make sure scorer is pickleable * backwards scoring compatibility when loading a Phrases class * removal of pickle testing objects in Phrases init * switched to six for python 2/3 compatibility * fix docstring
Description
Phrases
__getitem__()
method does not respect chosen scoring function, it has 'default' scoring builtin. So when Phrases object is constructed withscoring='npmi'
it returns wrong results.See Phrases
export_phrases()
for correct implementation, i.e.score = scoring_function(count_a, count_b, count_ab)
instead ofscore = (pab - min_count) / pa / pb * len(vocab)
For a while it's better to construct and use Phraser object, i.e.
phraser = Phraser[phrases]; res = phraser[sentences]
instead ofres = phrases[sentences]
The text was updated successfully, but these errors were encountered: