-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LsiModel: Only log top words that actually exist in the dictionary #3091
Conversation
@piskvorky should I add any documentation or test cases? |
Thanks! Maybe add a code comment that links to #3090. So that future people reading the code are not confused or don't accidentally delete that code, thinking it's useless. |
In some pathological cases, we might try to log the top N words, even though we haven't seen N words yet. In these cases, we can just exit the loop early. Closes piskvorky#3090.
Ok, force-pushed a comment. Thanks for being so responsive 😄 |
In 8f8cb49, I added a check that checks val in self.id2word When testing, `id2word` is actually an instance of `FakeDict`, which doesn't implement `__contains__()` (so Python falls back to calling `__getitem__()`[1]). The tests didn't like this[2]. [1] https://docs.python.org/3.6/reference/datamodel.html#object.__contains__ [2] https://github.com/RaRe-Technologies/gensim/runs/2197137529
@piskvorky anything else you need from me here? No rush, just trying to make sure it doesn't get lost in the shuffle 😄 |
Thanks, a reminder is definitely a good idea :) PR already approved by me; the next step is a review by @mpenkov, who will merge if all OK. |
Looks good to me! Thank you @kmurphy4 ! |
Thank you! |
In some pathological cases, we might try to log the top N words, even
though we haven't seen N words yet. In these cases, we can just exit
the loop early.
Closes #3090.