Fix HashDictionary documentation #2073
Looks good! Specifically the part about initializing HashDictionary object without passing a corpus and changing the definition of document from "list of strings" to "sequence of strings". Thanks!
gensim/corpora/hashdictionary.py
- * All tokens will be used (not only that you see in documents), typical problem
-   for :class:`~gensim.corpora.dictionary.Dictionary`.
+ * Able to represent all tokens (not only those present in training documents)
I think "Able to represent any token..." would be better wording?
Fixed in 7a38bcd.
  documents : iterable of iterable of str
-     Iterable of documents, if given - use them to initialization.
+     Iterable of documents. If given, used to collect additional corpus statistics. HashDictionary can work without these statistics (optional parameter).
This is exactly what was required. Thanks
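For readers following the two points raised in this review (any token can be mapped to an id, and no corpus is needed up front), here is a minimal, standard-library-only sketch of the hashing trick. It is not gensim's implementation: the names `token_to_id`, `doc2bow` and `ID_RANGE` are illustrative, and the choice of `zlib.adler32` with a range of 32000 is an assumption made for the example.

```python
import zlib

# Assumed size of the id space; chosen only for illustration.
ID_RANGE = 32000


def token_to_id(token, id_range=ID_RANGE):
    """Map any token to an integer id in [0, id_range) by hashing.

    No vocabulary is built beforehand, so tokens never seen in any
    training document still receive an id.
    """
    return zlib.adler32(token.encode("utf8")) % id_range


def doc2bow(document, id_range=ID_RANGE):
    """Convert a sequence of string tokens into sorted (id, count) pairs."""
    counts = {}
    for token in document:
        token_id = token_to_id(token, id_range)
        counts[token_id] = counts.get(token_id, 0) + 1
    return sorted(counts.items())


if __name__ == "__main__":
    # Works on any sequence of strings (list, tuple, generator) with no corpus.
    print(doc2bow(("human", "computer", "human")))
```

Because ids come from a hash, distinct tokens can collide on the same id; that is the trade-off for not needing to scan a corpus first.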
@menshikh-iv looks like some flake8 test failed -- line too long. I don't think we care for such errors, can you disable it? (and re-run the tests)
The documentation for HashDictionary used broken English, broken formatting and presented some misleading information. This is confusing to users -- see for example #2049.
This PR attempts to fix the docs.