Refactor documentation for gensim.similarities.docsim. #1910
gensim/corpora/textcorpus.py
Outdated
>>>
>>> corpus = CorpusMiislita(datapath('head500.noblanks.cor.bz2'))
>>> corpus.get_texts()
<generator object get_texts at 0x7fa932f397d0>
Bad output. Can you show a concrete line of the dataset instead, e.g. next(iter(corpus.get_texts()))?
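To illustrate the point of this comment: echoing a generator only prints its repr, while pulling the first item shows real data. The following is a minimal sketch; `get_texts` here is a hypothetical stand-in function, not the gensim corpus method, and the sample documents are made up for illustration.

```python
def get_texts(lines):
    """Hypothetical stand-in for corpus.get_texts(): yield one token list per document."""
    for line in lines:
        yield line.lower().split()


docs = ["Human machine interface", "Graph of trees"]  # made-up sample data

texts = get_texts(docs)
# repr(texts) is "<generator object get_texts at 0x...>", which tells the reader nothing.
generator_repr = repr(texts)

# Taking the first item shows what a document actually looks like.
first_doc = next(iter(get_texts(docs)))
print(first_doc)  # ['human', 'machine', 'interface']
```

In a docstring, the second form gives the reader a stable, meaningful output instead of a memory address that changes on every run.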
gensim/corpora/textcorpus.py
Outdated
>>> if word not in CorpusMiislita.stoplist]
>>>
>>> def __len__(self):
>>> if 'length' not in self.__dict__:
No need to write anything with the logger; this should be a small, simple example.
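One way the simplified example could look, as a rough sketch only: `TinyCorpus` is a hypothetical class invented for illustration (it is not the gensim `TextCorpus` API), and it counts documents lazily without any logging.

```python
class TinyCorpus:
    """Hypothetical minimal corpus used only for illustration."""

    def __init__(self, lines):
        self.lines = lines
        self.length = None  # computed lazily on the first len() call

    def get_texts(self):
        # Yield one list of lowercase tokens per document.
        for line in self.lines:
            yield line.lower().split()

    def __len__(self):
        if self.length is None:
            # Count the documents once and cache the result; no logging needed.
            self.length = sum(1 for _ in self.get_texts())
        return self.length


corpus = TinyCorpus(["a b c", "d e", "f"])
print(len(corpus))  # 3
```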
gensim/corpora/textcorpus.py
Outdated
>>>
>>> def get_texts(self):
>>> for doc in self.getstream():
>>> yield [word for word in utils.to_unicode(doc).lower().split()
There are some formatting issues here.
gensim/corpora/textcorpus.py
Outdated
>>> corpus = CorpusMiislita(datapath('head500.noblanks.cor.bz2'))
>>> corpus.get_texts()
<generator object get_texts at 0x7fa932f397d0>
>>> corpus.__len__()
Please use len(corpus) instead of this. Calling a "magic" method directly is a bad pattern (and is justified only in specific cases).
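One concrete reason to prefer the built-in, shown as a small sketch: `len()` validates what `__len__` returns, whereas a direct call passes any value through unchecked. The `Broken` class below is hypothetical and exists only to demonstrate this.

```python
class Broken:
    """Hypothetical class whose __len__ returns an invalid value."""

    def __len__(self):
        return -1  # a length can never be negative


obj = Broken()

# A direct call silently hands back the invalid value.
direct = obj.__len__()

# The built-in checks the result and rejects it.
try:
    len(obj)
    builtin_error = None
except ValueError as exc:
    builtin_error = exc

print(direct)                       # -1
print(type(builtin_error).__name__)  # ValueError
```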
gensim/similarities/docsim.py
Outdated
Return
------
{:class: `~scipy.sparse.csr_matrix`, :class: `~numpy.array`} |
numpy.array -> numpy.ndarray, here and everywhere. Also, in this case the link shouldn't be rendered, so don't use ~ for numpy/scipy.
gensim/similarities/docsim.py
Outdated
Size of shards should be chosen so that a `shardsize x chunksize` matrix of floats fits comfortably into
main memory.
norm : str, optional
Normalization to use. Accepted values: {l1, l2}.
In this case, instead of
norm : str, optional
Normalization to use. Accepted values: {l1, l2}.
should be
norm : {'l1', 'l2'}, optional
Normalization to use.
This is the better notation when a parameter accepts one of several predefined string values.
@menshikh-iv
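For reference, here is how the suggested notation might look in a complete numpydoc-style docstring. This is only a sketch: `normalize` is a hypothetical pure-Python helper written for illustration, not the function under review and not part of gensim.

```python
import math


def normalize(vec, norm='l2'):
    """Scale `vec` to unit length (hypothetical helper for illustration).

    Parameters
    ----------
    vec : list of float
        Input vector.
    norm : {'l1', 'l2'}, optional
        Normalization to use.

    Returns
    -------
    list of float
        Normalized copy of `vec`; a zero vector is returned unchanged.

    """
    if norm == 'l1':
        length = sum(abs(x) for x in vec)
    elif norm == 'l2':
        length = math.sqrt(sum(x * x for x in vec))
    else:
        raise ValueError("norm must be 'l1' or 'l2', got %r" % (norm,))
    if length == 0:
        return list(vec)
    return [x / length for x in vec]


print(normalize([3.0, 4.0]))             # [0.6, 0.8]
print(normalize([1.0, 3.0], norm='l1'))  # [0.25, 0.75]
```

Listing the allowed values in the type position keeps the description line free for what the parameter does, and the first value in the braces conventionally matches the default.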