Fix docstrings for gensim.models.rpmodel #1802

Merged
merged 14 commits into from
Dec 27, 2017
72 changes: 54 additions & 18 deletions gensim/models/rpmodel.py
@@ -5,41 +5,54 @@
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html


import logging
"""
Objects of this class allow building and maintaining a model for Random Projections
(also known as Random Indexing).

import numpy as np
For theoretical background on RP, see: Kanerva et al.: "Random indexing of text samples for Latent Semantic Analysis."
Review comment (Contributor):
Please add a concrete link to the paper, something like this: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/parsing/porter.py#L6


from gensim import interfaces, matutils, utils
The main methods are:
Review comment (Contributor):
Better to add an example here (sometimes code tells us more than text).


1. constructor, which creates the random projection matrix
2. the [] method, which transforms a simple count representation into the
random projection space.

logger = logging.getLogger('gensim.models.rpmodel')
Model persistence is achieved via its load/save methods.


class RpModel(interfaces.TransformationABC):
"""
Objects of this class allow building and maintaining a model for Random Projections
(also known as Random Indexing). For theoretical background on RP, see:
Examples:
Review comment (Contributor):
Redundant :

---------
>>> from gensim.models import RpModel
>>> rp = RpModel(corpus)
Review comment (Contributor):
The example isn't executable (corpus isn't defined). "Executable" means that I can copy-paste it into a console and it runs successfully.

>>> print(rp[some_doc])
>>> rp.save('/tmp/foo.rp_model')
"""

Kanerva et al.: "Random indexing of text samples for Latent Semantic Analysis."
import logging

The main methods are:
import numpy as np

1. constructor, which creates the random projection matrix
2. the [] method, which transforms a simple count representation into the
random projection space.
from gensim import interfaces, matutils, utils

>>> rp = RpModel(corpus)
>>> print(rp[some_doc])
>>> rp.save('/tmp/foo.rp_model')

Model persistence is achieved via its load/save methods.
"""
logger = logging.getLogger('gensim.models.rpmodel')


class RpModel(interfaces.TransformationABC):

def __init__(self, corpus, id2word=None, num_topics=300):
"""
`id2word` is a mapping from word ids (integers) to words (strings). It is
used to determine the vocabulary size, as well as for debugging and topic
printing. If not set, it will be determined from the corpus.


Parameters
----------
corpus : interfaces.CorpusABC
Review comment (Contributor):
please use

:class:`~gensim.interfaces.CorpusABC`

when you reference a class from anywhere (gensim or something else)

Review comment (Contributor):
Add :class:

id2word : dict of int to string
Review comment (Contributor):
dict of (int, string)

num_topics : int

"""
self.id2word = id2word
self.num_topics = num_topics
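
As a sketch of what the review comments on this hunk ask for, a numpydoc-style `Parameters` section for the constructor could look roughly like the following. The class name `RpModelSketch` is hypothetical and the parameter descriptions are assumptions based on the existing `id2word` paragraph; this is not the text that was eventually merged.

```python
class RpModelSketch(object):
    """Hypothetical stand-in for RpModel, used only to illustrate docstring layout."""

    def __init__(self, corpus, id2word=None, num_topics=300):
        """

        Parameters
        ----------
        corpus : :class:`~gensim.interfaces.CorpusABC`
            Input corpus (description wording is an assumption).
        id2word : dict of (int, string), optional
            Mapping from word ids to words. If not set, it is determined
            from the corpus.
        num_topics : int, optional
            Number of topics (description wording is an assumption).

        """
        # Only the two attribute assignments visible in the diff are reproduced here.
        self.id2word = id2word
        self.num_topics = num_topics
```

Using the full `:class:` role with the `gensim.` prefix is what lets Sphinx turn the type into a cross-reference, which is the point of the reviewer's request.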
@@ -52,6 +65,12 @@ def __str__(self):
def initialize(self, corpus):
"""
Initialize the random projection matrix.


Parameters
----------
corpus : :class:`~gensim.interfaces.CorpusABC`

"""
if self.id2word is None:
logger.info("no word id mapping provided; initializing from corpus, assuming identity")
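
To make "initialize the random projection matrix" more concrete, here is a minimal pure-Python sketch of one common way to build such a matrix: dense entries drawn as +1 or -1 with equal probability. The function name `make_projection` and the choice of distribution are assumptions for illustration; the hunk above does not show how `initialize` actually fills the matrix.

```python
import random


def make_projection(num_topics, num_terms, seed=None):
    """Return a num_topics x num_terms matrix (list of rows) of random +1/-1 entries."""
    rng = random.Random(seed)
    return [
        [rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
        for _ in range(num_topics)
    ]


# One row per output dimension, one column per vocabulary term.
projection = make_projection(num_topics=3, num_terms=5, seed=42)
for row in projection:
    print(row)
```

Multiplying this matrix by a document's term vector yields the lower-dimensional representation, which is why the vocabulary size has to be known before the matrix can be created.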
@@ -75,6 +94,16 @@ def initialize(self, corpus):
def __getitem__(self, bow):
"""
Return RP representation of the input vector and/or corpus.

Parameters
----------
bow : :class:`~gensim.interfaces.CorpusABC` (iterable of documents) or list of (int, int).

Examples:
Review comment (Contributor):
Redundant :

-------------
>>> rp = RpModel(corpus)
Review comment (Contributor):
corpus is undefined

>>> print(rp[some_doc])

"""
# if the input vector is in fact a corpus, return a transformed corpus as result
is_corpus, bow = utils.is_corpus(bow)
@@ -96,5 +125,12 @@ def __getitem__(self, bow):
]

def __setstate__(self, state):
"""
Set the internal state and set `freshly_loaded` to True. Called when the object is unpickled.

Parameters
----------
state : state of the class
Review comment (Contributor):
state : dict
    State of the class

"""
self.__dict__ = state
self.freshly_loaded = True
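
The effect of this `__setstate__` hook can be seen with the standard `pickle` module alone. The class `Model` below is a hypothetical stand-in rather than gensim code; it only mirrors the two lines of the method shown above.

```python
import pickle


class Model(object):
    """Hypothetical class illustrating the __setstate__ hook."""

    def __init__(self, num_topics):
        self.num_topics = num_topics
        self.freshly_loaded = False

    def __setstate__(self, state):
        # pickle calls this on load, passing the saved instance dictionary.
        self.__dict__ = state
        self.freshly_loaded = True


original = Model(num_topics=300)
restored = pickle.loads(pickle.dumps(original))
print(original.freshly_loaded, restored.freshly_loaded)  # False True
```

The flag gives the restored object a way to notice, on first use, that it came from disk and may need a one-off fix-up of its state.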