Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Sklearn wrapper for RandomProjections Model #1395

Merged

Conversation

chinmayapancholi13
Copy link
Contributor

This PR creates an sklearn wrapper for Random Projections model.

@chinmayapancholi13 chinmayapancholi13 changed the title Sklearn wrapper for RandomProjections Model [WIP] Sklearn wrapper for RandomProjections Model Jun 6, 2017
Base RP module
"""

def __init__(self, corpus, id2word=None, num_topics=300):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove corpus argument, you should pass corpus only to fit method

def fit(self, X, y=None):
"""
For fitting corpus into class object.
Calls gensim.models.RpModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace doc-string to Fit the model according to the given training data.

Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor code style nitpicks :)

@@ -58,7 +58,7 @@ def set_params(self, **parameters):
def fit(self, X, y=None):
"""
For fitting corpus into the class object.
Calls gensim.model.LsiModel:
Calls gensim.models.LsiModel:
>>>gensim.models.LsiModel(corpus=corpus, num_topics=num_topics, id2word=id2word, chunksize=chunksize, decay=decay, onepass=onepass, power_iters=power_iters, extra_samples=extra_samples)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation line doesn't seem to help -- what are these undefined variables like id2word, chunksize etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These params (id2word, chunksize etc) are associated with the LSI model. This change is in the file sklearn_wrapper_gensim_lsimodel.py. Since this change was so small (literally one word in a docstring), I added this change in this PR (PR concerning RP model wrapper) itself.
There is also a similar change for LDA model here. Should I remove these changes from this PR?

#
# Copyright (C) 2011 Radim Rehurek <radimrehurek@seznam.cz>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
#
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: remove #, insert blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated now.

Scikit learn interface for gensim for easy use of gensim with scikit-learn
Follows scikit-learn API conventions
"""
from gensim import models
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line before imports.

Also, block the imports: built-in first, 3rd party second, local package imports last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated now.

@menshikh-iv
Copy link
Contributor

Same comments as in #1405

score = text_rp.score(corpus, data.target)
self.assertGreater(score, 0.40)

def testPersistence(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as LdaSeq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

text_rp = Pipeline((('features', model,), ('classifier', clf)))
text_rp.fit(corpus, data.target)
score = text_rp.score(corpus, data.target)
self.assertGreater(score, 0.40)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as LdaSeq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

@menshikh-iv menshikh-iv merged commit 2ea1af0 into piskvorky:develop Jun 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants