[WIP] sklearn API for Gensim models #1462
Are you sure there is no memory duplication for the doc2vec numpy arrays? Can you please run it through a memory profiler?
if self.gensim_model is None:
    raise NotFittedError("This model has not been fitted yet. Call 'fit' with appropriate arguments before using this method.")

# The input as array of array
Please call them python lists
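For context, the guard quoted above follows the usual scikit-learn convention of refusing to transform before `fit` has run. The following is only a minimal sketch of that pattern, not the code from this PR: `ToyBowTransformer` is a hypothetical class, and a plain dict stands in for the wrapped Gensim model. It also shows input handled as Python lists, accepting either one document or a list of documents.

```python
try:
    from sklearn.exceptions import NotFittedError
except ImportError:
    # Fallback so the sketch runs without scikit-learn installed.
    class NotFittedError(ValueError, AttributeError):
        """Stand-in for sklearn.exceptions.NotFittedError."""


class ToyBowTransformer:
    """Hypothetical wrapper; a dict of token -> id stands in for a Gensim model."""

    def __init__(self):
        self.gensim_model = None  # set by fit()

    def fit(self, X, y=None):
        vocab = {}
        for doc in X:
            for token in doc:
                vocab.setdefault(token, len(vocab))
        self.gensim_model = vocab
        return self

    def transform(self, docs):
        if self.gensim_model is None:
            raise NotFittedError(
                "This model has not been fitted yet. Call 'fit' with "
                "appropriate arguments before using this method."
            )
        # The input is a Python list of lists; wrap a single document
        # (a list of string tokens) so both shapes are handled alike.
        if docs and isinstance(docs[0], str):
            docs = [docs]
        result = []
        for doc in docs:
            counts = {}
            for token in doc:
                if token in self.gensim_model:
                    idx = self.gensim_model[token]
                    counts[idx] = counts.get(idx, 0) + 1
            result.append(sorted(counts.items()))
        return result
```

Calling `transform` on a fresh instance raises `NotFittedError`; after `fit`, each document comes back as a sorted list of `(token_id, count)` pairs.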
Let's aim to merge it this week. The missing things are ipynb and transform tests.
    self.assertEqual(matrix.shape[0], 1)
    self.assertEqual(matrix.shape[1], self.model.size)

def testSetGetParams(self):
Please also add a check against the original model for each "getset" test (same as in the previous PR).
Done.
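One way to read the request above: after `set_params`, the test should assert on the wrapper's own parameter and, once the wrapper is refitted, on the same attribute of the underlying model. The sketch below illustrates that idea under stated assumptions; `ToyModel` and `ToyWrapper` are hypothetical stand-ins, and the hand-written `get_params`/`set_params` only mimic what scikit-learn's `BaseEstimator` would normally provide.

```python
import unittest


class ToyModel:
    """Hypothetical stand-in for a Gensim model with one hyperparameter."""

    def __init__(self, size):
        self.size = size


class ToyWrapper:
    """Hypothetical sklearn-style wrapper around ToyModel."""

    def __init__(self, size=100):
        self.gensim_model = None
        self.size = size

    def get_params(self, deep=True):
        return {"size": self.size}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter %r" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # The underlying model is rebuilt from the wrapper's current params.
        self.gensim_model = ToyModel(size=self.size)
        return self


class TestSetGetParams(unittest.TestCase):
    def testSetGetParams(self):
        wrapper = ToyWrapper(size=100)
        wrapper.set_params(size=50)
        # Check the wrapper itself ...
        self.assertEqual(wrapper.get_params()["size"], 50)
        # ... and the original (wrapped) model after refitting.
        wrapper.fit([["a", "b"]])
        self.assertEqual(getattr(wrapper.gensim_model, "size"), 50)
```

Checking the wrapped model as well catches the case where a parameter is stored on the wrapper but never forwarded to the model it builds.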
…into skl_api_gensim
Great @chinmayapancholi13 💯
* created sklearn wrapper for Doc2Vec
* PEP8 fix
* added 'transform' function and refactored code
* updated d2v skl api code
* added unittests for sklearn api for d2v model
* fixed flake8 errors
* added skl api class for Text2Bow model
* updated docstring for d2vmodel api
* updated text2bow skl api code
* added unittests for text2bow skl api class
* updated 'testPipeline' and 'testTransform' for text2bow
* added 'tokenizer' param to text2bow skl api
* updated unittests for text2bow
* removed get_params and set_params functions from existing classes
* added tfidf api class
* added unittests for tfidf api class
* flake8 fixes
* added skl api for hdpmodel
* added unittests for hdp model api class
* flake8 fixes
* updated hdp api class
* added 'testPartialFit' and 'testPipeline' tests for hdp api class
* flake8 fixes
* added skl API class for phrases
* added unit tests for phrases API class
* flake8 fixes
* added 'testPartialFit' function for 'TestPhrasesTransformer'
* updated 'testPipeline' function for 'TestText2BowTransformer'
* updated code for transform function for HDP transformer
* updated tests as discussed in PR 1473
* added examples for new models in ipynb
* unpinned sklearn version for running unit-tests
* updated 'Pipeline' initialization format
* updated 'Pipeline' initialization format in ipynb
This PR creates scikit-learn API for the following Gensim models:
The implementation for the following models is still left: