[WIP] sklearn API for Gensim models #1462

chinmayapancholi13 · 2017-07-05T09:49:15Z

This PR creates a scikit-learn API for the following Gensim models:

The following models still need to be implemented:

Normalization Model
LogEntropy Model
Dynamic Topic Model
Topic Coherence Model

tmylk · 2017-07-12T11:53:01Z

Are you sure there is no memory duplication for the doc2vec numpy arrays? Can you please run it through a memory profiler?
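A rough way to approach the check requested here, using only the standard library, is to compare peak allocations with `tracemalloc`. This is a sketch under stated assumptions, not the profiling that was run for this PR: the `bytearray` is a stand-in for a numpy array of document vectors, and `peak_bytes`, `wrap_with_copy` and `wrap_without_copy` are hypothetical helper names. With numpy available, `np.shares_memory` would be a more direct test.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return (result, peak bytes allocated during the call)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def wrap_with_copy(buffer):
    # Hypothetical wrapper step that duplicates the underlying data.
    return bytes(buffer)


def wrap_without_copy(buffer):
    # Hypothetical wrapper step that only references the underlying data.
    return memoryview(buffer)


# Stand-in for 1000 float64 vectors of size 100 (assumption, not real doc2vec data).
vectors = bytearray(8 * 1000 * 100)

_, copy_peak = peak_bytes(wrap_with_copy, vectors)
_, view_peak = peak_bytes(wrap_without_copy, vectors)

print("peak with copy:   ", copy_peak)
print("peak without copy:", view_peak)
```

If the wrapper duplicated the arrays, the peak for the wrapped call would grow by roughly the size of the data, as it does for `wrap_with_copy` here.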

tmylk · 2017-07-12T12:03:41Z

gensim/sklearn_integration/sklearn_wrapper_gensim_text2bow.py

+        if self.gensim_model is None:
+            raise NotFittedError("This model has not been fitted yet. Call 'fit' with appropriate arguments before using this method.")
+
+        # The input as array of array


Please call them Python lists.
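For context, the guard in the snippet above can be sketched in a self-contained way. This is a minimal illustration, not the class from this PR: `Text2BowSketch` is a hypothetical name, the local `NotFittedError` is a stand-in for `sklearn.exceptions.NotFittedError` so that the sketch runs without scikit-learn, and a plain token-to-id dict takes the place of the Gensim dictionary.

```python
class NotFittedError(ValueError):
    """Stand-in for sklearn.exceptions.NotFittedError (assumption for this sketch)."""


class Text2BowSketch:
    """Hypothetical, simplified bag-of-words transformer (not the PR's class)."""

    def __init__(self, tokenizer=str.split):
        self.tokenizer = tokenizer
        self.gensim_model = None  # set by fit(); here a plain token -> id dict

    def fit(self, X, y=None):
        vocab = {}
        for doc in X:
            for token in self.tokenizer(doc):
                vocab.setdefault(token, len(vocab))
        self.gensim_model = vocab
        return self

    def transform(self, docs):
        if self.gensim_model is None:
            raise NotFittedError(
                "This model has not been fitted yet. "
                "Call 'fit' with appropriate arguments before using this method."
            )
        # The input is a Python list of strings; a single string is wrapped in a list.
        if isinstance(docs, str):
            docs = [docs]
        result = []
        for doc in docs:
            counts = {}
            for token in self.tokenizer(doc):
                if token in self.gensim_model:
                    idx = self.gensim_model[token]
                    counts[idx] = counts.get(idx, 0) + 1
            # One sorted list of (token_id, count) pairs per document.
            result.append(sorted(counts.items()))
        return result


model = Text2BowSketch().fit(["a b a", "b c"])
print(model.transform(["a a c", "d"]))
```

Unknown tokens are dropped in `transform`, so a document made only of unseen words maps to an empty list.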

tmylk · 2017-07-27T13:21:46Z

Let's aim to merge it this week. The missing things are ipynb and transform tests.
Please add ipynb examples in a separate notebook for now. We will copy-paste the notebooks together when these models are ready to merge.

menshikh-iv · 2017-08-10T13:14:08Z

gensim/test/test_sklearn_integration.py

+        self.assertEqual(matrix.shape[0], 1)
+        self.assertEqual(matrix.shape[1], self.model.size)
+
+    def testSetGetParams(self):


Please also add a check against the original model for each "getset" test (same as in the previous PR).
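One possible shape for such a test is sketched below. All names are hypothetical stand-ins: `UnderlyingModel` plays the role of the wrapped Gensim model, and `WrapperSketch` imitates the `get_params`/`set_params` behaviour that scikit-learn's `BaseEstimator` would normally provide. The point is only that the test checks the parameter on the wrapper and then again on the wrapped model after fitting.

```python
import unittest


class UnderlyingModel:
    """Hypothetical stand-in for a Gensim model that stores its hyperparameters."""

    def __init__(self, size, min_count):
        self.size = size
        self.min_count = min_count


class WrapperSketch:
    """Hypothetical wrapper; get_params/set_params mimic scikit-learn's BaseEstimator."""

    def __init__(self, size=100, min_count=5):
        self.gensim_model = None
        self.size = size
        self.min_count = min_count

    def get_params(self, deep=True):
        return {"size": self.size, "min_count": self.min_count}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in ("size", "min_count"):
                raise ValueError("Invalid parameter %r" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # The wrapped model is rebuilt from the wrapper's current parameters.
        self.gensim_model = UnderlyingModel(size=self.size, min_count=self.min_count)
        return self


class TestSetGetParams(unittest.TestCase):
    def testSetGetParams(self):
        model = WrapperSketch(size=10, min_count=1)
        model.set_params(size=20)
        # Check the wrapper's own view of the parameter.
        self.assertEqual(model.get_params()["size"], 20)
        # After fitting, check the attribute on the original (wrapped) model too.
        model.fit([["graph", "trees"]])
        self.assertEqual(getattr(model.gensim_model, "size"), 20)
```

The second assertion is what catches a wrapper that accepts a new parameter value but never passes it on to the model it wraps.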

menshikh-iv · 2017-08-18T13:08:53Z

Great work, @chinmayapancholi13 💯
You gave gensim a new interface :+1:

* created sklearn wrapper for Doc2Vec
* PEP8 fix
* added 'transform' function and refactored code
* updated d2v skl api code
* added unittests for sklearn api for d2v model
* fixed flake8 errors
* added skl api class for Text2Bow model
* updated docstring for d2vmodel api
* updated text2bow skl api code
* added unittests for text2bow skl api class
* updated 'testPipeline' and 'testTransform' for text2bow
* added 'tokenizer' param to text2bow skl api
* updated unittests for text2bow
* removed get_params and set_params functions from existing classes
* added tfidf api class
* added unittests for tfidf api class
* flake8 fixes
* added skl api for hdpmodel
* added unittests for hdp model api class
* flake8 fixes
* updated hdp api class
* added 'testPartialFit' and 'testPipeline' tests for hdp api class
* flake8 fixes
* added skl API class for phrases
* added unit tests for phrases API class
* flake8 fixes
* added 'testPartialFit' function for 'TestPhrasesTransformer'
* updated 'testPipeline' function for 'TestText2BowTransformer'
* updated code for transform function for HDP transformer
* updated tests as discussed in PR 1473
* added examples for new models in ipynb
* unpinned sklearn version for running unit-tests
* updated 'Pipeline' initialization format
* updated 'Pipeline' initialization format in ipynb

Chinmaya Pancholi and others added 8 commits July 5, 2017 02:40

created sklearn wrapper for Doc2Vec

f70c583

PEP8 fix

0c675fa

added 'transform' function and refactored code

7210c69

updated d2v skl api code

b733e25

added unittests for sklearn api for d2v model

7f198a1

fixed flake8 errors

8a12ef5

added skl api class for Text2Bow model

2c18b87

updated docstring for d2vmodel api

710d2ce

tmylk reviewed Jul 12, 2017

oxymor0n mentioned this pull request Jul 12, 2017

Add an sklearn wrapper for the Doc2Vec model #1481

Closed

chinmayapancholi13 added 15 commits July 13, 2017 01:59

updated text2bow skl api code

fe76d28

added unittests for text2bow skl api class

9acaba5

updated 'testPipeline' and 'testTransform' for text2bow

8c5d04e

added 'tokenizer' param to text2bow skl api

4101e30

updated unittests for text2bow

ed7a571

removed get_params and set_params functions from existing classes

66a8302

added tfidf api class

75faa7a

added unittests for tfidf api class

8cbd2ba

flake8 fixes

48958c0

added skl api for hdpmodel

a852980

added unittests for hdp model api class

5a16e77

flake8 fixes

9191053

updated hdp api class

57257be

added 'testPartialFit' and 'testPipeline' tests for hdp api class

de2e11d

flake8 fixes

acdb6dd

chinmayapancholi13 added 3 commits August 1, 2017 16:51

added skl API class for phrases

1c7da8e

added unit tests for phrases API class

47a4214

flake8 fixes

9b32c4d

chinmayapancholi13 added 2 commits August 1, 2017 17:55

added 'testPartialFit' function for 'TestPhrasesTransformer'

3a0977a

updated 'testPipeline' function for 'TestText2BowTransformer'

687c3d7

menshikh-iv reviewed Aug 10, 2017

chinmayapancholi13 added 8 commits August 15, 2017 17:03

updated skl api code as per PR 1473

7fa8632

updated code for transform function for HDP transformer

d42c877

updated tests as discussed in PR 1473

3037620

added examples for new models in ipynb

c52a0e2

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim …

3701eac

…into skl_api_gensim

unpinned sklearn version for running unit-tests

28a3aa3

updated 'Pipeline' initialization format

464a496

updated 'Pipeline' initialization format in ipynb

9d65fdf

menshikh-iv merged commit e27605f into piskvorky:develop Aug 18, 2017

menshikh-iv mentioned this pull request Aug 18, 2017

Pin sklearn version #1538

Merged

menshikh-iv mentioned this pull request Oct 2, 2017

Support sklearn pipeline interface. Continuing #932. #1123

Closed
