Fix `D2VTransformer.fit_transform`. Fix #1834 #1845

karshd3v · 2018-01-18T16:04:46Z

No description provided.

menshikh-iv · 2018-01-19T08:44:55Z

Thanks @Utkarsh-Mishra-CIC,
This isn't a very good fix, but it is acceptable for this situation. I see the same conversion in the tests (before training the model).

@chinmayapancholi13 can you comment on this?

P.S. @Utkarsh-Mishra-CIC, please merge fresh develop into the current PR.

menshikh-iv

CC @chinmayapancholi13

menshikh-iv · 2018-02-05T11:39:48Z

gensim/sklearn_api/d2vmodel.py

@@ -63,8 +64,9 @@ def fit(self, X, y=None):
        Fit the model according to the given training data.
        Calls gensim.models.Doc2Vec
        """
+        d2v_sentences = [doc2vec.TaggedDocument(words[0], [i]) for i, words in enumerate(X)]


Can you check X? It may already be in TaggedDocument format, in which case there is no need to convert it.

Why words[0]? If X is an iterable of token lists, words[0] will be a single token only.

Please add tests for fit_transform.

I can check X like this: all(isinstance(word, doc2vec.TaggedDocument) for word in X). However, that isn't useful for fit_transform, because the transform method of D2VTransformer accepts only a list of lists. The input should therefore be a list of documents like train_input, not TaggedDocument format.

Thanks for pointing that out, I'll add that in the next commit.

Added in the latest commit.
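To make the `words[0]` concern above concrete, here is a minimal sketch. It assumes X is a list of token lists and uses a namedtuple stand-in with the same `words` and `tags` fields as gensim's TaggedDocument, so it runs without gensim installed. `tag_documents` is a hypothetical helper name, not code from this PR.

```python
from collections import namedtuple

# Stand-in for gensim.models.doc2vec.TaggedDocument (same two fields),
# used only so this sketch runs without gensim installed.
TaggedDocument = namedtuple("TaggedDocument", "words tags")

X = [["human", "interface", "computer"], ["graph", "trees"]]

# Variant from the first diff: words[0] keeps only the first token of each document.
buggy = [TaggedDocument(words[0], [i]) for i, words in enumerate(X)]


def tag_documents(docs):
    """Wrap each full token list in a TaggedDocument tagged with its index."""
    return [TaggedDocument(words, [i]) for i, words in enumerate(docs)]


fixed = tag_documents(X)
```

With the first variant, `buggy[0].words` is the single string `"human"`; with the second, `fixed[0].words` is the whole token list.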

chinmayapancholi13 · 2018-02-08T17:37:03Z

Hi @menshikh-iv!
Apologies for not being able to devote time to this until now. I'll try to resolve this and get back to you by this weekend. I hope that's ok. :)

menshikh-iv · 2018-02-08T18:07:46Z

Hey @chinmayapancholi13, can you review the current approach?

menshikh-iv · 2018-02-08T18:16:22Z

ping @Utkarsh-Mishra-CIC, can you answer my questions #1845 (comment)?

chinmayapancholi13 · 2018-02-13T00:58:36Z

Hey @menshikh-iv

I went through the discussion on issue #1834 as well as the comments on this PR. When I had worked on sklearn API, I remember that at the time we had decided to not implement fit_transform(). But if we are planning to have it now, the approach used in this PR (of modifying the fit() method to have a consistent format) looks good to me.
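The consistency point can be illustrated with a small sketch. It assumes the usual scikit-learn convention that `fit_transform` hands the same X to `fit` and then to `transform`. `FitTransformMixin` is a simplified stand-in for that behaviour and `LengthTransformer` is a hypothetical toy class; neither is gensim or scikit-learn code.

```python
class FitTransformMixin:
    """Simplified stand-in for the fit-then-transform convention."""

    def fit_transform(self, X, y=None):
        # The same X goes to both methods, so fit() must accept
        # whatever input format transform() accepts.
        return self.fit(X, y).transform(X)


class LengthTransformer(FitTransformMixin):
    """Hypothetical toy transformer: maps each document to its token count."""

    def fit(self, X, y=None):
        self.n_docs_ = len(X)  # record something learned from the data
        return self

    def transform(self, X):
        return [[len(doc)] for doc in X]


docs = [["human", "interface"], ["graph", "trees", "minors"]]
result = LengthTransformer().fit_transform(docs)
```

If `fit` required TaggedDocument input while `transform` required plain token lists, no single X could satisfy both calls, which is why this PR changes `fit`.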

menshikh-iv · 2018-02-14T11:33:53Z

gensim/sklearn_api/d2vmodel.py

@@ -63,8 +64,12 @@ def fit(self, X, y=None):
        Fit the model according to the given training data.
        Calls gensim.models.Doc2Vec
        """
+        if all(isinstance(word, doc2vec.TaggedDocument) for word in X):


Check only the first element (that's enough).
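A minimal sketch of the first-element check, assuming the input is homogeneous (either all TaggedDocument or all plain token lists). `ensure_tagged` is a hypothetical helper name, and the namedtuple is a stand-in for gensim's TaggedDocument so the example runs without gensim.

```python
from collections import namedtuple

# Stand-in for gensim.models.doc2vec.TaggedDocument (fields: words, tags).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def ensure_tagged(X):
    """Return X as a list of TaggedDocument, converting only when needed."""
    X = list(X)  # materialise so X[0] is safe even if X was a generator
    # Looking at X[0] is constant time; all(...) would scan every document.
    if X and isinstance(X[0], TaggedDocument):
        return X
    return [TaggedDocument(words, [i]) for i, words in enumerate(X)]


raw = [["human", "interface"], ["graph", "trees"]]
tagged = ensure_tagged(raw)
unchanged = ensure_tagged(tagged)
```

The trade-off is that a mixed input (some documents tagged, some not) is not detected; the sketch accepts that in exchange for not scanning the whole corpus.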

menshikh-iv · 2018-02-14T11:35:15Z

gensim/test/test_sklearn_api.py

@@ -831,6 +831,22 @@ def testTransform(self):
        self.assertEqual(matrix.shape[0], 1)
        self.assertEqual(matrix.shape[1], self.model.size)

+    def testFitTransform(self):
+        numpy.random.seed(0)


This is global seeding (it affects the interpreter state, not only this test), please don't use it.
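One common alternative is a local generator object, sketched below under the assumption that the test only needs reproducible random numbers of its own. The function names are hypothetical, and this is not necessarily what the final commit in this PR does.

```python
import numpy


def sample_with_global_seed():
    # Reseeds the process-wide generator: every later numpy.random call
    # in the same interpreter, including other tests, is affected.
    numpy.random.seed(0)
    return numpy.random.rand(3)


def sample_with_local_rng(seed=0):
    # A private RandomState keeps the seeding local to this call.
    rng = numpy.random.RandomState(seed)
    return rng.rand(3)


state_before = numpy.random.get_state()
local_sample = sample_with_local_rng(0)
state_after = numpy.random.get_state()
```

Here the global state read before and after `sample_with_local_rng` is identical, whereas calling `sample_with_global_seed` would reset it.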

menshikh-iv · 2018-02-14T11:36:08Z

gensim/test/test_sklearn_api.py

+        numpy.random.seed(0)
+        model = D2VTransformer(min_count=1)
+
+        #fit and transform multiple documents


PEP8: a comment should start with `# ` (a space between # and the text), here and below.

karshd3v · 2018-02-14T14:33:14Z

Made the requested changes.

menshikh-iv · 2018-02-16T06:40:03Z

Thanks @Utkarsh-Mishra-CIC, congratulations on your first contribution :1st_place_medal:!

* Fix D2VTransformer.fit_transform (piskvorky#1834)
* Add check for TaggedDocument
* Add test for D2VTransformer fit_transform
* Add test and check d2vtransformer

Fix D2VTransformer.fit_transform (piskvorky#1834)

84d1c37

karshd3v mentioned this pull request Jan 18, 2018

D2VTransformer.fit_transform doesn't work #1834

Closed

Merge fresh 'develop' into fix-1834

3080cac

menshikh-iv changed the title ~~Fix D2VTransformer.fit_transform(#1834)~~ Fix D2VTransformer.fit_transform. Fix #1834 Feb 1, 2018

menshikh-iv suggested changes Feb 5, 2018

karshd3v added 2 commits February 9, 2018 03:53

Add check for TaggedDocument

4c01cd2

Add test for D2VTransformer fit_transform

be2db9b

menshikh-iv suggested changes Feb 14, 2018

Add test and check d2vtransformer

e36fa9d

menshikh-iv merged commit 8759282 into piskvorky:develop Feb 16, 2018

karshd3v deleted the fix-1834 branch February 16, 2018 06:51
