
Fix the comment of Translation Matrix #1594

Merged: 8 commits, Sep 25, 2017

Conversation

robotcator (Contributor)

No description provided.

@robotcator changed the title from "Fix the comment of Tr" to "Fix the comment of Translation Matrix" on Sep 19, 2017

Args:
-    `word_pair` (list): a list pair of words
+    `word_pairs` (list): a list pair of words
     `source_space` (Space object): source language space
Contributor: The train method uses only word_pairs; what are the source/target spaces here?

self.source_space = Space.build(self.source_lang_vec, set(self.source_word))
self.target_space = Space.build(self.target_lang_vec, set(self.target_word))
self.source_word, self.target_word = zip(*word_pairs)
if self.translation_matrix is None:
Contributor: But if I call train() twice, the second time the model is not refit.
Please remove this if.

self.source_word_vec_file = datapath("EN.1-10.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt")
self.target_word_vec_file = datapath("IT.1-10.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt")

with utils.smart_open(self.train_file, "r") as f:
self.word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f]
self.word_pairs = [("one", "uno"), ("two", "due"), ("three", "tre"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use hanging indents

("grape", "acino"), ("banana", "banana"), ("mango", "mango")
]

self.test_word_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")]
Contributor: Remove ("dog", "cane") from self.word_pairs.

@@ -91,7 +91,7 @@ def build(cls, lang_vec, lexicon=None):
return Space(mat, words)

def normalize(self):
""" normalized the word vector's matrix """
""" Normalized the word vector's matrix """
Collaborator: 'Normalize…' (imperative rather than past tense).

model = super(TranslationMatrix, cls).load(*args, **kwargs)
return model

def apply_transmat(self, words_space):
"""
-        mapping the source word vector to the target word vector using translation matrix
+        Mapping the source word vector to the target word vector using translation matrix
Collaborator: 'Map…' (imperative rather than the '-ing' form).

@gojomo (Collaborator) commented Sep 21, 2017:

The handling of word_pairs in __init__() and train() now makes sense, thanks. The comments have been improved, but they may still benefit from a deep review for clarity and wording.

Though I know I requested the Doc2Vec-related example, in its current form the motivations/benefits are muddled. Really, it shouldn't require a separate helper class (BackMappingTranslationMatrix), and the notebook section ("Tranlation Matrix Revisit") is hard to follow and includes a number of improper practices (for example: using an imbalanced set of docs for the mapping 'overlap'; using the slow-and-iffy dm_concat mode; calling train() multiple times with a sawtooth alpha progression; etc.).

The word-translation example can presumably be evaluated against real datasets in the original context that motivated the approach, while the doc-vec example will need more novel design/evaluation, so I'd recommend splitting them into separate notebooks.

@robotcator (Contributor, Author) commented Sep 21, 2017:

Thanks. You reminded me of the imbalanced-set problem in the example. The code for training the document vectors was borrowed from doc2vec-imdb.ipynb, and I will re-train the document vectors.
As for the imbalanced data, how should the documents for the 'overlap' be sampled: by sentiment, or by whether they belong to the train or test set? (Sampling by sentiment seems more logical to me.)

For the BackMappingTranslationMatrix class, I didn't find a good way to integrate this functionality into the TranslationMatrix class, so I separated it into two classes, because word2vec and doc2vec access their vectors differently:

for word2vec, use model[word] to get the word vector;
for doc2vec, use model.docvecs['doc_tag'] to get the document vector.

If BackMappingTranslationMatrix were integrated into TranslationMatrix, I would handle the two model types separately (with isinstance checks); is that appropriate?

I didn't catch what you meant by "The word-translation example can be evaluated based on real datasets in the original context"; can you please explain in more detail?

@gojomo (Collaborator) commented Sep 21, 2017:

I meant there are published papers about using word-vector transformations for language translation (the original Google paper, the Dinu paper), so there are specific datasets and procedures to mimic, and similar results would indicate everything is working. The Doc2Vec use is novel, so it requires more experimentation/thought.

@robotcator (Contributor, Author) commented Sep 22, 2017:

The word pairs used in this experiment are extracted from OPUS (http://opus.lingfil.uu.se/), the same as in Dinu's paper. I plotted a visualization to show the linear relationship between the two languages' vector spaces, and used word translation to show that the transformation works. Reproducing more experiments from Mikolov's and Dinu's papers would help support this transformation, but I still cannot find any experiment for language translation (does that mean sentence translation?) in the two papers mentioned here. Can you remind me if I missed something?

I added an explicit "unstable/experimental" warning tag to the doc2vec transformation part of the notebook, as Ivan suggested.

I also found a related paper ("Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax"), but I'm just having a look and need to dig deeper.

@menshikh-iv (Contributor) commented:

Thank you @robotcator for the fast fixes.
Thank you @gojomo for the review :+1:

@menshikh-iv (Contributor) commented:

We still need to fix some typos/PEP8 issues in the notebook, but I can't wait any longer; it's release time.
