Should Doc2Vec.load_word2vec_format return a Doc2Vec instance? #322

frnsys · 2015-04-11T16:37:06Z

Currently Doc2Vec.load_word2vec_format returns a Word2Vec object, shouldn't it be a Doc2Vec object?

piskvorky · 2015-04-11T16:38:06Z

topinsky · 2015-09-03T21:12:26Z

Previously, Doc2Vec didn't have a separate container for document vectors (docvecs), which is probably why returning a Word2Vec object was not a big issue. But now, if it returns Word2Vec, it loses all information about docvecs... =(

gojomo · 2015-09-03T22:17:22Z

It's not clear to me what the most useful behavior would be in this case.

Do you want to influence a Doc2Vec session with reused word vectors? In such a case, you might be able to cobble together the desired effect using a multi-step process that at some point uses the intersect_word2vec_format() method to bring in some/all word vectors. (Since that just modifies an existing Doc2Vec model, you'd have the right kind of object at the end.)

Or is it that you saved a prior-version Doc2Vec model in _word2vec_format, so it also has doc vectors mixed with words in that format, and you want to convert it forward? Since many conventions for naming the doc-vecs are possible, that'd require some user-specific coding, I think, but still might be possible leveraging intersect_word2vec_format(), and then copying the vectors you know (by your own naming convention) are doc vectors into the DocvecsArray component.
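A minimal sketch of the "user-specific coding" step described above, using only the standard library: it splits the entries of a word2vec.c text-format file into word vectors and doc vectors by a naming convention. The `DOC_` prefix, the `split_by_prefix` helper, and the sample rows are hypothetical, chosen for illustration; your own files may use a different convention, and copying the resulting doc vectors into a Doc2Vec model is not shown here.

```python
def split_by_prefix(lines, doc_prefix="DOC_"):
    """Split word2vec text-format lines into (word_vectors, doc_vectors).

    `lines` is the file content as a list of strings: a "count dim" header
    followed by one "key v1 v2 ..." row per vector. `doc_prefix` is a
    hypothetical naming convention marking keys that are doc vectors.
    """
    header, *rows = lines
    count, dim = (int(x) for x in header.split())
    if len(rows) != count:
        raise ValueError("header count does not match number of rows")

    words, docs = {}, {}
    for row in rows:
        key, *values = row.split()
        vector = [float(v) for v in values]
        if len(vector) != dim:
            raise ValueError(f"wrong dimensionality for key {key!r}")
        # Route each vector by the (assumed) naming convention.
        target = docs if key.startswith(doc_prefix) else words
        target[key] = vector
    return words, docs


# Made-up sample data in word2vec.c text format.
sample = [
    "3 2",
    "cat 0.1 0.2",
    "dog 0.3 0.4",
    "DOC_0 0.5 0.6",
]
words, docs = split_by_prefix(sample)
```

After this split, `words` holds the entries that could be brought in as word vectors, and `docs` holds the ones you know, by your own convention, to be doc vectors.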

topinsky · 2015-09-03T22:25:13Z

I don't follow your answer; it's a bit unclear to me.
intersect_word2vec_format() -- what is that?

My use case is simple.
I built a Doc2Vec model and wanted to save it in binary format.
I did it as before, using the save_word2vec_format method.

Now I want to load that binary file.
But if I use load_word2vec_format, I get a Word2Vec object.

How can I do that?

gojomo · 2015-09-03T22:49:04Z

I recommend you just use the plain (gensim-native) save() and load() methods. They'll save and load the full model.

(The word2vec.c format was only meant for string-keyed vectors – which the docvecs won't be if you're being maximally memory efficient. And it never saved all the model information. So I think you'd only want to use it if needing to maintain compatibility with other code.)

intersect_word2vec_format() is a method on Word2Vec that lets you load word2vec.c-format word vector values into an existing Word2Vec model, for only those words already in the model's vocabulary. (It replaces the model's vector values with those in the supplied file.) It's experimental but might support some of the reasons people would want to load_word2vec_format() into a Doc2Vec model.
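To make the "intersect" behaviour concrete, here is a standard-library sketch of the idea: read a word2vec.c text-format file and overwrite vectors only for words already present in an existing vocabulary. This is an illustration of the semantics described above, not gensim's implementation; `intersect_vectors`, the plain dict standing in for a model, and the sample data are all hypothetical.

```python
import io


def intersect_vectors(vocab_vectors, w2v_text_file):
    """Overwrite vectors in `vocab_vectors` with values from a word2vec text file.

    Only words already present in `vocab_vectors` are updated; words that
    appear only in the file are ignored. Returns the list of replaced words.
    """
    header = w2v_text_file.readline()
    _count, dim = (int(x) for x in header.split())

    replaced = []
    for line in w2v_text_file:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"wrong dimensionality for word {word!r}")
        if word in vocab_vectors:
            vocab_vectors[word] = [float(v) for v in values]
            replaced.append(word)
    return replaced


# A dict standing in for an existing model's word vectors.
model_vectors = {"cat": [0.0, 0.0], "dog": [0.0, 0.0]}

# Made-up pretrained vectors; "fish" is not in the existing vocabulary.
pretrained = io.StringIO("3 2\ncat 1.0 2.0\nfish 3.0 4.0\ndog 5.0 6.0\n")

replaced = intersect_vectors(model_vectors, pretrained)
```

Here "cat" and "dog" take the values from the file, while "fish" is skipped because it was never in the existing vocabulary.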

topinsky · 2015-09-03T23:08:01Z

Thank You

ghost mentioned this issue Aug 2, 2015

Identical topics #416

Closed

menshikh-iv closed this as completed Oct 3, 2017
