
cPickle.UnpicklingError: unpickling stack underflow #1447

Open · loretoparisi opened this issue Jun 23, 2017 · 7 comments

Labels: bug (Issue described a bug) · difficulty medium (Medium issue: required good gensim understanding & python skills)
@loretoparisi (Contributor) commented Jun 23, 2017
I get this error while loading wiki.en.vec from the FastText pre-trained Word2Vec model. See here for this model.

2017-06-23 16:41:40,834 : INFO : loading Word2Vec object from /Volumes/Dataset/word2vec/wiki.en/wiki.en.vec
Traceback (most recent call last):
  File "loadlyricsmodel.py", line 45, in <module>
    model = Word2Vec.load( model_filepath )
  File "/Users/loretoparisi/Documents/Projects/word2vec/.env/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1382, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/Users/loretoparisi/Documents/Projects/word2vec/.env/lib/python2.7/site-packages/gensim/utils.py", line 271, in load
    obj = unpickle(fname)
  File "/Users/loretoparisi/Documents/Projects/word2vec/.env/lib/python2.7/site-packages/gensim/utils.py", line 935, in unpickle
    return _pickle.loads(f.read())
cPickle.UnpicklingError: unpickling stack underflow

loaded with

model = Word2Vec.load( model_filepath )

I'm using

gensim-2.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
@gojomo (Collaborator) commented Jun 23, 2017

Word2Vec.load() only loads models saved from gensim. (It uses Python pickling.)
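(This also explains the exact error message: a .vec file is plain text, and the pickle loader fails as soon as it interprets those bytes as pickle opcodes – a leading digit such as '2' happens to be the DUP opcode, which underflows the empty pickle stack. A minimal reproduction, using a made-up header line, not the real file contents:)

```python
import pickle

# A .vec file begins with a plain-text header "<vocab_size> <dimensions>",
# followed by word/vector lines. None of this is a pickle stream, so
# pickle.loads() rejects it with an UnpicklingError.
fake_vec_bytes = b"2519370 300\nthe 0.1 0.2 0.3\n"

try:
    pickle.loads(fake_vec_bytes)
except pickle.UnpicklingError as e:
    print("UnpicklingError:", e)
```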

I believe that .vec file is in the format used by the original Google word2vec.c (and now FastText) for its top-level vectors, so KeyedVectors.load_word2vec_format() may work, perhaps with a binary=False parameter.
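(For context, the text format that load_word2vec_format() parses is simple: one header line with the vocabulary size and vector dimensionality, then one whitespace-separated line per word. A stdlib-only sketch of a reader – a hypothetical helper for illustration, not gensim's actual implementation:)

```python
def read_word2vec_text(lines):
    """Parse the word2vec/fastText text (.vec) format:
    first line '<vocab_size> <dim>', then '<word> v1 v2 ... vdim' per word."""
    it = iter(lines)
    vocab_size, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim, "malformed row: wrong vector length"
        vectors[word] = values
    assert len(vectors) == vocab_size, "header/body mismatch"
    return vectors

# Example with a tiny fake vocabulary of 2 words, 3 dimensions:
vecs = read_word2vec_text(["2 3", "the 0.1 0.2 0.3", "of 0.4 0.5 0.6"])
print(vecs["of"])
```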

The method gensim.models.wrappers.fasttext.FastText.load_fasttext_format(), which also brings in ngrams for OOV word vector synthesis, may be of interest too... but I'm not sure it's yet doing the right thing in the released gensim, as compared to PR-in-progress #1341.

@menshikh-iv (Contributor) commented:

@jayantj @prakhar2b wdyt?

@prakhar2b (Contributor) commented:

@gojomo yes, KeyedVectors.load_word2vec_format() will definitely work here; binary=False is the default parameter anyway.

As for OOV word synthesis, what do you mean by "not sure if it's yet doing the right thing in the released gensim"? I think for OOV we need the n-gram information, which is provided in the .bin file.

As of now, gensim.models.wrappers.fasttext.FastText.load_fasttext_format() is used to load the complete model for this purpose, using both the .vec and .bin files. With PR #1341 we will need only the .bin file; all other functionality will remain the same, I believe.

cc @jayantj @menshikh-iv

@jayantj (Contributor) commented Jun 26, 2017

Yes, with the .bin AND the .vec file, you can load the complete model using -

from gensim.models.wrappers.fasttext import FastText
model = FastText.load_fasttext_format('/path/to/model')  # without the .bin/.vec extension

With the .vec file, you can load only the word vectors (and not the out-of-vocab word information) using -

from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load_word2vec_format('/path/to/model.vec')  # with the .vec extension

@loretoparisi (Contributor, Author) commented:

@jayantj Thanks, let me try first with load_fasttext_format and the FastText wrapper.

@gojomo (Collaborator) commented Jun 28, 2017

@prakhar2b My "not sure" comment referred to some discussion I saw on another issue or PR in progress, perhaps the one that's also debating whether discarding untrained ngrams is a necessary optimization – I had the impression our calculation might be diverging from the original FB fastText on some (perhaps just OOV) words. (And even if that's defensible, because the untrained ngrams are still just random vectors, it might not be the 'right thing' overall: it may violate the user expectation that, whether a model is loaded into the original FT code or into gensim's FT code, OOV words get the same vectors from the same loaded model.)
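(For anyone following along, fastText synthesizes an OOV word's vector from the character n-grams – 3 to 6 characters by default – of the word wrapped in boundary markers. A rough sketch of the idea; real fastText hashes n-grams into buckets, and the exact combination/normalization is precisely what is under discussion above:)

```python
def char_ngrams(word, nmin=3, nmax=6):
    """Character n-grams of '<word>', with fastText-style boundary markers."""
    w = "<" + word + ">"
    return [w[i:i + n] for n in range(nmin, nmax + 1)
            for i in range(len(w) - n + 1)]

def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the word's known n-grams; whether gensim and
    the original fastText agree on exactly this computation is the question."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not grams:
        return [0.0] * dim
    return [sum(ngram_vectors[g][k] for g in grams) / len(grams)
            for k in range(dim)]
```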

@piskvorky piskvorky added the bug Issue described a bug label Aug 31, 2017
@piskvorky (Owner) commented Aug 31, 2017

We definitely want to follow whatever the original FT does -- the path of least surprise for anyone migrating / trying both.

@menshikh-iv menshikh-iv added the difficulty medium Medium issue: required good gensim understanding & python skills label Oct 2, 2017