RandomState (Fix to issue #113) breaks backwards compatibility with old LDA models #1082
Comments
I am interested in working on this bug. How can I start, and what does "breaking pull request" mean?
@ibrahimsharaf The pull request that introduced this error is 2e0ed26
@tmylk should we use
@ibrahimsharaf That line is used to load saved state. It is different from random_state in this issue.
@ibrahimsharaf It is strange that it can't be reproduced: trying to use an unpickled object that doesn't have a required field should cause an error. What is your code to reproduce?
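A minimal sketch of the failure mode described above, assuming a hypothetical `TinyModel` class that stands in for a model class (it is not gensim's `LdaModel`). Unpickling restores the saved attributes without calling `__init__`, so a field introduced in a newer release is simply absent on an object saved by an older one:

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for a model class; not gensim code."""

    def __init__(self, num_topics):
        self.num_topics = num_topics
        self.random_state = 0  # field that only the "new" release sets

    def describe(self):
        # The "new" code path assumes random_state is always present.
        return (self.num_topics, self.random_state)


# Simulate an instance pickled by an older release that never set random_state.
old = TinyModel(3)
del old.random_state
payload = pickle.dumps(old)

# Unpickling restores the instance __dict__ and does not call __init__.
restored = pickle.loads(payload)

try:
    restored.describe()
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__)
```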
Current status: awaiting a code fix
@tmylk @menshikh-iv On running the code below:
from gensim import corpora, models

# save with version <=0.13.1
# texts = [['human', 'interface', 'computer'],
#          ['survey', 'user', 'computer', 'system', 'response', 'time'],
#          ['eps', 'user', 'interface', 'system'],
#          ['system', 'human', 'system', 'eps'],
#          ['user', 'response', 'time'],
#          ['trees'],
#          ['graph', 'trees'],
#          ['graph', 'minors', 'trees'],
#          ['graph', 'minors', 'survey']]
# dictionary = corpora.Dictionary(texts)
# corpus = [dictionary.doc2bow(text) for text in texts]
# model = models.ldamodel.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=20)
# print(model.print_topics(num_topics=3, num_words=3))
# model.save('lda_model_saved1')

# load with version >=0.13.2
load_model = models.LdaModel.load('lda_model_saved1')
print(load_model.print_topics(num_topics=2, num_words=3))
I get the following error:
So, I believe, I am able to reproduce the issue. However, I wanted to verify the solution that we want here. So while saving, we save a separate file on disk for
You are right, @chinmayapancholi13. Just add
@menshikh-iv However, when we save an LDA model, only 4 files currently get created: model_name, model_name.expElogbeta.npy, model_name.id2word and model_name.state
@menshikh-iv Great! I'll submit a PR for this shortly then.
@menshikh-iv For pre-0.13.2 versions, two files are created while saving the model: model_name and model_name.state. In the post-0.13.2 versions, at the time of loading, the
As can be inferred from the error log above, this is because
Yes,
@tmylk Yes. We are able to load from the main pickle for pre-0.13.2 versions. So,
only when loading a model which has been saved using a post-0.13.2 version.
This is working for both cases, i.e. when the model had been saved using a pre-0.13.2 model (
Looks good in theory, but waiting for the unit tests :)
@tmylk Great! Then I'll try to address both these issues (one of
Fixed in 1327
LDA models saved before version 0.13.2 cannot be used in version 0.13.2 and up because they do not contain the random_state variable. This should be a pretty trivial fix in the load functionality though. The breaking PR is this one: 2e0ed26
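One way the load-side fix mentioned above could look is sketched here. This is an illustration under stated assumptions, not the change that was actually merged: `SketchModel`, its `save`/`load` methods, and the use of `random.Random` as the random state are all hypothetical stand-ins chosen to keep the example self-contained. The idea is only that `load` fills in a default when the unpickled object predates the attribute.

```python
import io
import pickle
import random


class SketchModel:
    """Hypothetical model class used only for illustration; not gensim code."""

    def __init__(self, num_topics, seed=None):
        self.num_topics = num_topics
        self.random_state = random.Random(seed)

    def save(self, fileobj):
        pickle.dump(self, fileobj)

    @classmethod
    def load(cls, fileobj):
        obj = pickle.load(fileobj)
        # Objects saved before random_state existed lack the attribute,
        # so fall back to a fresh generator instead of failing later.
        if not hasattr(obj, "random_state"):
            obj.random_state = random.Random()
        return obj


# Simulate a file written by an older release (no random_state attribute).
legacy = SketchModel(3)
del legacy.random_state
legacy_file = io.BytesIO()
legacy.save(legacy_file)
legacy_file.seek(0)
loaded = SketchModel.load(legacy_file)

# A model saved by the current release keeps its own random_state.
current_file = io.BytesIO()
SketchModel(2, seed=42).save(current_file)
current_file.seek(0)
loaded_current = SketchModel.load(current_file)
```

Defaulting only when the attribute is missing leaves models saved by newer releases untouched, which matches the two cases discussed in the thread (saved before and after the attribute was introduced).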