Fix train error of ConcatenatedDoc2Vec in the notebook of doc2vec-IMDB #1377

Merged: 25 commits, Jul 7, 2017
Commits
1aa3f33
fix the compatibility between python2 & 3
robotcator Mar 17, 2017
24e6331
Merge https://github.com/RaRe-Technologies/gensim into fix-word2vec-n…
robotcator Mar 18, 2017
f6f571f
require explicit corpus size, epochs for train()
gojomo Feb 9, 2017
5e9529b
make all train() calls use explicit count, epochs
gojomo Feb 9, 2017
5c24a90
add tests to make sure that ValueError is indeed thrown
robotcator Mar 23, 2017
c89f285
update test
robotcator Mar 24, 2017
10ff8a5
fix the word2vec's reset_from()
robotcator Mar 25, 2017
a6312ca
Merge branch 'fix-word2vec' into fix-word2vec-notebook
robotcator Mar 29, 2017
be5216a
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Mar 29, 2017
504bd09
require explicit corpus size, epochs for train()
gojomo Feb 9, 2017
43f9689
make all train() calls use explicit count, epochs
gojomo Feb 9, 2017
49e3d00
update notebooks
robotcator Mar 29, 2017
c9eab32
fix some error
robotcator Mar 29, 2017
8024eb5
fix test error
robotcator Mar 29, 2017
d3562b6
Merge branch 'test-word2vec' of https://github.com/robotcator/gensim …
robotcator Apr 9, 2017
ff93cdf
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator May 24, 2017
67f0367
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Jun 1, 2017
8a6098a
fix the train error of ConcatenatedDoc2Vec
robotcator Jun 1, 2017
04cf9cd
update the ConcatenatedDoc2Vec class
robotcator Jun 2, 2017
09a2691
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Jun 4, 2017
623add0
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Jun 5, 2017
b365e2a
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Jun 5, 2017
2e15945
update the parameters
robotcator Jun 5, 2017
5306c0a
rerun all the cells
robotcator Jun 6, 2017
2aaecff
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
robotcator Jun 6, 2017
5 changes: 4 additions & 1 deletion docs/notebooks/doc2vec-IMDB.ipynb
@@ -610,7 +610,10 @@
" duration = 'na'\n",
" train_model.alpha, train_model.min_alpha = alpha, alpha\n",
" with elapsed_timer() as elapsed:\n",
" train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n",
" if not isinstance(train_model, ConcatenatedDoc2Vec):\n",
" train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n",
" else:\n",
" train_model.train(doc_list)\n",
Collaborator

A simpler and more robust fix would be to change the ConcatenatedDoc2Vec class, in test_doc2vec.py, so that its (no-op) train() matches the new train() parameter signature.
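The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the code in test_doc2vec.py: `StubModel` and `ConcatenatedWrapper` are hypothetical stand-ins, and the assumption is only that the wrapper's train() is a no-op that should tolerate whatever arguments the caller passes.

```python
class StubModel:
    """Hypothetical stand-in for a trainable model (not gensim's Doc2Vec)."""

    def __init__(self):
        self.calls = []

    def train(self, documents, total_examples=None, epochs=None):
        # Mimics the stricter signature discussed in this PR:
        # both values must be given explicitly.
        if total_examples is None or epochs is None:
            raise ValueError("total_examples and epochs must be provided")
        self.calls.append((total_examples, epochs))


class ConcatenatedWrapper:
    """Hypothetical wrapper in the spirit of ConcatenatedDoc2Vec."""

    def __init__(self, models):
        self.models = models

    def train(self, *ignored_args, **ignored_kwargs):
        # Deliberate no-op: the wrapped models are assumed to be trained
        # individually, so any arguments are accepted and discarded.
        pass
```

With a signature like this, the notebook loop would not need an isinstance() branch, because the same train() call works for plain models and for the wrapper.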

Contributor Author

Yes, you are right. If the train() method is modified, the total_examples and epochs should be provided. But the ConcatenatedDoc2Vec class has no attribute 'corpus_count'.

Collaborator

The call doesn't have to use train_model.corpus_count from inside the model - it can just use len(doc_list). And since the outside loop is handling the multiple passes, the epochs argument should be 1.
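A rough sketch of the call pattern described in this comment: the outer loop owns the passes, and each train() call gets total_examples=len(doc_list) and epochs=1. `RecordingModel` and `run_passes` are hypothetical names introduced for illustration, and the linear alpha decay is an assumption about how the surrounding loop manages the learning rate.

```python
import random


class RecordingModel:
    """Hypothetical stand-in that records the arguments train() receives."""

    def __init__(self):
        self.alpha = None
        self.min_alpha = None
        self.history = []

    def train(self, documents, total_examples=None, epochs=None):
        if total_examples is None or epochs is None:
            raise ValueError("total_examples and epochs must be provided")
        self.history.append((total_examples, epochs, self.alpha))


def run_passes(models, doc_list, passes, alpha, min_alpha):
    """Train every model once per pass, decaying alpha linearly (assumed)."""
    alpha_delta = (alpha - min_alpha) / passes
    for _ in range(passes):
        random.shuffle(doc_list)  # reshuffle the documents on each pass
        for model in models:
            # Fix the learning rate for this pass.
            model.alpha, model.min_alpha = alpha, alpha
            # The corpus size comes from the list itself, not from the model,
            # and epochs is 1 because the outer loop handles repetition.
            model.train(doc_list, total_examples=len(doc_list), epochs=1)
        alpha -= alpha_delta


models = [RecordingModel(), RecordingModel()]
docs = ["doc a", "doc b", "doc c", "doc d", "doc e"]
run_passes(models, docs, passes=4, alpha=0.025, min_alpha=0.001)
```

Because the call no longer reads corpus_count from the model, it works the same way for an object that lacks that attribute.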

" duration = '%.1f' % elapsed()\n",
" \n",
" # evaluate\n",