doc2vec-lee.ipynb results ... not even close #1088
Comments
Open-ended questions and discussion that are not bug reports or feature requests should go to the project discussion list at https://groups.google.com/forum/#!forum/gensim rather than this issue tracker, so please post your question there. (When you do, it would help to make clear whether you have tried running the code in a Jupyter notebook itself and had the same problem, what gensim version you are using, and what exact results or logged output you are seeing, rather than only what you expected.)
Looks like a (slightly incomplete) bug report to me.
Reopening, as it does seem that our updating of Doc2Vec defaults made the examples in this notebook less effective and stable - see discussion thread at https://groups.google.com/d/msg/gensim/bs77ke1Zun0/9lrMo_w0CAAJ I believe upping the Thanks, @johncleveland, for catching and reporting this!
This may not be the right place for this, but if this is the original paragraph vectors paper, I believe there have been some serious problems with the reproducibility of those findings. In "Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews", Mikolov even has a footnote explaining that the results were not reproducible.
Yes, you can find posts in many places across the net from people who have been frustrated trying to reproduce the PV paper's error rates on the same original datasets, and a few comments by Mikolov (like that footnote) implying Le made a mistake in result-reporting. Here, it is just a matter of our demo, on a different, much smaller dataset, not behaving the same across some other code changes.
For this GitHub tutorial: gensim/docs/notebooks/doc2vec-lee.ipynb
I have copied the code verbatim and have been unable to reproduce anything near the 95% rate.
collections.Counter(ranks) #96% accuracy
Counter({0: 292, 1: 8})
I have used Python 2.7.12, 2.7.13, and 3.5 on both Windows 10 and Ubuntu 16.10.
I have also had a friend try it on his Windows system. My results are all over the place.
What could possibly be the problem? I am just copy-pasting.
Thanks
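For anyone comparing runs, here is a minimal sketch of how the quoted Counter maps to an accuracy figure. It assumes "accuracy" means the share of documents whose own vector comes back as the most similar one (rank 0). The helper name top1_accuracy is hypothetical and is not part of gensim or the notebook. For the counts quoted above it gives 292/300, about 97.3%.

```python
from collections import Counter


def top1_accuracy(rank_counts):
    """Return the fraction of documents that were ranked first (rank 0).

    rank_counts maps a rank to the number of documents that landed at
    that rank. This is a hypothetical helper for illustration only.
    """
    total = sum(rank_counts.values())
    if total == 0:
        raise ValueError("no ranks recorded")
    # A Counter returns 0 for a missing key, so a run with no rank-0 hits is fine.
    return rank_counts[0] / total


# Counts quoted in the post above (rank -> number of documents).
ranks = Counter({0: 292, 1: 8})
accuracy = top1_accuracy(ranks)
print(f"{accuracy:.1%} of {sum(ranks.values())} documents ranked themselves first")
```

Printing this number for several runs makes it easier to report exactly how far "all over the place" the results are.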