Fixed bug in loss computation for Word2Vec with hierarchical softmax #3397
Conversation
Those are much more realistic loss-over-time curves, so on that basis alone I suspect this is a good fix for that code branch. Thanks! But we'd certainly want to fix anywhere the same issue recurs, & it looks like the analogous CBOW + hierarchical-softmax path has the same pattern. Can you run a similar set of 'before' tests using alternate params? Then, try your fix also at the corresponding spot. (Running at just a smaller number of more-typical settings would be fine.)
I have some good news: it looks like the same problem was also in CBOW + HS.

CBOW + HS before change: [loss plot]
CBOW + HS after change: [loss plot]
SG + NEG: [loss plot]
CBOW + NEG: [loss plot]

I committed the additional change for the CBOW + HS.
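For reference, a minimal sketch of the alternate configurations compared above (CBOW + HS, SG + NEG, CBOW + NEG), assuming gensim's standard `Word2Vec` constructor parameters; the corpus and hyperparameter values here are placeholders, not the actual benchmark setup:

```python
from gensim.models import Word2Vec

# Placeholder corpus; the real tests followed the setup from the linked SO question.
toy_corpus = [["a", "toy", "corpus"], ["replace", "with", "real", "sentences"]]

# sg: 1 = Skip-Gram, 0 = CBOW; hs: 1 = hierarchical softmax; negative > 0 = negative sampling
configs = {
    "CBOW + HS":  dict(sg=0, hs=1, negative=0),
    "SG + NEG":   dict(sg=1, hs=0, negative=5),
    "CBOW + NEG": dict(sg=0, hs=0, negative=5),
}

for name, params in configs.items():
    model = Word2Vec(toy_corpus, vector_size=50, min_count=1, epochs=5,
                     compute_loss=True, **params)
    print(name, model.get_latest_training_loss())
```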
Thank you! Assuming the automated tests-in-progress pass (& on the off chance they fail, it's probably due to something other than your tiny changes), these fixes look good for integration to me.
Many thanks! One small change for code, a giant leap for loss correctness. I do wonder how the …
Looks good to me! Thank you for your work and the well-crafted PR @TalIfargan. |
I found an issue with loss computation for Word2Vec when using Skip-Gram (`sg=1`) with Hierarchical Softmax (`hs=1`).

While training different models and trying to understand their loss, I encountered the same problem described in this SO question: Why does the loss of Word2Vec model trained by gensim at first increase for a few epochs and then decrease? If I had to assign this PR to an open issue, it would be #2617.
I decided not to elaborate too much here because the author of the SO question has given all the information needed to understand and reproduce the results, and I guess @gojomo is already well aware of the details. In addition, I am actually changing only 3 characters in the code, and they only affect a very specific and limited component: the loss calculation when using Word2Vec Skip-Gram (`sg=1`) with Hierarchical Softmax (`hs=1`) and setting `compute_loss=True`.

In short, the loss behaves in a weird and unpredictable way (essentially spiking up near the beginning of training and failing to converge as expected, across different vector sizes), but the word vectors do improve over time according to the different score metrics I have tested. I've plotted the loss per epoch for different vector sizes to show what I mean; a sketch of how such curves can be produced follows below.
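As an illustration, here is a rough sketch of how loss-per-epoch curves like these can be produced with gensim's callback API; the corpus, vector size, and epoch count below are placeholders rather than my exact setup:

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec

class EpochLossLogger(CallbackAny2Vec):
    """Record per-epoch loss from gensim's cumulative running-loss counter."""
    def __init__(self):
        self.previous = 0.0
        self.per_epoch = []

    def on_epoch_end(self, model):
        cumulative = model.get_latest_training_loss()
        self.per_epoch.append(cumulative - self.previous)  # loss of this epoch only
        self.previous = cumulative

sentences = [["a", "toy", "corpus"], ["replace", "with", "real", "text"]]  # placeholder
logger = EpochLossLogger()
Word2Vec(
    sentences,
    vector_size=100,   # repeat for several sizes to reproduce the per-size curves
    sg=1,              # Skip-Gram
    hs=1,              # hierarchical softmax
    negative=0,
    min_count=1,
    epochs=20,
    compute_loss=True,
    callbacks=[logger],
)
print(logger.per_epoch)  # with the bug, these values spike early and fail to converge
```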
After some investigation of the code, I found that in the loss calculation there was a redundant `-1` on line 129 of the `word2vec_inner.pyx` file:

https://github.com/RaRe-Technologies/gensim/blob/a435f24fe25e17f473e71af3468660512c2606cb/gensim/models/word2vec_inner.pyx#L127-L133
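To illustrate the effect, here is a pure-Python sketch of the per-node loss term (with my own variable names, not the actual Cython source); the redundant `-1` flips the sign of the argument passed to the log-sigmoid:

```python
import numpy as np

def log_sigmoid(x):
    # plain log(sigma(x)); the Cython code uses a precomputed LOG_TABLE instead
    return -np.log1p(np.exp(-x))

def hs_node_loss(dot, code_bit, buggy=False):
    """Loss contribution of one inner node on the word's Huffman-tree path.

    dot      -- dot product of the input vector with the node's output vector
    code_bit -- the word's Huffman code bit (0 or 1) at this node
    """
    sgn = (-1) ** code_bit                          # code 0 -> +1, code 1 -> -1
    arg = -1 * sgn * dot if buggy else sgn * dot    # the fix drops the "-1 *"
    return -log_sigmoid(arg)                        # -log sigma(arg)
```

With the extra `-1`, a node the model predicts well (large `sgn * dot`) is charged a large loss instead of a small one, so the reported loss grows as training improves, which is consistent with the spiking curves above.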
This `-1` does not align with the loss computation described in word2vec Parameter Learning Explained.
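For reference, the hierarchical-softmax loss in that paper (Rong, 2014) for an output word $w_O$ reached via the tree path $n(w,1), \dots, n(w,L(w))$ takes, up to notation, the form:

```latex
E = -\log p(w_O \mid w_I)
  = -\sum_{j=1}^{L(w)-1} \log \sigma\!\Big( \llbracket n(w,j+1) = \operatorname{ch}(n(w,j)) \rrbracket \cdot {v'_{n(w,j)}}^{\top} h \Big)
```

where $\llbracket x \rrbracket$ is $+1$ if $x$ is true and $-1$ otherwise, i.e. exactly the `sgn` factor in the code, with no extra negation inside the sigmoid.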
After getting rid of this `-1`, I got the expected loss behavior under the exact same conditions when re-running the previous experiments that had produced the problematic results.