Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 1779 #1843

Closed
wants to merge 3 commits into from
Closed

Fix 1779 #1843

wants to merge 3 commits into from

Conversation

saroufimc1
Copy link

Addressing issue #1779. Correct the assignment of model.wv.syn0_ngrams after trimming of unused ngrams (model.wv.syn0_ngrams.shape[0] <= self.bucket).

It is 2x slower on pre-trained Wikipedia models from Facebook's fastText than if we do not trim unused ngrams. Therefore, I recommend dealing with #1261.

For the test case, we need an actual small Facebook FastText model saved in bin format and make sure that we have model.wv.syn0_ngrams.shape[0] <= self.bucket.

Related to case #1787.

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @saroufimc1.

Need to merge #1777 (refactoring of *2vec API) and fix this code after.

@manneshiva please have a look!

@@ -1,8 +1,8 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Author: Jayant Jain <jayantjain1992@gmail.com>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you remove this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure about the header. Feel free to take the code and do whatever you want with it.

@menshikh-iv
Copy link
Contributor

@saroufimc1 #1777 merged, please resolve merge conflict first

@menshikh-iv
Copy link
Contributor

Ping @saroufimc1, when you plan to finish this PR?

@menshikh-iv
Copy link
Contributor

Unfortunately, you merged conflict incorrectly (look at current state of gensim/models/wrappers/fasttext.py from develop), your code edition should be in https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/deprecated/fasttext_wrapper.py

Are you planning to finish PR?

@saroufimc1
Copy link
Author

saroufimc1 commented Feb 16, 2018

@menshikh-iv Sorry about that, I am not familiar with all the changes of the source code that happened in the meantime. Is it possible that you take my code and do the merging yourself since you are more familiar with the changes? It would be much faster I guess.

By the way, I am still not convinced that the way to remove all subwords not used etc. This slows down the loading a lot.

@menshikh-iv
Copy link
Contributor

@saroufimc1 no problem, I close this with "almost complete" label, we will return to this later and fix it, thanks!

CC: @manneshiva

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants