-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Poincare model keyedvectors #1700
Conversation
…nce and vector_distance_batch
Nice catch for the integer division, fixed. |
Short summary and rationale for the changes to
The PR also adds missing tests for some of the older |
004b572
to
73ed696
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general - very nice work 💣 🔥
Only several questions (and I'll fix docstrings / PEP8 after your commits).
@@ -0,0 +1,187 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better write this code in ipython notebook (or maybe move to docs/notebooks
and import from "under the feet").
gensim.models
isn't a suitable place for this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree gensim.models
isn't a good place for it. I don't want to have it in an ipython notebook or docs/notebooks
though since that would mean a user can't import and use it, and I think it is definitely useful for a user. Do you think creating a new package poincare
in gensim/
would be a good idea? Other models do this too (e.g. topic_coherence
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this is a very good idea, after refactoring, topic_coherence
will be moved/renamed in "deep" of gensim.modules
(coherence contains only inner/secondary functions, not public API), but in your case, API is public.
I agree about imports (it's really untrivial, how to import from docs/notebooks
if you in /randomfolder
, only manually with importlib
I think.
We have many viz helpers (produced by @parulsethi on GSoC) + now Parul works on very nice viz for topic models. Potentially, we can create the distinct repository (like gensim-data
) and move all viz helpers, or, as you suggest, create submodule gensim.viz
and move all viz stuff (not only your Poincare viz).
Hard question, I don't know what's better right now.
WDYT @piskvorky @janpom @parulsethi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A submodule gensim.viz
sounds good to me, keeping in mind we might have future visualizations too. I don't have a good enough perspective on this though, so whatever you decide is okay with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @menshikh-iv , so is it okay if I create a gensim.viz
package for this and any future gensim visualizations, and move the poincare visualization there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gensim.viz
submodule would be useful for #1616 also in future, and the long code blocks of network graph/dendrogram could also be wrapped up in a function under this module so that those visualizations can be produced simply using the imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayantj sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for the feedback @parulsethi @menshikh-iv
Pushed changes with a new gensim.viz
package.
@@ -514,6 +516,8 @@ def train(self, epochs, batch_size=10, print_every=1000, check_gradients_every=N | |||
""" | |||
if self.workers > 1: | |||
raise NotImplementedError("Multi-threaded version not implemented yet") | |||
# Some divide-by-zero results are handled explicitly | |||
old_settings = np.seterr(divide='ignore', invalid='ignore') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? You mean that division by zero is expected and you process this situation in code, I'm correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it happens in PoincareBatch
. I'm setting it here to avoid repeated calls to np.seterr
gensim/models/poincare.py
Outdated
|
||
Parameters | ||
---------- | ||
node : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This always str
(or int possible too)? This question more global (about all methods) that pass node
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be an int
too in theory, depending on what the vocab keys are. The most common case is str
though. How would you prefer to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe str
or int
everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also made some other changes to docstrings for more clarity.
Some completely unrelated tests seem to be failing on travis ( |
@menshikh-iv we need this resolved & finished -- can you have a look? Cheers. |
This branch includes the changes to
KeyedVectors
andPoincareKeyedVectors
for thePoincareModel
.Includes commits from #1696 , to be reviewed and merged after #1696
TODOs:
KeyedVector
methods likemost_similar
toPoincareKeyedVectors
KeyedVectors
intoKeyedVectorsBase
andEuclideanKeyedVectors
for cleaner class hierarchyPoincareKeyedVectors