Creating unified base class for all topic models, #938 #946
It is definitely a step in the right direction.
Please add tests for the print_topic method and a changelog entry.
@@ -0,0 +1,10 @@
class BaseModel():
BaseTopicModel is more appropriate
Please rebase off develop and add HDPModel
I was not sure what tests to add; please take a close look at the test files.
New testclass requested.
@@ -10,10 +10,11 @@ Changes
* Change export_phrases in Phrases model. Fix issue #794 (@AadityaJ,
  [#879](https://github.com/RaRe-Technologies/gensim/pull/879))
    - bigram construction can now support multiple bigrams within one sentence
* Fixed issue #838, RuntimeWarning: overflow encountered in exp (@markroxor, [#895](https://github.com/RaRe-Technologies/gensim/pull/895))
why these changes?
Expect to see just 1 line added about the new fix.
I was fixing my previous hyperlinks, but alright. I'll add just one line, no issues.
Fixes to your prev links are ok. You can leave them in.
@@ -253,6 +255,12 @@ def testShowTopics(self):
self.assertTrue(isinstance(k, six.string_types))
self.assertTrue(isinstance(v, float))

#testing print_topic
create new test test_print_topic in new class TestBaseTopicModel.
TestLdaModel, testLSI and testHDP should inherit from TestBaseTopicModel.
Are the tests okay (the assertions of variable type)? Should I also create a unified test_basemodel.py to handle tests common to all the models, like testShowTopic and testShowTopics?
yes
BTW |
@@ -193,7 +195,7 @@ def get_Elogbeta(self):
# endclass LdaState


class LdaModel(interfaces.TransformationABC):
class LdaModel(interfaces.TransformationABC,basemodel.BaseTopicModel):
Slight code style issue. Should be interfaces.TransformationABC, basemodel.BaseTopicModel.
@@ -221,7 +223,7 @@ def merge(self, other, decay=1.0):
#endclass Projection


class LsiModel(interfaces.TransformationABC):
class LsiModel(interfaces.TransformationABC,basemodel.BaseTopicModel):
same here. You should probably consider setting up a flake8 checks plugin in your editor. Will help you eliminate those nitpicks.
Thanks for the suggestions. I will keep that in mind.
Created unified TestBaseModel and added test for printTopic.
Almost there! See inline comments
@@ -575,9 +570,6 @@ def __init__(self, dictionary=None, topic_data=None, topic_file=None, style=None

        self.style = style

    def print_topics(self, num_topics=10, num_words=10):
why is this method removed? it is not inherited from anywhere.
It should remain
It is inherited from the BaseTopicModel class just like lsimodel and ldamodel.
this method is in class HDPFormatter, not HDP
okay I will restore it.
It is still deleted.
Please revert all changes to HdpTopicFormatter
I've already altered the HdpTopicFormatter class. To unify the models, should I remove its print_topic method and have it inherit that method from the basemodel parent class instead?
@@ -47,7 +47,6 @@ def testfile():
    return os.path.join(tempfile.gettempdir(), 'gensim_models.tst')


class TestHdpModel(unittest.TestCase):
should it inherit from TestBaseTopicModel too? Feel free to add the missing methods
@@ -612,10 +604,9 @@ def show_topic_terms(self, topic_data, num_words):
    def format_topic(self, topic_id, topic_terms):
        if self.STYLE_GENSIM == self.style:
            fmt = ' + '.join(['%.3f*%s' % (weight, word) for (word, weight) in topic_terms])
            fmt = 'topic %i: %s' % (topic_id, fmt)
please revert
Please revert all changes to HdpTopicFormatter
        else:
            fmt = '\n'.join([' %20s %.8f' % (word, weight) for (word, weight) in topic_terms])
            fmt = 'topic %i:\n%s' % (topic_id, fmt)
please revert
This is needed to pass the testShowTopics test; all the other models return data in this format. Should I exclude this test too?
it is a different class
What about the testShowTopics test? Should I override this too for hdpmodel?
Please revert all changes to HdpTopicFormatter
but the Travis build will fail. I have added all the other changes but cannot commit until the tests pass on my local machine.
what tests fail on your local machine?
testPrintTopic and testPrintTopics, since the return type of show_topics in hdpmodel does not correspond to that of ldamodel and lsimodel. To create a unified basemodel, the return type should be the same so that the unified tests can pass.
Please remove other people's commits from this PR
    def testShowTopic(self):
        # TODO create show_topic in HdpModel and then test

        if self.__class__.__name__ == "TestHdpModel":
please override testShowTopic inside TestHdpModel and skip it there
okay.
please remove this if statement
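One way to do what is asked here, overriding the shared test in the subclass and skipping it there instead of checking the class name inside the shared test, might look like the sketch below. The class names `SharedTopicTests` and `ModelWithoutShowTopic` are hypothetical placeholders, and `unittest.skip` assumes Python 2.7 or later.

```python
import unittest


class SharedTopicTests(object):
    """Hypothetical shared mixin; stands in for TestBaseTopicModel."""

    def testShowTopic(self):
        self.assertTrue(isinstance(self.model.show_topic(0), list))


class ModelWithoutShowTopic(object):
    """Hypothetical model that has no show_topic method yet."""


class TestModelWithoutShowTopic(unittest.TestCase, SharedTopicTests):
    def setUp(self):
        self.model = ModelWithoutShowTopic()

    # Override the inherited test here so the shared class needs no
    # special-casing on self.__class__.__name__.
    @unittest.skip("show_topic is not implemented for this model yet")
    def testShowTopic(self):
        pass
```

The override keeps the exception local to the one test class that needs it, and the test runner reports the test as skipped rather than silently passing.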
@@ -47,24 +48,12 @@ def testfile():
    return os.path.join(tempfile.gettempdir(), 'gensim_models.tst')


class TestHdpModel(unittest.TestCase):
class TestHdpModel(unittest.TestCase, test_basemodel.TestBaseTopicModel):
    def setUp(self):
        self.corpus = mmcorpus.MmCorpus(datapath('testcorpus.mm'))
        self.class_ = hdpmodel.HdpModel
        self.model = self.class_(corpus, id2word=dictionary)
please add the override here
added.
32b7f58 to 096faa5
Passed finally!!
@@ -0,0 +1,16 @@
class BaseTopicModel():
All Python classes must inherit from object (new-style classes).
done.
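For context on the point above: in Python 2, a class declared without a base is an old-style class, while in Python 3 every class derives from object, so the explicit base matters mainly for Python 2 compatibility. A minimal sketch of a shared base written this way is shown below. It is not the code from this PR; the method body and the `ToyModel` subclass are assumptions made for illustration.

```python
class BaseTopicModel(object):
    """Sketch of a shared base; subclasses are assumed to provide show_topic."""

    def print_topic(self, topicno, topn=10):
        # Join (word, weight) pairs into one human-readable string.
        pairs = self.show_topic(topicno, topn)
        return " + ".join('%.3f*"%s"' % (weight, word) for word, weight in pairs)


class ToyModel(BaseTopicModel):
    """Hypothetical model used only to exercise the inherited method."""

    def show_topic(self, topicno, topn=10):
        vocabulary = ["graph", "trees", "minors", "survey"]
        return [(word, 0.5 / (rank + 1)) for rank, word in enumerate(vocabulary[:topn])]


model = ToyModel()
print(model.print_topic(0, topn=2))  # prints: 0.500*"graph" + 0.250*"trees"
```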
fmt = (topic_id,fmt)
PEP8: space after comma.
added.
import six

class TestBaseTopicModel():
All tests should inherit from unittest.TestCase (or at least object, if you don't want this to be a real test).
I don't want it to be a real test, so I'm adding object instead.
Hi @markroxor. There is a problem with the new tests on Windows. It is blocking the release. See https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/216bwdmb2gqwkme0
Sent PR #969
Attempt to fix #938. Waiting for moderator review to proceed with further commits.