Creating unified base class for all topic models, #938 #946

Merged 12 commits on Oct 18, 2016

Contributor

@markroxor markroxor commented Oct 13, 2016

Attempt to fix #938. Waiting for moderator review to proceed with further commits.

@markroxor markroxor changed the title from "testing base model class" to "testing base model class,fix #938" Oct 13, 2016
Contributor

@tmylk tmylk left a comment

It is definitely a step in the right direction.
Please add tests for the print_topic method and a changelog entry.

@@ -0,0 +1,10 @@
class BaseModel():
Contributor

BaseTopicModel is a more appropriate name.

Contributor

tmylk commented Oct 13, 2016

Please rebase off develop and add HDPModel

@markroxor
Contributor Author

print_topic is not present in HDPModel. Should I create one like that in LDAModel and LSIModel?

@markroxor markroxor changed the title from "testing base model class,fix #938" to "Creating unified base class for all topic models, #938" Oct 13, 2016
@markroxor
Contributor Author

I was not sure which tests to add; please take a close look at the test files.

Contributor

@tmylk tmylk left a comment

New test class requested.

@@ -10,10 +10,11 @@ Changes
* Change export_phrases in Phrases model. Fix issue #794 (@AadityaJ,
[#879](https://github.com/RaRe-Technologies/gensim/pull/879))
- bigram construction can now support multiple bigrams within one sentence
* Fixed issue #838, RuntimeWarning: overflow encountered in exp (@markroxor, [#895](https://github.com/RaRe-Technologies/gensim/pull/895))
Contributor

Why these changes?
I expected to see just one line added about the new fix.

Contributor Author

I was fixing my previous hyperlinks, but alright. I'll add just one line, no issues.

Contributor

Fixes to your previous links are OK. You can leave them in.

@@ -253,6 +255,12 @@ def testShowTopics(self):
self.assertTrue(isinstance(k, six.string_types))
self.assertTrue(isinstance(v, float))

#testing print_topic
Contributor

Create a new test, test_print_topic, in a new class, TestBaseTopicModel.
TestLdaModel, testLSI and testHDP should inherit from TestBaseTopicModel.

Contributor Author

@markroxor markroxor Oct 14, 2016

Are the tests okay (the assertions on variable types)? Should I also create a unified test_basemodel.py to handle the tests common to all the models, like testShowTopic and testShowTopics?

Contributor

yes

Contributor

tmylk commented Oct 14, 2016

BTW, HdpTopicFormatter and hdpmodel both have the print_topics method.

@@ -193,7 +195,7 @@ def get_Elogbeta(self):
# endclass LdaState


-class LdaModel(interfaces.TransformationABC):
+class LdaModel(interfaces.TransformationABC,basemodel.BaseTopicModel):
Contributor

Slight code style issue. Should be interfaces.TransformationABC, basemodel.BaseTopicModel (with a space after the comma).
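
For illustration, a rough sketch of how a shared base class can provide print_topic to several models, with the base list spaced as PEP 8 expects. It assumes show_topic returns (word, weight) pairs. TransformationABC here is an empty stand-in rather than the real gensim interface, and ToyModel with its data is hypothetical.

```python
class BaseTopicModel(object):
    """Sketch of a shared base: formats whatever show_topic returns."""

    def print_topic(self, topicno, topn=10):
        # Assumption: show_topic yields (word, weight) pairs.
        return ' + '.join('%.3f*"%s"' % (weight, word)
                          for word, weight in self.show_topic(topicno, topn))


class TransformationABC(object):
    """Empty stand-in for interfaces.TransformationABC."""


# Note the space after the comma between the two base classes.
class ToyModel(TransformationABC, BaseTopicModel):
    def show_topic(self, topicno, topn=10):
        # Made-up topic data, for illustration only.
        topics = {0: [("graph", 0.5), ("trees", 0.25), ("minors", 0.125)]}
        return topics[topicno][:topn]


formatted = ToyModel().print_topic(0, topn=2)
print(formatted)  # 0.500*"graph" + 0.250*"trees"
```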

@@ -221,7 +223,7 @@ def merge(self, other, decay=1.0):
#endclass Projection


-class LsiModel(interfaces.TransformationABC):
+class LsiModel(interfaces.TransformationABC,basemodel.BaseTopicModel):
Contributor

Same here. Consider setting up a flake8 plugin in your editor; it will help you eliminate these nitpicks.

Contributor Author

Thanks for the suggestions. I will keep that in mind.

created unified TestBaseModel and aded test for printTopic
Contributor

@tmylk tmylk left a comment

Almost there! See inline comments

@@ -575,9 +570,6 @@ def __init__(self, dictionary=None, topic_data=None, topic_file=None, style=None

self.style = style

def print_topics(self, num_topics=10, num_words=10):
Contributor

Why is this method removed? It is not inherited from anywhere.
It should remain.

Contributor Author

It is inherited from the BaseTopicModel class just like lsimodel and ldamodel.

Contributor

This method is in the formatter class (HdpTopicFormatter), not in the HDP model class.

Contributor Author

Okay, I will restore it.

Contributor

It is still deleted.

Contributor

Please revert all changes to HdpTopicFormatter

Contributor Author

I've already altered the HdpTopicFormatter class. To unify the models, should I remove its print_topic method and have it inherit that method from the base class in basemodel instead?

@@ -47,7 +47,6 @@ def testfile():
return os.path.join(tempfile.gettempdir(), 'gensim_models.tst')



class TestHdpModel(unittest.TestCase):
Contributor

Should it inherit from TestBaseTopicModel too? Feel free to add the missing methods.

@@ -612,10 +604,9 @@ def show_topic_terms(self, topic_data, num_words):
def format_topic(self, topic_id, topic_terms):
if self.STYLE_GENSIM == self.style:
fmt = ' + '.join(['%.3f*%s' % (weight, word) for (word, weight) in topic_terms])
fmt = 'topic %i: %s' % (topic_id, fmt)
Contributor

please revert

Contributor

Please revert all changes to HdpTopicFormatter

else:
fmt = '\n'.join([' %20s %.8f' % (word, weight) for (word, weight) in topic_terms])
fmt = 'topic %i:\n%s' % (topic_id, fmt)
Contributor

please revert

Contributor Author

This change is needed to pass the testShowTopics test; all the other models return data in this format. Should I exclude this test too?

Contributor

it is a different class

Contributor Author

@markroxor markroxor Oct 17, 2016

What about the testShowTopics test? Should I override it for hdpmodel too?

Contributor

Please revert all changes to HdpTopicFormatter

Contributor Author

But the Travis build will fail. I have made all the other changes but cannot commit until the tests pass on my local machine.

Contributor

Which tests fail on your local machine?

Contributor Author

testPrintTopic and testPrintTopics fail, since the return type of show_topics in hdpmodel does not correspond to that of ldamodel and lsimodel. To create a unified basemodel, the return type should be the same so that the unified tests can pass.

Contributor

tmylk commented Oct 17, 2016

Please remove other people's commits from this PR

def testShowTopic(self):
# TODO create show_topic in HdpModel and then test

if self.__class__.__name__ == "TestHdpModel":
Contributor

Please override testShowTopic inside TestHdpModel and skip it there.

Contributor Author

okay.

Contributor

please remove this if statement

@@ -47,24 +48,12 @@ def testfile():
return os.path.join(tempfile.gettempdir(), 'gensim_models.tst')



-class TestHdpModel(unittest.TestCase):
+class TestHdpModel(unittest.TestCase, test_basemodel.TestBaseTopicModel):
def setUp(self):
self.corpus = mmcorpus.MmCorpus(datapath('testcorpus.mm'))
self.class_ = hdpmodel.HdpModel
self.model = self.class_(corpus, id2word=dictionary)

Contributor

please add the override here

Contributor Author

added.

@markroxor markroxor force-pushed the singleapi branch 2 times, most recently from 32b7f58 to 096faa5 Compare October 18, 2016 05:59
@markroxor
Contributor Author

Passed finally!!

@tmylk tmylk merged commit bf7c0ed into piskvorky:develop Oct 18, 2016
@@ -0,0 +1,16 @@
class BaseTopicModel():
Owner

All Python classes must inherit from object (new-style classes).

Contributor Author

done.


fmt = (topic_id,fmt)
Owner

PEP8: space after comma.

Contributor Author

added.


import six

class TestBaseTopicModel():
Owner

All tests should inherit from unittest.TestCase (or at least object, if you don't want this to be a real test).

Contributor Author

I don't want it to be a real test, so I'm adding object instead.

Contributor

tmylk commented Oct 20, 2016

Hi @markroxor, there is a problem with the new tests on Windows. It is blocking the release. See https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/216bwdmb2gqwkme0

Contributor Author

markroxor commented Oct 20, 2016

Sent PR #969

Successfully merging this pull request may close these issues.

Create single API/base class for all topic models
4 participants