Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimised show_topics #1028

Merged
merged 4 commits into from
Nov 27, 2016
Merged

Optimised show_topics #1028

merged 4 commits into from
Nov 27, 2016

Conversation

bhargavvader
Copy link
Contributor

Possible solution to #1006 .
@tmylk could you review?

Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove duplicate code please

@@ -800,15 +800,24 @@ def show_topics(self, num_topics=10, num_words=10, log=False, formatted=True):
chosen_topics = sorted_topics[:num_topics // 2] + sorted_topics[-num_topics // 2:]

shown = []

topic = self.state.get_lambda()
for i in chosen_topics:
if formatted:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much code duplication
Just add an extra if formatted block in the end that changes formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, that was quite stupid of me 💩
Will fix!

@bhargavvader
Copy link
Contributor Author

@tmylk , removed duplicate code.

@tmylk
Copy link
Contributor

tmylk commented Nov 24, 2016

Is there a performance improvement with %%timeit?

@bhargavvader
Copy link
Contributor Author

Yeah, with the old method and 5 topics for the news_corpus dataset it was 100 loops, best of 3: 6.89 ms per loop, with the optimised one it was 100 loops, best of 3: 4.28 ms per loop

@tmylk
Copy link
Contributor

tmylk commented Nov 24, 2016

hmm, that doesn't agree with the original report of

On my setup (500 topics, 100k words, learned with 500k documents), calling show_topics takes 30sec, with 28sec inside the 500 get_lambda() calls

@bhargavvader
Copy link
Contributor Author

Could it maybe be because of the number of topics? Will increase number of topics and see what the difference is like. I only have 5 topics.

@tmylk
Copy link
Contributor

tmylk commented Nov 24, 2016

Yes, let's test the reported case

@bhargavvader
Copy link
Contributor Author

bhargavvader commented Nov 25, 2016

I got a fairly significant improvement (20 times faster) with 500 topics (vocab is smaller though, len is 44143).

For the old method:

1 loop, best of 3: 2.05 s per loop

For the optimised method:

10 loops, best of 3: 107 ms per loop

@tmylk tmylk merged commit 0b2f6b8 into piskvorky:develop Nov 27, 2016
@tmylk
Copy link
Contributor

tmylk commented Nov 27, 2016

Thanks a lot for the improvement!

@bhargavvader
Copy link
Contributor Author

😄

@bhargavvader bhargavvader deleted the LDA_fix branch February 23, 2017 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants