Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added malletmodel2ldamodel transformation function to mallet wrapper #766

Merged
merged 4 commits into from
Jul 6, 2016

Conversation

devashishd12
Copy link
Contributor

This is concerning this thread in the mailing list. @piskvorky @tmylk could you please check?

-------
model_gensim : LdaModel instance; copied gensim LdaModel
"""
model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, iterations=100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is there no way to pick up the number of iterations from the mallet model? I notice you've kept it as the default 100 right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes even I thought of doing that but I instead just followed what @piskvorky wrote in the mailing list. Both are giving similar results though. I guess I'll switch this to mallet_model.iterations for better resemblance to the mallet model.

@devashishd12 devashishd12 changed the title Added malletmode2ldamodel transformation function to mallet wrapper Added malletmodel2ldamodel transformation function to mallet wrapper Jun 30, 2016
tm2 = ldamallet.malletmodel2ldamodel(tm1)
for document in corpus:
self.assertAlmostEqual(tm1[document][0][1], tm2[document][0][1], places=1)
self.assertAlmostEqual(tm1[document][1][1], tm2[document][1][1])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piskvorky I'm not too sure about this test. Values sometimes are differing at the first place too.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try more accurate gamma_threshold and iterations. But I don't think the results can really be made identical -- the inference algorithms are completely different.

I think the best we can hope for is a sanity check -- topics in same order, on some (model, query_document) pair where it's clear what the topic order should be, and where any deviation in topic order is clearly an error.

@devashishd12
Copy link
Contributor Author

@piskvorky I've addressed the comments. Could you please check?

model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
alpha=mallet_model.alpha, iterations=mallet_model.iterations)
model_gensim = LdaModel(
id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8: Hanging indent of 4 spaces.

@devashishd12
Copy link
Contributor Author

@piskvorky I've made the changes. Could you please check?

@tmylk tmylk merged commit fb8e4e5 into piskvorky:develop Jul 6, 2016
@devashishd12 devashishd12 deleted the mallet_to_model branch July 7, 2016 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants