
Add coherence and Diff logging for LDA #1381

Closed · wants to merge 17 commits

Conversation

@parulsethi (Contributor) commented Jun 1, 2017

Added a method to log coherence during training.

@tmylk (Contributor) commented Jun 2, 2017

Please update the coherence tutorial notebook with this new feature.
Also add a separate notebook that graphs both coherence and perplexity through training; see the early versions of #1243 for examples.
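(Such a notebook is not part of this PR, but a minimal sketch of the kind of plot being asked for could look like the following. The toy corpus and the pass-by-pass loop are illustrative assumptions; CoherenceModel and log_perplexity are gensim's existing APIs.)

import matplotlib.pyplot as plt
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Tiny illustrative corpus; any bag-of-words corpus works here.
texts = [["human", "interface", "computer"],
         ["survey", "user", "computer", "system"],
         ["eps", "user", "interface", "system"],
         ["system", "human", "system", "eps"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
coherence, perplexity = [], []
for _ in range(10):
    lda.update(corpus)  # one additional pass over the corpus
    cm = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence='u_mass')
    coherence.append(cm.get_coherence())
    perplexity.append(lda.log_perplexity(corpus))  # per-word likelihood bound

fig, ax1 = plt.subplots()
ax1.plot(coherence, color='blue')
ax1.set_ylabel('u_mass coherence')
ax2 = ax1.twinx()  # second y-axis: the two metrics live on different scales
ax2.plot(perplexity, color='red')
ax2.set_ylabel('per-word likelihood bound')
ax1.set_xlabel('training pass')
plt.show()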

@menshikh-iv (Contributor) commented

I'm worried that the LDA constructor already takes a lot of arguments, and this change makes the list even longer. We need to think about this problem.

@parulsethi changed the title from "[WIP] Add coherence logging for LDA" to "Add coherence logging for LDA" on Jun 6, 2017
"""
cm = gensim.models.CoherenceModel(model=model, corpus=chunk, dictionary=self.id2word, coherence='u_mass')
corpus_words = sum(cnt for document in chunk for _, cnt in document)
logger.info("%.3f coherence estimate based on a held-out corpus of %i documents with %i words", cm.get_coherence(), len(chunk), corpus_words)
Contributor: Calculate the coherence only once and save it to a variable.

Contributor Author: Done
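(The fix presumably just hoists the call into a local, along these lines; a sketch, not the PR's exact code:)

cm = gensim.models.CoherenceModel(model=model, corpus=chunk, dictionary=self.id2word, coherence='u_mass')
coherence = cm.get_coherence()  # computed once, reused for logging and the return value
corpus_words = sum(cnt for document in chunk for _, cnt in document)
logger.info("%.3f coherence estimate based on a held-out corpus of %i documents with %i words",
            coherence, len(chunk), corpus_words)
return coherence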

@@ -527,6 +528,15 @@ def log_perplexity(self, chunk, total_docs=None):
(perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words))
return perwordbound

def log_coherence(self, model, chunk):
Contributor: No need to pass model to this method; you already have the model as self.

Contributor Author: Done
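(After this change the method would presumably drop the parameter and use self instead, roughly like this sketch:)

def log_coherence(self, chunk):
    """Log the u_mass coherence of this model's topics, estimated on the held-out chunk."""
    cm = gensim.models.CoherenceModel(model=self, corpus=chunk,
                                      dictionary=self.id2word, coherence='u_mass')
    coherence = cm.get_coherence()
    logger.info("%.3f coherence estimate on %i held-out documents", coherence, len(chunk))
    return coherence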

@parulsethi changed the title from "Add coherence logging for LDA" to "Add coherence and Diff logging for LDA" on Jun 16, 2017
@parulsethi (Contributor, Author) commented Jun 16, 2017

Updated to add diff logging (which was previously in #1399).
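(For context, diff logging builds on LdaModel.diff(), which compares the topics of two models. Logging it during training amounts to snapshotting the model before a pass and comparing afterwards, along these lines; this is a sketch of the idea, not this PR's exact hook:)

import copy

previous = copy.deepcopy(lda)  # snapshot the model before the next pass
lda.update(corpus)
# mdiff is a num_topics x num_topics distance matrix between the two models' topics
mdiff, annotation = lda.diff(previous, distance='jaccard', num_words=50)
logger.info("topic diff (jaccard) after this pass:\n%s", mdiff)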

@@ -195,7 +197,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
alpha='symmetric', eta=None, decay=0.5, offset=1.0,
eval_every=10, iterations=50, gamma_threshold=0.001,
minimum_probability=0.01, random_state=None, ns_conf={},
minimum_phi_value=0.01, per_word_topics=False):
minimum_phi_value=0.01, per_word_topics=False, coherence='u_mass',
coherence_texts=None, coherence_window_size=None, coherence_topn=10,
Contributor: Use metrics from another PR.

def update(self, corpus, chunksize=None, decay=None, offset=None,
passes=None, update_every=None, eval_every=None, iterations=None,
gamma_threshold=None, chunks_as_numpy=False):
gamma_threshold=None, chunks_as_numpy=False, coherence=None,
Contributor: Use metrics from another PR.
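(The metric API that eventually landed via #1399 replaces these per-metric keyword arguments with metric objects passed through a single callbacks parameter. Usage ended up looking roughly like this, sketched from the gensim.models.callbacks module as it later shipped:)

from gensim.models import LdaModel
from gensim.models.callbacks import CoherenceMetric, PerplexityMetric

# Each metric object carries its own configuration, keeping the
# LdaModel signature itself small.
coherence_metric = CoherenceMetric(corpus=corpus, dictionary=dictionary,
                                   coherence='u_mass', logger='shell')
perplexity_metric = PerplexityMetric(corpus=corpus, logger='shell')
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=10,
               callbacks=[coherence_metric, perplexity_metric])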

@menshikh-iv (Contributor) commented

Continued in #1399.

@parulsethi deleted the ldalog branch on July 13, 2017