Add coherence and Diff logging for LDA #1381
Conversation
Please update the coherence tutorial notebook with this new feature.
I'm worried that there are already a lot of arguments in the LDA constructor, and this makes the list even longer. We need to think about this problem.
gensim/models/ldamodel.py
Outdated
""" | ||
cm = gensim.models.CoherenceModel(model=model, corpus=chunk, dictionary=self.id2word, coherence='u_mass') | ||
corpus_words = sum(cnt for document in chunk for _, cnt in document) | ||
logger.info("%.3f coherence estimate based on a held-out corpus of %i documents with %i words", cm.get_coherence(), len(chunk), corpus_words) |
Calculate the coherence only once and save it to a variable.
Done
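The suggestion amounts to hoisting the `get_coherence()` call out of the logging statement so it runs exactly once. A minimal sketch of that pattern — the helper name and the stubbed coherence callable are hypothetical, not gensim API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_coherence_once(get_coherence, chunk):
    # Hypothetical helper illustrating the review fix: call get_coherence()
    # once, store the result, and reuse the stored value when logging.
    corpus_words = sum(cnt for document in chunk for _, cnt in document)
    coherence = get_coherence()  # computed exactly once
    logger.info(
        "%.3f coherence estimate based on a held-out corpus of %i documents with %i words",
        coherence, len(chunk), corpus_words,
    )
    return coherence

# Usage with a stubbed coherence value and two bag-of-words documents
# of (token_id, count) pairs.
chunk = [[(0, 2), (1, 1)], [(1, 3)]]
value = log_coherence_once(lambda: -1.5, chunk)
```

In the real method, the stub would be replaced by a `CoherenceModel` built from the model and held-out chunk, as in the diff above.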
gensim/models/ldamodel.py
Outdated
```diff
@@ -527,6 +528,15 @@ def log_perplexity(self, chunk, total_docs=None):
                     (perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words))
         return perwordbound

+    def log_coherence(self, model, chunk):
```
No need to pass `model` to this method; you already have the model as `self`.
Done
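With both review comments applied, the method takes no `model` argument (it operates on `self`) and logs a single precomputed coherence value. A stand-in sketch of that shape — the class and the `coherence_fn` hook are hypothetical; the real method would build a `CoherenceModel` from `self` internally:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class LdaModelSketch:
    # Hypothetical stand-in for gensim's LdaModel; only the method
    # shape after the review is shown.
    def log_coherence(self, chunk, coherence_fn):
        # The model is `self`, so no separate `model` parameter is needed.
        corpus_words = sum(cnt for document in chunk for _, cnt in document)
        coherence = coherence_fn(self)  # computed once and stored
        logger.info(
            "%.3f coherence estimate based on a held-out corpus of %i documents with %i words",
            coherence, len(chunk), corpus_words,
        )
        return coherence

chunk = [[(0, 2), (1, 1)], [(1, 3)]]
result = LdaModelSketch().log_coherence(chunk, coherence_fn=lambda model: -2.0)
```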
Updated to add diff logging (which was previously in #1399).
```diff
@@ -195,7 +197,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
                  alpha='symmetric', eta=None, decay=0.5, offset=1.0,
                  eval_every=10, iterations=50, gamma_threshold=0.001,
                  minimum_probability=0.01, random_state=None, ns_conf={},
-                 minimum_phi_value=0.01, per_word_topics=False):
+                 minimum_phi_value=0.01, per_word_topics=False, coherence='u_mass',
+                 coherence_texts=None, coherence_window_size=None, coherence_topn=10,
```
Use the metrics from the other PR.
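The "use metrics" suggestion points toward passing metric objects into the model instead of growing the constructor with `coherence_*` arguments. A hypothetical sketch of that design — all class names here are illustrative, not gensim's actual API:

```python
class Metric:
    # Hypothetical base class: each metric knows how to evaluate itself
    # against the current model state and a held-out chunk.
    def get_value(self, model, chunk):
        raise NotImplementedError

class CoherenceMetricSketch(Metric):
    # Illustrative only: a real implementation would wrap CoherenceModel
    # and carry the coherence type, texts, window size, and topn itself.
    def __init__(self, coherence='u_mass', topn=10):
        self.coherence = coherence
        self.topn = topn

    def get_value(self, model, chunk):
        return -1.0  # stub standing in for a coherence computation

class TrainerSketch:
    # Instead of coherence, coherence_texts, coherence_window_size, ...
    # as constructor arguments, accept a list of metric objects and
    # evaluate each one after every update.
    def __init__(self, metrics=()):
        self.metrics = list(metrics)

    def update(self, chunk):
        return {type(m).__name__: m.get_value(self, chunk) for m in self.metrics}

trainer = TrainerSketch(metrics=[CoherenceMetricSketch()])
report = trainer.update([[(0, 1)]])
```

This keeps the constructor signature stable: adding a new diagnostic means adding a new metric class, not new keyword arguments.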
```diff
     def update(self, corpus, chunksize=None, decay=None, offset=None,
                passes=None, update_every=None, eval_every=None, iterations=None,
-               gamma_threshold=None, chunks_as_numpy=False):
+               gamma_threshold=None, chunks_as_numpy=False, coherence=None,
```
Use the metrics from the other PR.
Continued in #1399
Added a method to log coherence during training.