Add coherence and Diff logging for LDA #1381
Conversation
Please update the coherence tutorial notebook with this new feature.
I'm worried that there are already a lot of arguments in the LDA constructor, and this makes the list even longer. We need to think about this problem.
gensim/models/ldamodel.py
Outdated
""" | ||
cm = gensim.models.CoherenceModel(model=model, corpus=chunk, dictionary=self.id2word, coherence='u_mass') | ||
corpus_words = sum(cnt for document in chunk for _, cnt in document) | ||
logger.info("%.3f coherence estimate based on a held-out corpus of %i documents with %i words", cm.get_coherence(), len(chunk), corpus_words) |
Calculate the coherence only once and save it to a variable.
Done
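The suggestion amounts to hoisting the `get_coherence()` call out of the logging statement so it runs exactly once. A minimal sketch of that pattern — the helper name and the stubbed coherence callable are hypothetical, not gensim API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_coherence_once(get_coherence, chunk):
    # Hypothetical helper illustrating the review fix: call get_coherence()
    # once, store the result, and reuse the stored value when logging.
    corpus_words = sum(cnt for document in chunk for _, cnt in document)
    coherence = get_coherence()  # computed exactly once
    logger.info(
        "%.3f coherence estimate based on a held-out corpus of %i documents with %i words",
        coherence, len(chunk), corpus_words,
    )
    return coherence

# Usage with a stubbed coherence value and two bag-of-words documents
# of (token_id, count) pairs.
chunk = [[(0, 2), (1, 1)], [(1, 3)]]
value = log_coherence_once(lambda: -1.5, chunk)
```

In the real method, the stub would be replaced by a `CoherenceModel` built from the model and held-out chunk, as in the diff above.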
gensim/models/ldamodel.py
Outdated
```diff
@@ -527,6 +528,15 @@ def log_perplexity(self, chunk, total_docs=None):
                     (perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words))
         return perwordbound

+    def log_coherence(self, model, chunk):
```
No need to pass `model` to this method; you already have the model as `self`.
Done
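With both review comments applied, the method takes no `model` argument (it operates on `self`) and logs a single precomputed coherence value. A stand-in sketch of that shape — the class and the `coherence_fn` hook are hypothetical; the real method would build a `CoherenceModel` from `self` internally:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class LdaModelSketch:
    # Hypothetical stand-in for gensim's LdaModel; only the method
    # shape after the review is shown.
    def log_coherence(self, chunk, coherence_fn):
        # The model is `self`, so no separate `model` parameter is needed.
        corpus_words = sum(cnt for document in chunk for _, cnt in document)
        coherence = coherence_fn(self)  # computed once and stored
        logger.info(
            "%.3f coherence estimate based on a held-out corpus of %i documents with %i words",
            coherence, len(chunk), corpus_words,
        )
        return coherence

chunk = [[(0, 2), (1, 1)], [(1, 3)]]
result = LdaModelSketch().log_coherence(chunk, coherence_fn=lambda model: -2.0)
```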
Updated to add diff logging (which was previously in #1399).
```diff
@@ -195,7 +197,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
                  alpha='symmetric', eta=None, decay=0.5, offset=1.0,
                  eval_every=10, iterations=50, gamma_threshold=0.001,
                  minimum_probability=0.01, random_state=None, ns_conf={},
-                 minimum_phi_value=0.01, per_word_topics=False):
+                 minimum_phi_value=0.01, per_word_topics=False, coherence='u_mass',
+                 coherence_texts=None, coherence_window_size=None, coherence_topn=10,
```
Use the metrics from the other PR.
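The "use metrics" suggestion points toward passing metric objects into the model instead of growing the constructor with `coherence_*` arguments. A hypothetical sketch of that design — all class names here are illustrative, not gensim's actual API:

```python
class Metric:
    # Hypothetical base class: each metric knows how to evaluate itself
    # against the current model state and a held-out chunk.
    def get_value(self, model, chunk):
        raise NotImplementedError

class CoherenceMetricSketch(Metric):
    # Illustrative only: a real implementation would wrap CoherenceModel
    # and carry the coherence type, texts, window size, and topn itself.
    def __init__(self, coherence='u_mass', topn=10):
        self.coherence = coherence
        self.topn = topn

    def get_value(self, model, chunk):
        return -1.0  # stub standing in for a coherence computation

class TrainerSketch:
    # Instead of coherence, coherence_texts, coherence_window_size, ...
    # as constructor arguments, accept a list of metric objects and
    # evaluate each one after every update.
    def __init__(self, metrics=()):
        self.metrics = list(metrics)

    def update(self, chunk):
        return {type(m).__name__: m.get_value(self, chunk) for m in self.metrics}

trainer = TrainerSketch(metrics=[CoherenceMetricSketch()])
report = trainer.update([[(0, 1)]])
```

This keeps the constructor signature stable: adding a new diagnostic means adding a new metric class, not new keyword arguments.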
```diff
     def update(self, corpus, chunksize=None, decay=None, offset=None,
                passes=None, update_every=None, eval_every=None, iterations=None,
-               gamma_threshold=None, chunks_as_numpy=False):
+               gamma_threshold=None, chunks_as_numpy=False, coherence=None,
```
Use the metrics from the other PR.
Continued in #1399
Added a method to log coherence during training.