Replace custom epsilons with numpy equivalent in LdaModel
#2308
Conversation
@horpto please expand the PR description. What problem is this solving, what's the motivation for this PR?
@@ -668,6 +656,7 @@ def inference(self, chunk, collect_sstats=False):
# Lee&Seung trick which speeds things up by an order of magnitude, compared
# to Blei's original LDA-C code, cool!).
integer_types = six.integer_types + (np.integer,)
epsilon = np.finfo(self.dtype).eps
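For context, a minimal sketch (not the gensim source) of what this change does: `np.finfo(dtype).eps` yields the machine epsilon for whichever dtype the model was built with, so the floor used to keep denominators away from zero scales with the precision actually in use instead of being a hard-coded constant. The `safe_normalize` helper below is a hypothetical illustration, not code from this PR.

```python
import numpy as np

# Machine epsilon depends on the dtype: float32's eps is much larger
# than float64's, so a single hard-coded constant fits neither well.
for dtype in (np.float32, np.float64):
    print(dtype.__name__, np.finfo(dtype).eps)


def safe_normalize(phi, dtype=np.float64):
    """Normalize columns of phi, clamping zero sums to machine epsilon
    so the division never produces NaN or inf (hypothetical helper)."""
    eps = np.finfo(dtype).eps
    norm = phi.sum(axis=0)
    return phi / np.maximum(norm, eps)
```

An all-zero column then normalizes to zeros rather than NaNs, which is exactly the failure mode discussed below.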
I'm not sure this is a good idea. What are the guarantees for such epsilon?
If the epsilon is too close to the underflow edge, it might be silently ignored in some cases. I'd prefer an epsilon that is less ambiguous. I don't think we really care about getting the smallest possible number here.
In fact, do we need epsilon at all? It hints at some instability in the algorithm if it needs to be avoiding singularities in this way. Identifying when such singularities happen as soon as possible (is it a function of the input corpus? empty documents? something else?), and raising an exception, might be a preferable solution.
The new epsilon is already better than what we have right now (it's bigger; we could even use 3 * eps). I agree that this is not the best solution (the root cause is instability in the algorithm), but it's a good workaround to avoid NaN values in models (at least, they will happen less often).
LGTM (it improves overall model stability, though it's not a perfect solution, of course). WDYT @piskvorky?
Well, if it's an improvement we should merge it. But I'm still wary of the implications of this. Isn't it better to just raise an exception, rather than work around x / 0.0 by doing x / eps? Isn't the user screwed anyway (no exception, but nonsense results)?
Unfortunately I no longer remember why this code needs to be there :(
Isn't it better to just raise an exception, rather than work around x / 0.0 by doing x / eps?
No, because the exception could be raised at any moment: for example, you train a model for 10 hours and it raises right before the end, so the time is already spent and there is no model.
Isn't the user screwed anyway (no exception, but nonsense results)
Usually not: as long as there are no NaNs in the matrices, the model behaves adequately.
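The claim above suggests a cheap sanity check one could run periodically during a long training job, so a bad state is caught early instead of after 10 hours. This is a hedged sketch; the helper name and the idea of polling model matrices are assumptions, not part of the PR.

```python
import numpy as np


def has_nans(*arrays):
    """Return True if any of the given arrays contains a NaN value.
    Intended as a periodic sanity check on model state matrices
    (hypothetical helper, not gensim API)."""
    return any(np.isnan(a).any() for a in arrays)
```

Calling this every N updates on the model's state arrays would let training abort (or checkpoint) as soon as a NaN appears, rather than silently producing nonsense results.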
Thanks @horpto 👍
Fix #2115