Add cython version for "hot" functions from gensim.models.LdaModel
#1767
Conversation
fyi, I reworked this change to mirror the implementation of word2vec_inner and doc2vec_inner. The cython functions now live in ldamodel_inner.pyx and LdaModel tries to use the fast cython versions if possible, but falls back to the "slower" alternatives if it can't. I also explicitly added separate cython functions for float32 and float64, so the faster code should work with both of these dtypes now (I don't know how to handle float16 nicely in the cython code, so for now that falls back to the slower code as well).
I used cython's fused types to make the cython code a little nicer, but the code still has a lot of duplication in it. I'm not familiar with other templating methods in cython, but I'm sure there must be a way to make the functions generic without requiring so much duplication.
The _attach_inner_methods() function in LdaModel tries to pick the right methods to use based on the availability of ldamodel_inner and the dtype. There's probably a nicer way of doing this as well. I haven't checked all the edge cases yet, but my simple tests so far seem to work.
This implementation has the benefit of allowing the multicore and distributed versions to use the faster cython code as well. I also left in the older "fastldamodel.py" files just to make comparisons between the two implementations easier.
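(A rough Python-level sketch of the selection logic described above; _attach_inner_methods and the _f32/_f64 names come from this PR's description, but the body below is only an illustration, not the actual diff.)

import numpy as np
from gensim import matutils

try:
    # compiled entry points from ldamodel_inner.pyx (names as described above)
    from gensim.models.ldamodel_inner import (
        dirichlet_expectation_1d_f32,
        dirichlet_expectation_1d_f64,
    )
    INNER_AVAILABLE = True
except ImportError:
    INNER_AVAILABLE = False

def _attach_inner_methods(self, dtype):
    # choose the implementation once, at construction time, so the hot loops
    # never pay for a per-call dtype check
    if INNER_AVAILABLE and dtype == np.float32:
        self.dirichlet_expectation_1d = dirichlet_expectation_1d_f32
    elif INNER_AVAILABLE and dtype == np.float64:
        self.dirichlet_expectation_1d = dirichlet_expectation_1d_f64
    else:
        # float16, or the compiled extension is not available: numpy fallback
        self.dirichlet_expectation_1d = matutils.dirichlet_expectation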
Great work @arlenk!
Can you show benchmark results please (LdaModel/LdaMulticore with the numpy/scipy functions vs LdaModel/LdaMulticore with the cythonized functions)?
Also, please add tests for your cython functions (so we can trust that everything works correctly).
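For example, a minimal test along these lines could compare the cython output against the existing numpy implementation (module and function names follow this PR's description and may differ in the final code; pytest is used here only for brevity):

import numpy as np
import pytest
from gensim import matutils

ldamodel_inner = pytest.importorskip("gensim.models.ldamodel_inner")

@pytest.mark.parametrize("dtype, func_name", [
    (np.float32, "dirichlet_expectation_2d_f32"),
    (np.float64, "dirichlet_expectation_2d_f64"),
])
def test_dirichlet_expectation_matches_numpy(dtype, func_name):
    fast = getattr(ldamodel_inner, func_name)
    alpha = np.random.RandomState(0).gamma(2.0, 2.0, size=(5, 10)).astype(dtype)
    # reference: the existing (slow) numpy implementation, computed in float64
    expected = matutils.dirichlet_expectation(alpha.astype(np.float64))
    # assumes the cython version returns a new array rather than working in place
    np.testing.assert_allclose(fast(alpha), expected, rtol=1e-4)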
gensim/models/ldamodel.py
Outdated
@@ -42,7 +42,7 @@
from collections import defaultdict

from gensim import interfaces, utils, matutils
from gensim.matutils import dirichlet_expectation
from gensim.matutils import dirichlet_expectation as dirichlet_expectation
Why import X as X?
gensim/models/ldamodel.py
Outdated
try:
    # try to load fast, cythonized code if possible
    from gensim.models.ldamodel_inner import dirichlet_expectation_1d_f32, dirichlet_expectation_1d_f64
ping @piskvorky: strict typing improves performance, but not by much. wdyt about distinct functions for each type?
Up to you. I'd be mindful of the added complexity -- what sort of performance difference are we talking about?
@arlenk can you compare these variants?
It will take me a few days, but I will include some benchmark results.
The typed versions (_f32 and _f64) are not entirely for performance reasons, though. The C code needs to be different for floats vs. doubles. I used cython's fused types to remove some of the duplication, but I don't think there's a way to have python call a single function that dispatches (statically) to the correct float vs. double cython function. We could check the dtypes at run time, but that does impose a pretty large penalty given how often these functions are called (e.g., dirichlet_expectation is called millions of times even for a small corpus).
I can include a benchmark with the run-time dtype check to see how big the impact is, though.
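A toy micro-benchmark of just the per-call dtype check (not this PR's benchmark; direct() below is only a stand-in for a typed cython entry point) gives a feel for the overhead being discussed:

import timeit
import numpy as np

alpha = np.ones(100, dtype=np.float32)

def direct(x):
    # stand-in for calling the f32-typed cython function directly
    return x

def dispatched(x):
    # extra work a generic wrapper would do on every single call
    if x.dtype == np.float32:
        return direct(x)
    raise TypeError("unsupported dtype: %s" % x.dtype)

print("direct:    ", timeit.timeit(lambda: direct(alpha), number=1000000))
print("dispatched:", timeit.timeit(lambda: dispatched(alpha), number=1000000))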
gensim/models/ldamodel.py
Outdated
try:
    # try to load fast, cythonized code if possible
    from gensim.models.ldamodel_inner import dirichlet_expectation_1d_f32, dirichlet_expectation_1d_f64
    from gensim.models.ldamodel_inner import dirichlet_expectation_2d_f32, dirichlet_expectation_2d_f64
You can import all of this with one import statement using line breaks (no need to repeat from X import several times).
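For example (same names as in the diff above), something like:

try:
    # one import statement, split across lines
    from gensim.models.ldamodel_inner import (
        dirichlet_expectation_1d_f32, dirichlet_expectation_1d_f64,
        dirichlet_expectation_2d_f32, dirichlet_expectation_2d_f64,
    )
except ImportError:
    pass  # fall back to the slow numpy versions, as elsewhere in this PR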
gensim/models/ldamodel.py
Outdated
    """
    return np.mean(np.abs(a - b))

try:
It's better to move this section to the head of the file (after the import section).
Not sure how to restructure this yet. The except clause needs to happen after the "slow" versions of logsumexp and mean_absolute_difference are defined, so that's why the code is where it is right now. I initially included the definition of the slow versions in the except block as well (mirroring what word2vec does), but that breaks if anyone uses the float16 dtype, which requires the slow functions even if the cython versions exist (since the cython versions can only handle float32 and float64).
It might make things easier to just move the "inner" functions (logsumexp, dirichlet_expectation, and mean_absolute_difference) into their own python module and have that module import the cython versions if available.
Open to any other suggestions as well.
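A sketch of that module layout (the file name and the *_f32/*_f64 names below are assumptions for illustration, not the actual files in this PR):

# hypothetical gensim/models/lda_inner.py
import numpy as np

def logsumexp(x):
    # slow numpy fallback; also the only option for float16 input
    x_max = np.max(x)
    return np.log(np.sum(np.exp(x - x_max))) + x_max

def mean_absolute_difference(a, b):
    # slow numpy fallback
    return np.mean(np.abs(a - b))

try:
    # replace only the float32/float64 entry points when the compiled
    # extension is available; the fallbacks above stay importable for float16
    from gensim.models.ldamodel_inner import (
        logsumexp_f32, logsumexp_f64,
        mean_absolute_difference_f32, mean_absolute_difference_f64,
    )
except ImportError:
    logsumexp_f32 = logsumexp_f64 = logsumexp
    mean_absolute_difference_f32 = mean_absolute_difference_f64 = mean_absolute_difference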
Ok, let's leave it as is for now.
gensim/models/ldamodel.py
Outdated
mean_absolute_difference_f32 = mean_absolute_difference_f64 = mean_absolute_difference

def _attach_inner_methods(self, dtype):
Probably better to define the needed functions at module level (not "patch" LdaModel at runtime); this avoids problems with backward compatibility.
That was my original intention (and that's what the original "fastldamodel.py" code did), but I am not sure how to make this work nicely with LdaModel, since a user could create multiple models with different dtypes. E.g.,
model0 = LdaModel(corpus=corpus, id2word=corpus.id2word, num_topics=10, dtype=np.float16)
model1 = LdaModel(corpus=corpus, id2word=corpus.id2word, num_topics=10, dtype=np.float32)
model2 = LdaModel(corpus=corpus, id2word=corpus.id2word, num_topics=10, dtype=np.float64)
Each of those objects would need to use a different implementation of the inner functions (unless we dispatched at run time based on dtypes, but as mentioned before, I think that would have a decent impact on speed).
Agree, you depend on the dtype of the LDA model.
Aha, you are right, dtype is a really fresh feature (for LDA), I forgot about it.
Ok, probably that variant is OK (another variant: add this as a method to LDA and call it in __init__).
I believe we should decide what dtype makes the most sense, and stick with it (drop support for all other dtypes).
It's not a parameter we can expose and ask users to choose for themselves (especially since we give no guidance, no pros/cons/recommendations about the implications of this choice).
@piskvorky probably float32 is the best choice (in terms of speed, memory consumption and quality).
I agree, but would like to see some numeric arguments (benchmarks, model quality & performance evals) to back that up.
Unless the other dtypes offer some dramatically different tradeoffs, I see no reason to complicate the code and support them.
Removing the other dtypes would certainly make this change easier.
I was running some tests with the different dtypes, and float16 (one of the allowed options in the existing code) seems to over/underflow with some reasonable input data. For example, the following code (with imports added for completeness) will crash with an overflow error:
import numpy as np
import gensim
import gensim.downloader as api
from gensim.models import LdaModel

np.seterr(all='raise')
text = api.load('text8')
id2word = gensim.corpora.Dictionary(text)
corpus = [id2word.doc2bow(t) for t in text]
LdaModel(corpus=corpus, id2word=id2word, num_topics=10, dtype=np.float16)
gensim/models/_fastldamodel.pyx
Outdated
@@ -0,0 +1,256 @@
from __future__ import division
Is this file still needed now (after the last changes, where you moved all of the "slow" functions to cython)?
Also, the .pyd file (Windows-compiled) should not be here (only .pyx and .c).
no, not needed. Left it in just for comparison, but removed it now.
setup.py
Outdated
@@ -13,7 +13,7 @@
import os
import sys
import warnings

import numpy as np
That's really weird, please don't use numpy in setup.py (this is the reason all the CI is broken).
Ok, removed this. I didn't realize that you could include the numpy header files for the cython code without importing numpy directly.
Here are the results for a couple of different settings (only using LdaModel for now, not LdaMulticore): the cython code takes around 60% of the time of the current code. Here's the code used for the benchmarks:
@arlenk your results look great! Thanks for benchmarking (but please post it as a table in this PR, for history). Can you run the same benchmark for the different dtype cases? Also, don't forget about tests (these are still missing).
@arlenk big thanks for the concrete example 👍, I reproduced this problem (stack trace attached).
@arlenk generally, we need to investigate why the "underflow" happens, fix it, and keep only this data type. Thanks for your great work! Do you have any ideas about why the underflow problem happens? Probably this is a flaw in the algorithm's design, but I have no idea how to debug this.
I don't think it's an algorithm issue, but I will try digging into this some more. In this particular case, I don't think it's necessarily a problem. The underflowed value is later used in a dot product, so the term involving the underflowed value is going to be ~zero in any case. If all the values in the dot product were underflows, that would be an issue, but I don't think having a few underflows will break the algorithm. Having said that, it still may make sense to default to float64 to avoid underflows (I have not run into any yet with the float64 dtype). Float32 does save memory, of course, but that could still be left as an option for anyone who is really running low on memory.
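A toy illustration of that point (plain numpy, not the actual LDA code path):

import numpy as np

small = np.exp(np.float64(-20.0))   # ~2.06e-09, fine in float64
print(np.float16(small))            # 0.0 -- below float16's smallest subnormal (~6e-08)

# the corresponding term contributes almost nothing to a dot product either way
v64 = np.array([0.5, 0.25, small], dtype=np.float64)
v16 = v64.astype(np.float16)
w = np.ones(3)
print(np.dot(v64, w), np.dot(v16, w))   # nearly identical results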
Thank you @arlenk!
Great job @arlenk 🔥
setup.py
Outdated
@@ -258,7 +257,10 @@ def finalize_options(self):
include_dirs=[model_dir]),
Extension('gensim.models.fasttext_inner',
    sources=['./gensim/models/fasttext_inner.c'],
    include_dirs=[model_dir])
    include_dirs=[model_dir]),
Extension('gensim.models.ldamodel_inner',
Don't forget to modify MANIFEST.in in the root of the repo (add your files).
setup.py
Outdated
include_dirs=[model_dir])
include_dirs=[model_dir]),
Extension('gensim.models.ldamodel_inner',
    sources=['./gensim/models/ldamodel_inner.c'],
Maybe a better name for it is _matutils (because this isn't a model at all, only concrete math functions)?
Also, maybe move this file to the same level as matutils.py? wdyt?
gensim/models/ldamodel.py
Outdated
@@ -56,32 +55,26 @@
np.float64: 1e-100,
}

try:
We can move this (the try/except section) to matutils too.
@arlenk please resolve the merge conflict & make the suggested fixes. I think we won't fix the underflow problem any time soon; what prevents us from merging this change now?
Ok, I made the suggested changes. I also removed the use of dirichlet_expectation_1d and _2d and instead just added a cython version of dirichlet_expectation() that calls the _1d or _2d version based on the input shape. It turns out, after some profiling, that the added ndim check adds less than a few percent of overhead, and having a single dirichlet_expectation function simplifies the code. I think this also means that hdpmodel and atmodel should benefit from the changes as well (as they call matutils.dirichlet_expectation).
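A rough numpy sketch of that wrapper (the _1d/_2d bodies below are the slow reference versions, standing in for the cython entry points):

import numpy as np
from scipy.special import psi

def dirichlet_expectation_1d(alpha):
    # E[log theta] for a single Dirichlet parameter vector
    return psi(alpha) - psi(np.sum(alpha))

def dirichlet_expectation_2d(alpha):
    # row-wise E[log theta] for a matrix of Dirichlet parameters
    return psi(alpha) - psi(np.sum(alpha, 1))[:, np.newaxis]

def dirichlet_expectation(alpha):
    # single public entry point; the ndim check adds only a few percent of overhead
    if alpha.ndim == 1:
        return dirichlet_expectation_1d(alpha)
    return dirichlet_expectation_2d(alpha)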
@arlenk can you fix the tests please? (you forgot to update the tests according to what was removed from gensim.models.LdaModel)
@arlenk UPD, I'll do it myself, don't worry (anyway, I need to fix the docstrings)
@arlenk beautiful! You made a very targeted and important change, which seriously improves the performance of LDA (and related models that use LDA as the base model). Both of your PRs (MmReader and this one) will be our main features in the next release 💥 Would you like to continue with gensim cythonization? We would be very happy if you have the time and the wish for it ❤️
Awesome job! Thanks so much @arlenk.
Good to hear. The LDA model is what I am most familiar with, but if there are other bottlenecks that have come up in the past, let me know and I am happy to take a look.
@arlenk I have several optimization tasks.
The last one needs a really long time, but the first looks simple and does not require much time.
Attempt at moving some of the hotspots for ldamodel into a cython module. From some profiling, most of the time in ldamodel.LdaModel is spent in the following few spots:
dirichlet_expectation (specifically the call to psi())
calculating the mean absolute difference to determine when gamma has converged
logsumexp (even with the inline version)
I broke these functions out into a cython module. For now, I just created a fastldamodel.py module (a copy of ldamodel.py but with the cython functions above) to see if there is any interest in pursuing this. Based on some profiling on my PC (attached), the speed-up is about 100% (meaning the new code takes half the time -- though this is dependent on the number of topics, number of passes, etc.).
I also noticed that the current ldamodel code uses the float32 datatype, but the user can change this if desired. If we go down the cython route, we'd have to decide if we want to allow the user to change the datatype, as this would require different cython functions for each data type (I believe). Alternatively, we could use the cython code when the datatype is float32 (and perhaps float64) and fall back to the slower versions otherwise.
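For anyone who wants to reproduce that kind of measurement, a minimal cProfile sketch along these lines (with a small synthetic corpus purely for illustration, not the data behind the numbers above) shows where the time goes:

import cProfile
import pstats
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# tiny synthetic corpus, just to keep the sketch self-contained
rng = np.random.RandomState(0)
texts = [["w%d" % rng.randint(500) for _ in range(40)] for _ in range(1000)]
id2word = Dictionary(texts)
corpus = [id2word.doc2bow(t) for t in texts]

cProfile.runctx(
    "LdaModel(corpus=corpus, id2word=id2word, num_topics=20, passes=2)",
    globals(), locals(), "lda.prof")
pstats.Stats("lda.prof").sort_stats("cumulative").print_stats(20)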