
Measure performance of gensim 4.0.0 vs previous versions #2887

Closed · mpenkov opened this issue Jul 19, 2020 · 5 comments · Fixed by #2982

mpenkov (Collaborator) commented Jul 19, 2020

> Not every 1-line decision; just ones that are in inner loops of hot-spot code.
>
> Definitely a big TODO: compare performance before/after.

Originally posted by @piskvorky in https://github.com/_render_node/MDExOlB1bGxSZXF1ZXN0MzQ5Mjk1NTk1/timeline/more_items

piskvorky added this to the *2vec aftermath milestone on Jul 26, 2020
piskvorky (Owner) commented Jul 26, 2020

This link is also broken for me – I get a 400. @mpenkov this way of creating tickets seems more trouble than it's worth, with the context missing.

mpenkov (Collaborator, Author) commented Aug 16, 2020

I'll take it up with GitHub support. It's convenient, but only when it works.

mpenkov (Collaborator, Author) commented Aug 19, 2020

@piskvorky From GitHub support:

> This might be an uncaught edge case on our end. I have raised this up with our engineering team to investigate further.

I'll keep my eyes open for the problem in case it recurs.

piskvorky (Owner) commented Sep 24, 2020

Some Word2vec measurements here: #2939 (comment)

I wonder what the original "Not every 1-line decision; just ones that are in inner loops of hot-spot code." was referring to, though; the link is still broken. Probably some code change deep in the C loops.

piskvorky modified the milestones: *2vec aftermath, 4.0.0 on Sep 24, 2020
piskvorky self-assigned this on Oct 16, 2020
piskvorky (Owner) commented Oct 18, 2020

Comparing current develop (at ea87470) against 3.8.3. Identical training parameters (all defaults except 12 workers + 1 epoch), identical hardware, text9 corpus (124,301,826 words), measured with gensim_benchmark.py:

fasttext 3.8.3

training on a 124301826 raw words (88163974 effective words) took 107.3s, 821794 effective words/s
2:53.89 elapsed
4318564k peak RAM
stored model size: 1.9G

fasttext develop

training on a 124301826 raw words (88166519 effective words) took 96.4s, 914282 effective words/s
2:17.43 elapsed
1318592k peak RAM (!! 3x less memory)
stored model size: 939M

word2vec 3.8.3

training on a 124301826 raw words (88162276 effective words) took 52.3s, 1684982 effective words/s
1:41.36 elapsed
373612k peak RAM
stored model size: 181M

word2vec develop

training on a 124301826 raw words (88166114 effective words) took 50.0s, 1762436 effective words/s
1:13.82 elapsed (!! – faster weight init)
348060k peak RAM
stored model size: 176M

phrases 3.8.3

using 17692319 counts as vocab in Phrases<0 vocab, min_count=5, threshold=10.0, max_vocab_size=40000000>
2:23.51 elapsed
1611916k peak RAM
stored model size: 699M
186.7s apply frozen[text9]

phrases develop

merged Phrases<17692319 vocab, min_count=5, threshold=10.0, max_vocab_size=40000000>
1:50.76 elapsed
1886588k peak RAM
stored model size: 429M
81.6s apply frozen[text9]
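
For reference, a minimal sketch of what such a benchmark might look like (the actual gensim_benchmark.py is not shown in this thread; the gensim 4.x keyword names, the ./text9 path, and the Linux-specific RAM readout are assumptions):

```python
# Benchmark sketch, assuming gensim 4.x and text9 as a plain-text file
# at ./text9. This is NOT the gensim_benchmark.py referenced above.
import resource
import time

from gensim.models import Word2Vec
from gensim.models.phrases import Phrases
from gensim.models.word2vec import LineSentence


def peak_ram_kb():
    # Peak resident set size so far: KiB on Linux, bytes on macOS.
    # This is cumulative per process, so each model is best benchmarked
    # in its own process (the "2:53.89 elapsed" / "peak RAM" format in
    # the numbers above suggests a wrapper like /usr/bin/time was used).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


start = time.time()
# All defaults except 12 workers and 1 epoch, matching the setup above.
model = Word2Vec(corpus_file="text9", workers=12, epochs=1)
print(f"word2vec: {time.time() - start:.1f}s elapsed, {peak_ram_kb()}k peak RAM")
model.save("word2vec.model")

start = time.time()
# Defaults match the logged params: min_count=5, threshold=10.0,
# max_vocab_size=40000000.
phrases = Phrases(LineSentence("text9"))
frozen = phrases.freeze()  # FrozenPhrases in 4.x; Phraser(phrases) in 3.8.3
print(f"phrases: {time.time() - start:.1f}s elapsed")

start = time.time()
n = sum(1 for _ in frozen[LineSentence("text9")])  # "apply frozen[text9]"
print(f"apply frozen[text9]: {time.time() - start:.1f}s over {n} sentences")
```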


CC @gojomo FYI. I also double-checked loading models with mmap='r' and everything seems fine.
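
The mmap check mentioned above presumably boils down to something like the following (a sketch, not the exact commands used; gensim's load() forwards mmap to numpy, so the large arrays backing the model are memory-mapped read-only instead of copied into RAM):

```python
from gensim.models import Word2Vec

# Load with memory-mapping: the big numpy arrays are mapped read-only
# from disk rather than being read into memory up front.
model = Word2Vec.load("word2vec.model", mmap="r")

# Sanity check that the mmapped model still answers queries
# ("king" is assumed to be in the text9 vocabulary).
print(model.wv.most_similar("king", topn=3))
```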
