Commit
* Make docs clearer on `alpha` parameter in LDA model
* Update Hoffman paper link
* rm whitespace
* Update gensim/models/ldamodel.py
* Update gensim/models/ldamodel.py
* Update gensim/models/ldamodel.py
* re-applying changes from piskvorky#2821
* migrating + regenerating changed docs
* fix forgotten iteritems
* remove extra `model.wv`
* split overlong doc line
* get rid of six in doc2vec
* increase test timeout for Visdom server
* add 32/64 bits report
* add deprecations for init_sims()
* remove vectors_norm + add link to migration guide to deprecation warnings
* rename vectors_norm everywhere, update tests, regen docs
* put back no-op property setter of deprecated vectors_norm
* fix typo
* fix flake8
* disable Keras tests - failing with weird errors on py3.7+3.8, see https://travis-ci.org/github/RaRe-Technologies/gensim/jobs/713448950#L862
* test showing FT failure as W2V
* set .vectors even when ngrams off
* Update gensim/test/test_fasttext.py
* Update gensim/test/test_fasttext.py
* refresh docs for run_annoy tutorial
* Reduce memory use of the term similarity matrix constructor, deprecate the positive_definite parameter, and extend normalization capabilities of the inner_product method (piskvorky#2783)
* Deprecate SparseTermSimilarityMatrix's positive_definite parameter
* Reference paper on efficient implementation of soft cosine similarity
* Add example with Annoy indexer to SparseTermSimilarityMatrix
* Add example of obtaining word embeddings from SparseTermSimilarityMatrix
* Reduce space complexity of SparseTermSimilarityMatrix construction
  Build matrix using arrays and bitfields rather than DOK sparse format
  This work is based on the following blog post by @maciejkula: https://maciejkula.github.io/2015/02/22/incremental-construction-of-sparse-matrices/
* Fix a typo in the soft cosine similarity Jupyter notebook
* Add human-readable string representation for TermSimilarityIndex
* Avoid sparse term similarity matrix computation when nonzero_limit <= 0
* Extend normalization in the inner_product method
  Support the `maintain` vector normalization scheme.
  Support separate vector normalization schemes for queries and documents.
* Remove a note in the docstring of SparseTermSimilarityMatrix
* Rerun continuous integration tests
* Use ==/!= to compare constant literals
* Add human-readable string representation for TermSimilarityIndex (cont.)
* Prod flake8 with a coding style violation in a docstring
* Collapse two lambdas into one internal function
* Revert "Prod flake8 with a coding style violation in a docstring"
  This reverts commit 6557b84.
* Avoid str.format()
* Slice SparseTermSimilarityMatrix.inner_product tests by input types
* Remove similarity_type_code local variable
* Remove starting underscore from local function name
* Save indentation level and define populate_buffers function
* Extract SparseTermSimilarityMatrix constructor body to _create_source
* Extract NON_NEGATIVE_NORM_ASSERTION_MESSAGE to a module-level constant
* Extract cell assignment logic to cell_full local function
* Split variable swapping into three separate statements
* Extract normalization from the body of SparseTermSimilarityMatrix.inner_product
* Wrap overlong line
* Add test_inner_product_zerovector_zerovector and test_inner_product_zerovector_vector tests
* Further split test_inner_product into 63 test cases
* Raise ValueError when dictionary is empty
* Fix doc2vec crash for large sets of doc-vectors (piskvorky#2907)
* Fix AttributeError in WikiCorpus (piskvorky#2901)
* bug fix: wikicorpus getstream from data file-path \n Replace fname with input
* refactor: use property decorator for input
Co-authored-by: jshah02 <jenisnehal.shah@factset.com>
* intensify cbow+hs tests; bulk testing method
* use increment operator
Co-authored-by: Radim Řehůřek <me@radimrehurek.com>
* Change num_words to topn in dtm_coherence (piskvorky#2926)
* docstring fixes
* get rid of python2 constructs
Co-authored-by: S Mono <10430241+xh2@users.noreply.github.com>
Co-authored-by: Gordon Mohr <gojogit@gmail.com>
Co-authored-by: Vít Novotný <witiko@mail.muni.cz>
Co-authored-by: jeni Shah <jenishah@users.noreply.github.com>
Co-authored-by: jshah02 <jenisnehal.shah@factset.com>
Co-authored-by: Megan <megan.stodel@bbc.co.uk>