BM25 scoring function updated, Fixes #1828 #1830

sj29-innovate · 2018-01-08T07:59:02Z

`len(document)` has been changed to `len(corpus[index])` so that the score uses the length of the indexed document rather than the length of the document passed in.
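To illustrate why the length has to come from the indexed document, here is a minimal, self-contained sketch of an Okapi BM25 score. It is not the code from `gensim.summarization.bm25`: the function name `bm25_score`, the constants `K1` and `B`, and the non-negative IDF variant are assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical constants for this sketch; common BM25 defaults.
K1 = 1.5
B = 0.75


def bm25_score(query, index, corpus):
    """Score corpus[index] against query; both are lists of tokens."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    term_freqs = Counter(corpus[index])
    # Length normalisation uses the indexed document, not len(query).
    doc_len = len(corpus[index])
    norm = 1 - B + B * doc_len / avgdl

    score = 0.0
    for word in query:
        tf = term_freqs.get(word, 0)
        if tf == 0:
            continue
        # Number of documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        # Assumption: IDF variant with +1 inside the log, so it is never negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf * (K1 + 1) / (tf + K1 * norm)
    return score


corpus = [
    ["cat", "sat"],
    ["cat", "sat", "on", "the", "mat", "all", "day"],
    ["dog"],
]
short_doc = bm25_score(["cat"], 0, corpus)
long_doc = bm25_score(["cat"], 1, corpus)
```

With this normalisation, padding the query with terms that do not occur in the indexed document leaves the score unchanged, while a longer indexed document with the same term frequency scores lower.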

* Added docstrings in textcleaner.py
* Added docstrings to bm25.py
* syntactic_unit.py docstrings and typo
* added docstrings for graph modules
* keywords draft
* keywords draft updated
* keywords draft updated again
* keywords edited
* pagerank started
* pagerank summarizer docstring added
* fixed types in docstrings in commons, bm25, graph and keywords
* fixed types, examples and types in docstrings
* fix pep8
* fix doc build
* fix bm25
* fix graph
* fix graph[2]
* fix commons
* fix keywords
* fix keywords[2]
* fix mz_entropy
* fix pagerank_weighted
* fix graph rst
* fix summarizer
* fix syntactic_unit
* fix textcleaner
* fix

* Updates Poincare eval notebook with regularized model results
* Moves all evaluation details to Poincare evaluation notebook, cleans up tutorial notebook
* Adds relevant links to Poincare tutorial
* Adds dependency installation to Poincare eval notebook
* Updates html structure of result table in poincare eval notebook

* add doc for gensim.similarity.index
* change default notation
* docstrings for docsim[1]
* add into for gensim.similarities.index
* docstrings for docsim[2]
* docstrings for docsim[3]
* fix annoy part
* revert docsim
* fix PEP8

* Add docstrings in numpy-style format
* fix PEP8
* remove outdated "hack" (smart_open is core dependency right now)
* fix docstrings[1]
* remove unused internal class
* fix docstrings[2]
* fix docstrings[3]
* fix docstrings[4]
* fix docstrings[5]
* fix docstrings[6]
* fix docstrings[7]
* fix docstrings[8]
* add missing `pattern` to doc dependencies
* fix docstrings[9]
* fix docstrings[10]

* first attempt to convert few lines into numpy-style doc
* added parameters in documentation
* more documentation
* few corrections
* show inheritance and undoc members
* show special members
* example is executable now
* link to the paper added, named parameters
* fixed doc
* fixed doc
* fixed whitespaces
* fix docstrings & PEP8
* fix docstrings
* fix typo

menshikh-iv and others added 15 commits December 9, 2017 19:38

Merge branch 'master' into develop

11c44d2

Fix import in get_my_ip. Fix #1771 (#1772)

056ec00

* fix 1771
* fix import

Add model_to_dict one-liner to word2vec notebook. Fix #1269 (#1776)

bf1b865

* Add model to dict method
* add documentation and oneliner code
* Add benchmark

Update contributing guide. Fix #1786 (#1793)

018d40a

* update contributing.md
* fix language
* Add info about environment
* add links to CONTRIBUTING guide
* Add linux/win split
* add path where user can find documentation

Fix typo in doc2vec-IMDB. Fix #1788 (#1796)

2f2d4f5

Fix description of sg parameter for gensim.models.FastText (#1801)

7a688d0

It was erroneously stated that when sg=1, CBOW is used, otherwise skip-gram is used. In fact, it is vice versa (quite logically, as sg=SkipGram). Thus, the description should be fixed.

Add word embedding viz to word2vec notebook. Fix #1419 (#1800)

e28144a

* word embedding visualization
* show viz
* disable logging
* minor fixes

Add wordnet mammal train file for Poincare notebook (#1781)

a76915c

* Adds wordnet mammal train file
* Adds link to data file in notebook

Fix tox.ini/setup.cfg configuration (#1815)

7fa8a9f

* update according to new pytest_benchmark version
* update wheel-storage url
* use only twine

Fix docstrings for gensim.models.translation_matrix (#1806)

37bc8d4

* convert Space class doc to numpy style
* fix docstrings[1]
* fix docstrings[2]
* remove useless load
* fix docstrings[3]
* add missing import
* fix docstrings[4]

sj29-innovate changed the title ~~BM25 scoring function updated~~ BM25 scoring function updated, Fixes #1828 Jan 8, 2018

sj29-innovate closed this Jan 8, 2018

sj29-innovate reopened this Jan 8, 2018

sj29-innovate closed this Jan 8, 2018

menshikh-iv mentioned this pull request Jan 8, 2018

Fix formula in gensim.summarization.bm25. Fix #1828 #1833

Merged
