Fix citation of Mikolov paper #2098

Closed
wants to merge 2 commits into from

6 changes: 3 additions & 3 deletions gensim/models/phrases.py
@@ -621,8 +621,8 @@ def __getitem__(self, sentence):


def original_scorer(worda_count, wordb_count, bigram_count, len_vocab, min_count, corpus_word_count):
- """Calculation score, based on original `"Efficient Estimaton of Word Representations in Vector Space" by
- Mikolov <https://arxiv.org/pdf/1301.3781.pdf>`_.
+ """Calculation score, based on original `"Distributed Representations of Words and Phrases
+ and their Compositionality" by Mikolov <https://arxiv.org/pdf/1310.4546.pdf>`_.

Parameters
----------
@@ -641,7 +641,7 @@ def original_scorer(worda_count, wordb_count, bigram_count, len_vocab, min_count

Notes
-----
- Formula: :math:`\\frac{(worda\_count - min\_count) * len\_vocab }{ (worda\_count * wordb\_count)}`.
+ Formula: :math:`\\frac{(bigram\_count - min\_count) * len\_vocab }{ (worda\_count * wordb\_count)}`.

"""
return (bigram_count - min_count) / worda_count / wordb_count * len_vocab
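To make the `Notes` formula concrete, here is a minimal standalone sketch (not gensim's own code) that evaluates the same expression as the `return` line of `original_scorer`, using made-up counts. The signature mirrors the one in the diff; as in that `return` line, `corpus_word_count` is accepted but not used.

```python
def original_scorer(worda_count, wordb_count, bigram_count, len_vocab, min_count, corpus_word_count):
    """Sketch of the default phrase score from the diff.

    score = (bigram_count - min_count) * len_vocab / (worda_count * wordb_count)
    `corpus_word_count` only mirrors the signature; this formula ignores it.
    """
    return (bigram_count - min_count) / worda_count / wordb_count * len_vocab


# Hypothetical counts, chosen only for illustration.
score = original_scorer(
    worda_count=100,              # occurrences of the first token
    wordb_count=50,               # occurrences of the second token
    bigram_count=30,              # occurrences of the two tokens side by side
    len_vocab=10000,              # vocabulary size
    min_count=5,                  # discount subtracted from the bigram count
    corpus_word_count=1_000_000,  # unused by this formula
)
print(score)  # (30 - 5) / 100 / 50 * 10000 -> 50.0
```

In gensim's phrase detection a candidate bigram is accepted when its score is greater than `threshold` (10.0 by default in `PhrasesTransformer.__init__`), so these hypothetical counts would qualify. Subtracting `min_count` also makes the score negative for bigrams seen fewer than `min_count` times, so they can never pass a positive threshold.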
10 changes: 5 additions & 5 deletions gensim/sklearn_api/phrases.py
@@ -35,9 +35,9 @@
class PhrasesTransformer(TransformerMixin, BaseEstimator):
"""Base Phrases module, wraps :class:`~gensim.models.phrases.Phrases`.

- For more information, please have a look to `Mikolov, et. al: "Efficient Estimation of Word Representations in
- Vector Space" <https://arxiv.org/pdf/1301.3781.pdf>`_ and `Gerlof Bouma: "Normalized (Pointwise) Mutual Information
- in Collocation Extraction" <https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf>`_.
+ For more information, please have a look to `Mikolov, et. al: "Distributed Representations of Words and Phrases and
+ their Compositionality" <https://arxiv.org/pdf/1310.4546.pdf>`_ and `Gerlof Bouma: "Normalized (Pointwise) Mutual
+ Information in Collocation Extraction" <https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf>`_.

"""
def __init__(self, min_count=5, threshold=10.0, max_vocab_size=40000000,
@@ -63,8 +63,8 @@ def __init__(self, min_count=5, threshold=10.0, max_vocab_size=40000000,
or with a function with the expected parameter names. Two built-in scoring functions are available
by setting `scoring` to a string:

- * 'default': Explained in `Mikolov, et. al: "Efficient Estimation of Word Representations
- in Vector Space" <https://arxiv.org/pdf/1301.3781.pdf>`_.
+ * 'default': Explained in `Mikolov, et. al: "Distributed Representations of Words and Phrases
+ and their Compositionality" <https://arxiv.org/pdf/1310.4546.pdf>`_.
* 'npmi': Explained in `Gerlof Bouma: "Normalized (Pointwise) Mutual Information in Collocation
Extraction" <https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf>`_.

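The `'npmi'` option refers to Bouma's normalized pointwise mutual information. The sketch below computes it from raw counts, assuming each probability is estimated as a count divided by the total number of corpus tokens. The function name `npmi_score` and its signature are hypothetical, not gensim's API.

```python
import math


def npmi_score(worda_count, wordb_count, bigram_count, corpus_word_count):
    """Normalized PMI from raw counts (hypothetical helper, not gensim's API).

    Probabilities are maximum-likelihood estimates: count / corpus_word_count.
    NPMI = ln(p(a,b) / (p(a) * p(b))) / -ln(p(a,b)), which lies in [-1, 1]:
    1 when the words only occur together, 0 when they co-occur at the
    rate independence would predict.
    """
    if not 0 < bigram_count < corpus_word_count:
        # log(0) is undefined, and p(a,b) == 1 would divide by zero.
        raise ValueError("need 0 < bigram_count < corpus_word_count")
    pa = worda_count / corpus_word_count
    pb = wordb_count / corpus_word_count
    pab = bigram_count / corpus_word_count
    pmi = math.log(pab / (pa * pb))
    return pmi / -math.log(pab)


# Hypothetical counts in a 1000-token corpus.
print(npmi_score(10, 10, 10, 1000))    # words always together: close to 1.0
print(npmi_score(100, 100, 10, 1000))  # independence rate: close to 0.0
```

Because NPMI is bounded to [-1, 1], unlike the `'default'` score, a `threshold` used with `scoring='npmi'` only makes sense inside that range.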