Add word2vec.PathLineSentences for reading a directory as a corpus (#1364) #1423

Merged on Jul 18, 2017 (16 commits)

Commits on Jun 16, 2017

  1. issue piskvorky#1364 first commit, corpus from a directory

    added method models.word2vec.LineSentencePath
    
    method to read an entire directory's files in the same style as
    models.word2vec.LineSentence
    Michael Sherman committed Jun 16, 2017
    44fb606
  2. test for word2vec.LineSentencePath issue piskvorky#1364

    initial attempt at test, including files. test just splits the
    lee_background.cor file into two parts and puts them in a directory,
    then makes sure they match the unsplit file as loaded by
    word2vec.LineSentence
    Michael Sherman committed Jun 16, 2017
    0a62352
  3. better handling of input for LineSentencePath

    no longer sensitive to an input without a trailing os-specific slash
    Michael Sherman committed Jun 16, 2017
    b55a844
  4. Merge branch 'LineSentencePath' into develop

    Michael Sherman committed Jun 16, 2017
    bde9cfd
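
The first three commits describe reading every file in a directory line by line, in the style of `LineSentence`, without depending on a trailing path separator. A minimal sketch of that idea, using only the standard library, might look like the following. The class name `DirectoryLineSentences` and the file names are hypothetical, and this is not the code from the pull request.

```python
import os
import tempfile


class DirectoryLineSentences:
    """Hypothetical stand-in for a directory corpus reader (not gensim's class)."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        # Sort the names so the iteration order is deterministic.
        for name in sorted(os.listdir(self.source)):
            # os.path.join inserts a separator only when one is missing,
            # so "corpus" and "corpus/" resolve to the same file paths.
            path = os.path.join(self.source, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            with open(path, encoding="utf8") as handle:
                for line in handle:
                    # One sentence per line, tokens separated by whitespace.
                    yield line.split()


# Usage: split a tiny corpus across two files and read it back.
with tempfile.TemporaryDirectory() as corpus_dir:
    for name, text in [("a.txt", "first file\n"), ("b.txt", "second file here\n")]:
        with open(os.path.join(corpus_dir, name), "w", encoding="utf8") as handle:
            handle.write(text)
    plain = list(DirectoryLineSentences(corpus_dir))
    slashed = list(DirectoryLineSentences(corpus_dir + os.sep))
```

Here `plain` and `slashed` hold the same sentences, which is the property the third commit is after.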

Commits on Jun 19, 2017

  1. Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into develop

    Michael Sherman committed Jun 19, 2017
    86517a8
  2. LineSentencePath renamed PathLineSentences

    in word2vec.py. Test updated as well
    Michael Sherman committed Jun 19, 2017
    aef2879
  3. LineSentencePath rename to PathLineSentences

    in models.word2vec. Tests also updated
    Michael Sherman committed Jun 19, 2017
    6a21b80
  4. fix whitespace style error

    had only 1 space before an inline comment, flagged by travis CI build
    Michael Sherman committed Jun 19, 2017
    f362e33
  5. updated PathLineSentences test and test data

    Removed the LineSentencePath directory and created PathLineSentences.
    The lee corpus duplicates in LineSentencePath were wasting space.
    Made a new small corpus to test PathLineSentences and put it in the directory.
    Changed the test to read both files manually, combine them, and compare the
    result to PathLineSentences (rather than having a separate single file that
    matches the entire contents of the PathLineSentences test_data directory).
    Michael Sherman committed Jun 19, 2017
    1dbe7b6
  6. word2vec.PathLineSentences single file support

    changed PathLineSentences to support a single file in addition to a
    directory, raises a warning to use LineSentence when a single file is
    given as a parameter. added corresponding test.
    Michael Sherman committed Jun 19, 2017
    ac49054
  7. fixing style issues

    Michael Sherman committed Jun 19, 2017
    bda1fe7
  8. fix style issue

    Michael Sherman committed Jun 19, 2017
    83eb848
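
The single-file support described in the Jun 19 commits (accept either a directory or one file, and warn when a single file is given) can be sketched roughly as below. The helper names `resolve_input_files` and `iter_sentences` are hypothetical, and the use of the `warnings` module is an assumption: the commit message does not say whether the actual change warns through `warnings` or through a logger.

```python
import os
import tempfile
import warnings


def resolve_input_files(source):
    """Hypothetical helper: list the files a file-or-directory reader would scan."""
    if os.path.isfile(source):
        # Assumption: warn via the warnings module; the real change may log instead.
        warnings.warn("single file given; a single-file reader such as LineSentence may fit better")
        return [source]
    if os.path.isdir(source):
        candidates = (os.path.join(source, name) for name in sorted(os.listdir(source)))
        return [path for path in candidates if os.path.isfile(path)]
    raise ValueError("input is neither a file nor a directory: %r" % (source,))


def iter_sentences(source):
    """Yield one whitespace-tokenized sentence per line across all resolved files."""
    for path in resolve_input_files(source):
        with open(path, encoding="utf8") as handle:
            for line in handle:
                yield line.split()


# Usage: the same content read through a file path and through its directory.
with tempfile.TemporaryDirectory() as corpus_dir:
    single = os.path.join(corpus_dir, "only.txt")
    with open(single, "w", encoding="utf8") as handle:
        handle.write("one file only\n")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        from_file = list(iter_sentences(single))
    from_dir = list(iter_sentences(corpus_dir))
```

Resolving the input to a list of files up front keeps the reading loop identical for both cases, so the only behavioral difference for a single file is the warning.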

Commits on Jun 21, 2017

  1. dfd1f8e

Commits on Jun 23, 2017

  1. Merge branch 'develop' into LineSentencePath

    resolved test_word2vec.py manually
    Michael Sherman committed Jun 23, 2017
    4125143
  2. Merge branch 'master' of https://github.com/RaRe-Technologies/gensim into develop

    Michael Sherman committed Jun 23, 2017
    14c2265
  3. Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into develop

    Michael Sherman committed Jun 23, 2017
    45b92f2