ShardedCorpus skips the first value of a generator #1511

Closed
karkkainenk1 opened this issue Jul 29, 2017 · 2 comments
Labels
bug (Issue described a bug), difficulty easy (Easy issue: required small fix)

Comments

@karkkainenk1
Contributor

Description

ShardedCorpus skips the first value of a generator. A possible cause is that ShardedCorpus does not use the fixed corpus returned by the is_corpus method and keeps reading from the original, already advanced generator instead, but I haven't verified this yet.

Steps/Code/Corpus to Reproduce

from gensim.corpora.sharded_corpus import ShardedCorpus

def my_generator():
    yield [(0,1)]
    yield [(1,1)]
    yield [(2,1)]

corpus = ShardedCorpus("corpus", my_generator(), dim=3, overwrite=True)

print(len(corpus))
print(corpus[0])

Expected Results

Expected output:

3
[ 1.  0.  0.]

Actual Results

Actual output:

2
[ 0.  1.  0.]

That is, the first item in the generator has been skipped and is missing from the resulting corpus.

Versions

Darwin-16.7.0-x86_64-i386-64bit
Python 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
gensim 2.3.0
FAST_VERSION 1

@piskvorky
Owner

@karkkainenk1 thanks for reporting and for the minimal example!

Some destructive peeking, or incorrect use of is_corpus, would be my guess too. Can you look into it?
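
For illustration, the kind of destructive peeking suspected here can be reproduced without gensim. The sketch below is not the ShardedCorpus code: `peek_and_restore` is a hypothetical helper that only mimics the general pattern of inspecting the first document of an iterator and chaining it back, under the assumption that the bug comes from continuing to read the original generator after such a peek.

```python
import itertools


def my_generator():
    yield [(0, 1)]
    yield [(1, 1)]
    yield [(2, 1)]


def peek_and_restore(iterator):
    """Hypothetical helper: read the first item, then return a new
    iterator that still yields it, followed by the remaining items."""
    first = next(iterator)
    return first, itertools.chain([first], iterator)


# Destructive peek: after next(), the same generator is consumed further,
# so the first document is gone.
gen = my_generator()
next(gen)
lost = list(gen)

# Non-destructive peek: consume the restored iterator instead of the original.
gen = my_generator()
first, restored = peek_and_restore(gen)
kept = list(restored)

print(len(lost), len(kept))  # prints "2 3"
```

The first case matches the symptom in the report (length 2, first remaining item `[(1, 1)]`); the second keeps all three documents because the peeked item is chained back in front of the rest.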

@karkkainenk1
Contributor Author

Sure thing, sent a pull request.

@menshikh-iv added the bug and difficulty easy labels on Sep 14, 2017