
OverflowError: Python int too large to convert to C long #95

Closed
valentina-bec opened this issue Feb 27, 2021 · 5 comments


valentina-bec commented Feb 27, 2021

When importing the model:

from aitextgen import aitextgen

I got the error:

csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long

Solved by replacing

csv.field_size_limit(sys.maxsize)

in TokenDataset.py with a loop that backs off until the value fits in a C long:

maxInt = sys.maxsize
while True:
    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        maxInt = int(maxInt / 10)

Does this change affect the model?

@MarcusLlewellyn

So, I'm out of my depth here. But this looks like it is expecting a C long, while sys.maxsize returns the value of a C long long, 9223372036854775807, on my system (Python 3.8.7 64-bit on Windows 10). I think the maximum value in this case might actually need to be 2^31 - 1, or 2,147,483,647.
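(A quick way to check this, assuming a CPython build: sys.maxsize reflects the platform's 64-bit Py_ssize_t, while csv.field_size_limit() stores its value in a C long, which is only 32 bits on 64-bit Windows because of the LLP64 model.)

```python
import ctypes
import sys

# On 64-bit builds sys.maxsize is 2**63 - 1 everywhere...
print(sys.maxsize)

# ...but the width of a C long differs by platform:
# 4 bytes on Windows (LLP64), 8 bytes on Linux/macOS (LP64).
print(ctypes.sizeof(ctypes.c_long))
```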

@minimaxir
Owner

This was added with #67

That change will not affect the model/dataset construction at all.

What OS are you using? I see 9223372036854775807 as well. I have no issues explicitly setting 2**32 - 1.

@MarcusLlewellyn

MarcusLlewellyn commented Feb 27, 2021

@minimaxir Windows 10 build 19042 64-bit with Python 3.8.7 64-bit.

Edit: And it still fails when I plug 2**32 - 1 in as a magic number. :(
Edit 2: 2**16 - 1 works. I get a whole new error, but that's probably for another issue.

@smt923

smt923 commented Mar 2, 2021

I have this issue as well on Windows 10 20H2 (19042.804) with Python 3.9.2. The change in TokenDataset.py also fixed it for me and allows me to import and generate.

For what it's worth, I also see 9223372036854775807 for sys.maxsize. Out of curiosity I tried it on WSL2 Ubuntu, and it works fine there without any changes.

@minimaxir
Owner

So this is apparently a complicated quirk. From that, it seems 2 ** 31 - 1 is fine.
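The resolution discussed above can be sketched as a portable guard (an illustration of the approach, not necessarily the exact patch that landed in aitextgen; the helper name is made up here):

```python
import csv
import sys

def raise_csv_field_limit():
    """Raise the csv field size limit as high as the platform allows."""
    try:
        # Works where a C long is 64-bit (Linux, macOS).
        csv.field_size_limit(sys.maxsize)
    except OverflowError:
        # On Windows (LLP64) a C long is 32-bit, so cap at its maximum.
        csv.field_size_limit(2**31 - 1)

raise_csv_field_limit()
print(csv.field_size_limit())
```

Calling csv.field_size_limit() with no argument returns the current limit, which is handy for confirming the guard took effect.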
