read_csv dumps core with python 2.7.10 and pandas 0.17.1 #11716
Comments
Please show the exact code you are using.
My code is in https://github.com/jdfekete/progressivis, file: . The method is the following; see the last line for the call and all the checks before it. Running it with pandas 0.16.2 works without dumping core. It might be due to the GIL, or the lack thereof, since this code runs in a second thread.
Please just show a short, reproducible example.
This is almost certainly a thread-safety problem in how you are calling it. A reproducible example would help. Please reopen when you post one.
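For reference, a minimal sketch of the kind of reproducer being asked for. The in-memory CSV, the `load` helper, and the lock are illustrative assumptions, not code from the original report: two threads call `pd.read_csv` concurrently, and serializing the calls with a lock is a plausible workaround if the parser turns out not to be thread-safe.

```python
import io
import threading

import pandas as pd

# Illustrative stand-in data; the original report reads real CSV files.
CSV_TEXT = "a,b\n" + "\n".join(f"{i},{2 * i}" for i in range(1000)) + "\n"

results = []
lock = threading.Lock()

def load():
    # Serializing the read_csv calls with a lock is a workaround if the
    # parser is not thread-safe; remove the lock to exercise concurrency.
    with lock:
        df = pd.read_csv(io.StringIO(CSV_TEXT))
    results.append(len(df))

threads = [threading.Thread(target=load) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [1000, 1000]
```

Without the lock, a version affected by the thread-safety bug may crash intermittently rather than on every run, which is why intermittent reports like the ones below are hard to reduce.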
xref #11786
I am also seeing this error, intermittently, during read_csv. It's not even a particularly large file: the text is the contents of http://jonathanstray.com/papers/titanic.csv. I'm not explicitly using threads in my app, though I am using Django Channels.
You should try a more modern version of pandas; many things have been fixed since 0.17.1.
Indeed I am on 0.17.1. FWIW that's the version that shipped with Anaconda, though now I can't recall when I installed it. |
`conda update pandas` works wonders.
I am reading a very large CSV file (the NYC taxi dataset at https://storage.googleapis.com/tlc-trip-data/2015/), only two columns:

```python
index_col=False, skipinitialspace=True, usecols=['pickup_longitude', 'pickup_latitude'], chunksize=...
```

I load it progressively in varying-size chunks and use two threads to do the progressive loading. After reading about 10M lines (the number varies from one run to another), it dumps core.
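The loading pattern described above can be sketched roughly as follows. The in-memory CSV and the chunk size of 4 are placeholders; the original reads the taxi files from Google Cloud Storage with varying chunk sizes, across two threads.

```python
import io

import pandas as pd

# Small in-memory stand-in for the NYC taxi CSV (placeholder values);
# the original reads files from storage.googleapis.com/tlc-trip-data/2015/.
CSV_TEXT = "pickup_longitude,pickup_latitude\n" + \
    "\n".join("-73.98,40.75" for _ in range(10)) + "\n"

reader = pd.read_csv(
    io.StringIO(CSV_TEXT),
    index_col=False,
    skipinitialspace=True,
    usecols=["pickup_longitude", "pickup_latitude"],
    chunksize=4,  # placeholder; the report varies the chunk size between reads
)

total = sum(len(chunk) for chunk in reader)
print(total)  # 10
```

With `chunksize` set, `read_csv` returns an iterator of DataFrames instead of a single DataFrame, which is what makes the progressive, thread-driven loading possible in the first place.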
Here is what GDB finds out:
```
Fatal Python error: GC object already tracked
Fatal Python error: GC object already tracked

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffddd98700 (LWP 10284)]
0x00007ffff782dcc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff782dcc9 in __GI_raise (sig=sig@entry=6)
#1  0x00007ffff78310d8 in __GI_abort () at abort.c:89
#2  0x000000000045a4f2 in Py_FatalError ()
#3  0x000000000052b5ec in PyTuple_New ()
#4  0x000000000050c73d in ?? ()
#5  0x000000000050d3f6 in Py_BuildValue ()
#6  0x00007fffec3d01d8 in buffer_rd_bytes (source=0x7fffd8006650,
#7  0x00007fffec3cf065 in parser_buffer_bytes (nbytes=,
#8  _tokenize_helper (self=0x7fffd8003480, nrows=nrows@entry=3186,
#9  0x00007fffec3cf3e7 in tokenize_nrows (self=,
#10 0x00007fffec39a3c4 in __pyx_f_6pandas_6parser_10TextReader__tokenize_rows (
#11 0x00007fffec3a21a2 in __pyx_f_6pandas_6parser_10TextReader__read_rows (
#12 0x00007fffec393f0c in __pyx_f_6pandas_6parser_10TextReader__read_low_memory
```