Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: read_table and read_csv crash (#22748) #22750

Merged
merged 2 commits into from
Sep 24, 2018

Conversation

troels
Copy link
Contributor

@troels troels commented Sep 18, 2018

A missing null-pointer check made read_table and read_csv prone
to crash on badly encoded text. Add null-pointer check.

@pep8speaks
Copy link

Hello @troels! Thanks for submitting the PR.

@troels troels force-pushed the dont-crash-on-badly-encoded-data branch from b92bc82 to 6d5ecc7 Compare September 18, 2018 23:22
@codecov
Copy link

codecov bot commented Sep 18, 2018

Codecov Report

Merging #22750 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22750      +/-   ##
==========================================
+ Coverage   92.18%   92.18%   +<.01%     
==========================================
  Files         169      169              
  Lines       50812    50820       +8     
==========================================
+ Hits        46842    46850       +8     
  Misses       3970     3970
Flag Coverage Δ
#multiple 90.6% <ø> (ø) ⬆️
#single 42.38% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/util/testing.py 85.82% <0%> (-0.21%) ⬇️
pandas/core/generic.py 96.67% <0%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 98.1% <0%> (+0.01%) ⬆️
pandas/core/ops.py 97.37% <0%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 945bf75...e71abb6. Read the comment docs.

@troels troels force-pushed the dont-crash-on-badly-encoded-data branch from 6d5ecc7 to b7ddc09 Compare September 19, 2018 06:28
t = TextIOWrapper(b, encoding='ascii', errors='surrogateescape')
with pytest.raises(UnicodeEncodeError):
pd.read_csv(t)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when you run this test against the Python engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gfyoung

I've moved this to common.py now. I guess that even if the interpreter crash is only reproducible in the c-parser, the parsers should behave the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. Rarely do we have tests that specific to a given parser, unless the parser does not support the behavior we are testing.

@gfyoung gfyoung added Bug IO CSV read_csv, to_csv labels Sep 21, 2018
@troels troels force-pushed the dont-crash-on-badly-encoded-data branch from b7ddc09 to 55e6e43 Compare September 23, 2018 13:02
doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Sep 23, 2018

tiny change. ping on green. @gfyoung merge when pushed and green.

@jreback jreback added this to the 0.24.0 milestone Sep 23, 2018
@troels troels force-pushed the dont-crash-on-badly-encoded-data branch from 55e6e43 to 93fb9d5 Compare September 23, 2018 13:13
troels and others added 2 commits September 23, 2018 15:14
A missing null-pointer check made read_table and read_csv prone
to crash on badly encoded text. Add null-pointer check.
@gfyoung gfyoung merged commit 64b88e8 into pandas-dev:master Sep 24, 2018
@gfyoung
Copy link
Member

gfyoung commented Sep 24, 2018

Thanks @troels !

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
A missing null-pointer check made read_table and read_csv prone
to crash on badly encoded text. Add null-pointer check.

Closes pandas-devgh-22748.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_table crashes Python on surrogates
4 participants