BUG: read_table and read_csv crash (pandas-dev#22750)
A missing null-pointer check made read_table and read_csv prone
to crash on badly encoded text. Add null-pointer check.

Closes gh-22748.
troels authored and victor committed Sep 30, 2018
1 parent f0f8f43 commit 014ec79
Showing 3 changed files with 15 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -756,6 +756,7 @@ I/O

 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
+- :func:`read_csv()` and :func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)
 - :func:`read_csv()` will correctly parse timezone-aware datetimes (:issue:`22256`)
 - :func:`read_sas()` will parse numbers in sas7bdat-files that have width less than 8 bytes correctly. (:issue:`21616`)
 - :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
6 changes: 5 additions & 1 deletion pandas/_libs/src/parser/io.c
@@ -150,7 +150,11 @@ void *buffer_rd_bytes(void *source, size_t nbytes, size_t *bytes_read,
         return NULL;
     } else if (!PyBytes_Check(result)) {
         tmp = PyUnicode_AsUTF8String(result);
-        Py_XDECREF(result);
+        Py_DECREF(result);
+        if (tmp == NULL) {
+            PyGILState_Release(state);
+            return NULL;
+        }
         result = tmp;
     }
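
For readers less familiar with the C API, the patched branch can be pictured in Python roughly as follows. This is only a sketch: `rd_bytes` is a hypothetical name, not a pandas function, and a strict `str.encode("utf-8")` stands in for `PyUnicode_AsUTF8String`. The point is that a failed conversion must stop the read and surface the error instead of carrying on with an empty result.

```python
from io import BytesIO, StringIO


def rd_bytes(source, nbytes):
    """Hypothetical Python analogue of the patched branch in buffer_rd_bytes."""
    result = source.read(nbytes)
    if not isinstance(result, bytes):
        # Stand-in for PyUnicode_AsUTF8String: strict UTF-8 encoding.
        # In C a failure shows up as a NULL return that must be checked;
        # in Python the same failure is raised as UnicodeEncodeError.
        result = result.encode("utf-8")
    return result


text_chunk = rd_bytes(StringIO("a,b\n1,2\n"), 1024)   # text source is encoded
bytes_chunk = rd_bytes(BytesIO(b"a,b\n1,2\n"), 1024)  # bytes pass through as-is
```

In the C code the equivalent of the raised exception is the early `return NULL` after releasing the GIL state, which leaves the Python error set for the caller.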

9 changes: 9 additions & 0 deletions pandas/tests/io/parser/common.py
@@ -9,6 +9,7 @@
 import sys
 from datetime import datetime
 from collections import OrderedDict
+from io import TextIOWrapper

 import pytest
 import numpy as np
@@ -1609,3 +1610,11 @@ def test_skip_bad_lines(self):
         val = sys.stderr.getvalue()
         assert 'Skipping line 3' in val
         assert 'Skipping line 5' in val
+
+    def test_buffer_rd_bytes_bad_unicode(self):
+        # Regression test for #22748
+        t = BytesIO(b"\xB0")
+        if PY3:
+            t = TextIOWrapper(t, encoding='ascii', errors='surrogateescape')
+        with pytest.raises(UnicodeError):
+            pd.read_csv(t, encoding='UTF-8')
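
The regression test's input can be examined without pandas. The standard-library sketch below (variable names are illustrative only) shows why this particular stream defeats the UTF-8 conversion: decoding the byte `0xB0` as ASCII with `errors='surrogateescape'` yields a lone surrogate code point, and strict UTF-8 encoding refuses lone surrogates.

```python
from io import BytesIO, TextIOWrapper

# Same construction as the regression test: one byte that is not valid ASCII.
stream = TextIOWrapper(BytesIO(b"\xB0"), encoding="ascii",
                       errors="surrogateescape")
chunk = stream.read()  # the undecodable byte becomes the lone surrogate U+DCB0

try:
    chunk.encode("utf-8")  # strict encoding rejects lone surrogates
    failed = False
except UnicodeEncodeError:
    failed = True
```

`UnicodeEncodeError` is a subclass of `UnicodeError`, which matches the exception type the test expects from `read_csv`.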
