BUG: #7757 Fix CSV parsing of singleton list header #17090

Merged (1 commit) on Aug 3, 2017
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -283,6 +283,7 @@ I/O
 - Bug in :func:`read_csv` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`)
 - Bug in :func:`read_csv` in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:`14696, :issue:`16798`).
 - Bug in :func:`read_csv` when called with ``low_memory=False`` in which a CSV with at least one column > 2GB in size would incorrectly raise a ``MemoryError`` (:issue:`16798`).
+- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
 - Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
 - Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)

21 changes: 12 additions & 9 deletions pandas/_libs/parsers.pyx
@@ -535,23 +535,26 @@ cdef class TextReader:
             self.parser_start = 0
             self.header = []
         else:
-            if isinstance(header, list) and len(header):
-                # need to artifically skip the final line
-                # which is still a header line
-                header = list(header)
-                header.append(header[-1] + 1)
+            if isinstance(header, list):
+                if len(header) > 1:
+                    # need to artifically skip the final line
+                    # which is still a header line
+                    header = list(header)
+                    header.append(header[-1] + 1)
+                    self.parser.header_end = header[-1]
+                    self.has_mi_columns = 1
+                else:
+                    self.parser.header_end = header[0]

+                self.parser_start = header[-1] + 1
                 self.parser.header_start = header[0]
-                self.parser.header_end = header[-1]
                 self.parser.header = header[0]
-                self.parser_start = header[-1] + 1
-                self.has_mi_columns = 1
                 self.header = header
             else:
                 self.parser.header_start = header
                 self.parser.header_end = header
-                self.parser.header = header
                 self.parser_start = header + 1
+                self.parser.header = header
                 self.header = [ header ]

         self.names = names
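
For reviewers who want to check the offsets by hand, the patched branch can be restated as a small pure-Python sketch. `HeaderOffsets` and `header_offsets` are hypothetical names invented for this illustration and are not part of pandas; the fields simply mirror the attributes assigned in the hunk above, and the `header is None` case is left out.

```python
from typing import List, NamedTuple, Union


class HeaderOffsets(NamedTuple):
    header_start: int     # mirrors self.parser.header_start
    header_end: int       # mirrors self.parser.header_end
    parser_start: int     # mirrors self.parser_start (first data row)
    has_mi_columns: bool  # mirrors self.has_mi_columns
    header: List[int]     # mirrors self.header


def header_offsets(header: Union[int, List[int]]) -> HeaderOffsets:
    """Hypothetical restatement of the patched non-None header branch.

    Like the patched code, this indexes header[0], so an empty list
    raises IndexError here.
    """
    if isinstance(header, list):
        has_mi_columns = len(header) > 1
        rows = list(header)
        if has_mi_columns:
            # Multi-row header: append one extra row index, as the patch does.
            rows.append(rows[-1] + 1)
            header_end = rows[-1]
        else:
            # Singleton list: same offsets as the scalar case.
            header_end = rows[0]
        return HeaderOffsets(rows[0], header_end, rows[-1] + 1,
                             has_mi_columns, rows)
    # Scalar header: a single header row, data starts on the next row.
    return HeaderOffsets(header, header, header + 1, False, [header])
```

Under this reading, `header_offsets([0])` and `header_offsets(0)` produce the same offsets, which is the equivalence the fix is after, while `header_offsets([0, 1])` still takes the multi-index path.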
7 changes: 4 additions & 3 deletions pandas/io/parsers.py
@@ -2279,10 +2279,11 @@ def _infer_columns(self):
         if self.header is not None:
             header = self.header

-            # we have a mi columns, so read an extra line
             if isinstance(header, (list, tuple, np.ndarray)):
-                have_mi_columns = True
-                header = list(header) + [header[-1] + 1]
+                have_mi_columns = len(header) > 1
+                # we have a mi columns, so read an extra line
+                if have_mi_columns:
+                    header = list(header) + [header[-1] + 1]
             else:
                 have_mi_columns = False
                 header = [header]
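
The effect of this hunk is easiest to see by comparing the row list computed before and after the change. The sketch below is only an illustration: `header_rows_before` and `header_rows_after` are hypothetical helper names, and the `np.ndarray` case is omitted so the snippet needs nothing beyond the standard library.

```python
def header_rows_before(header):
    """Pre-patch logic: any list-like header is treated as multi-index."""
    if isinstance(header, (list, tuple)):
        # An extra row index is always appended, even for a single row.
        return list(header) + [header[-1] + 1], True
    return [header], False


def header_rows_after(header):
    """Patched logic: only headers with more than one row are multi-index."""
    if isinstance(header, (list, tuple)):
        have_mi_columns = len(header) > 1
        rows = list(header)
        if have_mi_columns:
            # Multi-row header: read one extra line, as before.
            rows = rows + [rows[-1] + 1]
        return rows, have_mi_columns
    return [header], False


if __name__ == "__main__":
    for h in (0, [0], [0, 1]):
        print(h, header_rows_before(h), header_rows_after(h))
```

With `header=[0]`, the pre-patch helper returns `([0, 1], True)`, so the first data row is pulled into the header rows, whereas the patched helper returns `([0], False)`, matching `header=0`.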
7 changes: 7 additions & 0 deletions pandas/tests/io/parser/header.py
@@ -286,3 +286,10 @@ def test_non_int_header(self):
             self.read_csv(StringIO(data), sep=',', header=['a', 'b'])
         with tm.assert_raises_regex(ValueError, msg):
             self.read_csv(StringIO(data), sep=',', header='string_header')
+
+    def test_singleton_header(self):
+        # See GH #7757
+        data = """a,b,c\n0,1,2\n1,2,3"""
+        df = self.read_csv(StringIO(data), header=[0])
+        expected = DataFrame({"a": [0, 1], "b": [1, 2], "c": [2, 3]})
+        tm.assert_frame_equal(df, expected)