BUG: add pyarrow autogenerated prefix (pandas-dev#55115)
* add pyarrow autogenerated prefix

* whats new bug fix

* test with no head and pyarrow

* only test pyarrow

* BUG: This fixes pandas-dev#55009 (`raw=True` caused `apply` method of `DataFrame` to ignore passed arguments) (pandas-dev#55089)

* fixes pandas-dev#55009

* update documentation

* write documentation

* add test

* change formatting

* cite DataFrame directly in docs

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* PR review feedback

* Update doc/source/whatsnew/v2.2.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* alphabetical whatsnew

---------

Co-authored-by: Martin Šícho <sichom@vscht.cz>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4 people authored Sep 27, 2023
1 parent 61d2056 commit 824a273
Showing 3 changed files with 25 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -314,6 +314,7 @@ MultiIndex
I/O
^^^
- Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raise a Python warning. This now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``usecols`` wasn't working with a csv with no headers (:issue:`54459`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
- Bug in :func:`to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)

6 changes: 6 additions & 0 deletions pandas/io/parsers/arrow_parser_wrapper.py
@@ -130,6 +130,12 @@ def handle_warning(invalid_row):
                )
            }
        self.convert_options["strings_can_be_null"] = "" in self.kwds["null_values"]
        # autogenerated column names are prefixed with 'f' in pyarrow.csv
        if self.header is None and "include_columns" in self.convert_options:
            self.convert_options["include_columns"] = [
                f"f{n}" for n in self.convert_options["include_columns"]
            ]

        self.read_options = {
            "autogenerate_column_names": self.header is None,
            "skip_rows": self.header
18 changes: 18 additions & 0 deletions pandas/tests/io/parser/test_header.py
@@ -684,3 +684,21 @@ def test_header_delim_whitespace(all_parsers):
    result = parser.read_csv(StringIO(data), delim_whitespace=True)
    expected = DataFrame({"a,b": ["1,2", "3,4"]})
    tm.assert_frame_equal(result, expected)


def test_usecols_no_header_pyarrow(pyarrow_parser_only):
    parser = pyarrow_parser_only
    data = """
a,i,x
b,j,y
"""
    result = parser.read_csv(
        StringIO(data),
        header=None,
        usecols=[0, 1],
        dtype="string[pyarrow]",
        dtype_backend="pyarrow",
        engine="pyarrow",
    )
    expected = DataFrame([["a", "i"], ["b", "j"]], dtype="string[pyarrow]")
    tm.assert_frame_equal(result, expected)
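
The renaming step added to arrow_parser_wrapper.py can be sketched in isolation. When a file has no header, pyarrow.csv autogenerates column names of the form "f0", "f1", and so on, so positional `usecols` values have to be translated into those names before they are passed as `include_columns`. The helper below is a minimal sketch of that idea only: `prefix_autogenerated` is a hypothetical name that does not exist in pandas, and the plain `convert_options` dict stands in for the wrapper's attribute of the same name.

```python
from typing import Optional, Sequence


def prefix_autogenerated(include_columns: Sequence, header: Optional[int]) -> list:
    """Hypothetical helper mirroring the branch added in this commit.

    With no header row, pyarrow.csv names the columns "f0", "f1", ...,
    so each selected position n is rewritten to the string "f{n}".
    When a header is present, the selection is returned unchanged.
    """
    if header is None:
        return [f"f{n}" for n in include_columns]
    return list(include_columns)


# Stand-in for the wrapper's convert_options after usecols=[0, 1] was recorded.
convert_options = {"include_columns": [0, 1]}
convert_options["include_columns"] = prefix_autogenerated(
    convert_options["include_columns"], header=None
)
print(convert_options["include_columns"])  # ['f0', 'f1']
```

This only illustrates the name translation; the new test above exercises the full path through `read_csv` with `engine="pyarrow"`.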
