Allow `errors` keyword for HDF IO Encoding Err Handling #20873

WillAyd · 2018-04-30T06:21:31Z

- closes #20835 ("to_hdf()" with "format='table'" ignores encoder "errors" argument)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

WillAyd · 2018-04-30T06:23:22Z

pandas/io/pytables.py

@@ -705,7 +705,7 @@ def select(self, key, where=None, start=None, stop=None, columns=None,
        def func(_start, _stop, _where):
            return s.read(start=_start, stop=_stop,
                          where=_where,
-                          columns=columns, **kwargs)


I removed this `**kwargs` argument because it was getting mangled when calling `read_index_node` with arbitrary keyword arguments from `read_hdf`. I think it was a mistake to include it originally.
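As a general illustration of why blindly forwarded keyword arguments are fragile, here is a minimal sketch. It is not a reconstruction of the pandas call chain: `read_node`, `read`, `select_forwarding`, and `select_explicit` are hypothetical names invented for this example.

```python
def read_node(raw, encoding="utf-8", errors="strict"):
    """Hypothetical innermost reader that decodes stored bytes."""
    return raw.decode(encoding, errors=errors)


def read(raw, stored_errors="strict", **kwargs):
    """Hypothetical middle layer: it sets `errors` itself and also
    forwards whatever extra keywords the caller happened to pass."""
    return read_node(raw, errors=stored_errors, **kwargs)


def select_forwarding(raw, **kwargs):
    """Top layer that forwards every keyword it receives."""
    return read(raw, **kwargs)


def select_explicit(raw, errors="strict"):
    """Top layer that passes only the options it knows about."""
    return read(raw, stored_errors=errors)


# The forwarded `errors` collides with the one `read` already supplies.
try:
    select_forwarding(b"abc", errors="ignore")
    collided = False
except TypeError:
    collided = True

print(collided)                                    # True
print(select_explicit(b"abc", errors="ignore"))    # abc
```

Passing each option explicitly keeps every layer in control of what reaches the next one.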

codecov · 2018-04-30T07:08:25Z

Codecov Report

Merging #20873 into master will increase coverage by <.01%.
The diff coverage is 93.1%.

@@            Coverage Diff             @@
##           master   #20873      +/-   ##
==========================================
+ Coverage   91.78%   91.78%   +<.01%     
==========================================
  Files         153      153              
  Lines       49341    49319      -22     
==========================================
- Hits        45287    45267      -20     
+ Misses       4054     4052       -2

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.17% <89.65%> (-0.01%)` | ⬇️ |
| #single | `41.89% <93.1%> (-0.05%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/pytables.py | `92.43% <93.1%> (+0.01%)` | ⬆️ |
| pandas/core/indexes/datetimelike.py | `96.7% <0%> (-0.1%)` | ⬇️ |
| pandas/core/indexes/period.py | `92.61% <0%> (-0.07%)` | ⬇️ |
| pandas/core/arrays/categorical.py | `95.57% <0%> (-0.05%)` | ⬇️ |
| pandas/core/series.py | `93.99% <0%> (-0.04%)` | ⬇️ |
| pandas/core/indexes/base.py | `96.63% <0%> (-0.01%)` | ⬇️ |
| pandas/core/resample.py | `96.06% <0%> (-0.01%)` | ⬇️ |
| pandas/io/formats/latex.py | `100% <0%> (ø)` | ⬆️ |
| pandas/core/indexes/datetimes.py | `95.76% <0%> (+0.03%)` | ⬆️ |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 28edd06...9a13234.

jreback · 2018-04-30T10:08:32Z

does anything break if you just always pass `surrogatepass` when encoding? any downsides to that?

WillAyd · 2018-04-30T15:23:32Z

I think the biggest downside is that strict encoding is idiomatic in Python 3 when dealing with files, so always passing `surrogatepass` here would be inconsistent with how codecs are handled elsewhere in this project and, I'd argue, in the larger Python ecosystem.

ref https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
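To make the trade-off concrete, here is a minimal standard-library sketch (not pandas code) of how the default `strict` handler and the opt-in `surrogatepass` handler treat a lone surrogate. The sample string is chosen only for illustration.

```python
# A lone surrogate code point cannot be encoded as UTF-8 under the default handler.
text = "\ud800foo"

try:
    text.encode("utf-8")  # errors="strict" is the default
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Opting in to "surrogatepass" lets the surrogate through when writing...
raw = text.encode("utf-8", errors="surrogatepass")

# ...and the same handler is needed when reading to recover the original string.
round_tripped = raw.decode("utf-8", errors="surrogatepass")

try:
    raw.decode("utf-8")  # strict decoding rejects the encoded surrogate
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(strict_ok)              # False
print(raw)                    # b'\xed\xa0\x80foo'
print(round_tripped == text)  # True
print(strict_decode_ok)       # False
```

With `strict` as the default, bad data fails loudly on write; a caller who really needs to store such strings has to ask for `surrogatepass` on both the write and the read.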

jreback

lgtm. minor doc comments

jreback · 2018-05-01T10:14:44Z

pandas/io/pytables.py

@@ -4579,6 +4598,7 @@ def _unconvert_string_array(data, nan_rep=None, encoding=None):
    data : fixed length string dtyped array
    nan_rep : the storage repr of NaN, optional
    encoding : the encoding of the data, optional
+    errors : handler for encoding errors, default 'strict'


can you show the options and/or point to the Python reference for these?

TomAugspurger

Added links to the `open` docs from `NDFrame.to_hdf` and `pd.read_hdf`.
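For readers who want the options at a glance, the following sketch lists the error handler names that Python's standard `codecs` module registers and shows what a few of them do when encoding. It uses only the standard library; the sample string is an arbitrary choice, and which handlers are meaningful for a given HDF store is not something this example establishes.

```python
import codecs

# Handler names registered by the standard library.
handlers = [
    "strict", "ignore", "replace", "xmlcharrefreplace",
    "backslashreplace", "namereplace", "surrogateescape", "surrogatepass",
]

# codecs.lookup_error raises LookupError for unknown names,
# so this loop confirms that every name above is registered.
for name in handlers:
    codecs.lookup_error(name)

sample = "caf\u00e9"  # "café": the last character is not representable in ASCII

results = {
    name: sample.encode("ascii", errors=name)
    for name in ["ignore", "replace", "xmlcharrefreplace", "backslashreplace"]
}

for name, encoded in results.items():
    print(name, encoded)
# ignore b'caf'
# replace b'caf?'
# xmlcharrefreplace b'caf&#233;'
# backslashreplace b'caf\\xe9'
```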

WillAyd · 2018-05-01T14:14:58Z

pandas/core/generic.py

@@ -1946,6 +1946,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
            If applying compression use the fletcher32 checksum.
        dropna : bool, default False
            If true, ALL nan rows will not be written to store.
+        errors : str, default 'strict'


@TomAugspurger I know you are adding a few things for the RC so don't need to change anything here, but do we typically document things in the API like this? Wondering if we shouldn't make all of the documented features actual keyword arguments in the call signature rather than tucking them away in kwargs.

FWIW if we have errors here we'd probably want to add encoding as well

The signature should be changed from kwargs to reflect the actual signature. I thought we had an issue for it, but didn't find one. Opened #20903

Thanks! I'll take a stab at that one later
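The difference between documenting options in prose and putting them in the call signature can be sketched as follows. All three functions are hypothetical stand-ins written for this example, not the pandas implementation.

```python
import inspect


def _write(path, key, encoding=None, errors="strict"):
    """Hypothetical low-level writer; returns its arguments for inspection."""
    return {"path": path, "key": key, "encoding": encoding, "errors": errors}


def to_hdf_kwargs(path, key, **kwargs):
    """Options tucked away in **kwargs: documented in prose only."""
    return _write(path, key, **kwargs)


def to_hdf_explicit(path, key, encoding=None, errors="strict"):
    """Options spelled out: visible to help(), IDEs, and inspect."""
    return _write(path, key, encoding=encoding, errors=errors)


sig_kwargs = str(inspect.signature(to_hdf_kwargs))
sig_explicit = str(inspect.signature(to_hdf_explicit))

print(sig_kwargs)    # (path, key, **kwargs)
print(sig_explicit)  # (path, key, encoding=None, errors='strict')
```

Both variants accept the same calls, but only the explicit one tells tooling which options exist and what their defaults are.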

TomAugspurger · 2018-05-01T15:25:34Z

Appveyor failure is fixed in #20906

I'm going to merge that before merging this, so that the merge commit is properly tested.

TomAugspurger · 2018-05-01T17:48:44Z

Thanks @WillAyd :)

WillAyd added 6 commits April 29, 2018 21:31

- Added test case (f75ca6e)
- Round trippable read/write with errors (97f6a54)
- Added index to test case (9ae2ea0)
- Mirrored encoding impl (cfe09d1)
- Updated whatsnew (3973ef7)
- LINT fixup (61a0c6b)

WillAyd commented Apr 30, 2018

jreback added the Unicode and IO HDF5 labels Apr 30, 2018

jreback added this to the 0.23.0 milestone May 1, 2018

jreback requested changes May 1, 2018

- Document errors (0fe838a)

TomAugspurger approved these changes May 1, 2018

- Merge remote-tracking branch 'upstream/master' into WillAyd-tbl-arg-pass (9a13234)

WillAyd commented May 1, 2018

TomAugspurger mentioned this pull request May 1, 2018

RLS: 0.23.0 #20531

Closed

71 tasks

TomAugspurger merged commit ade293d into pandas-dev:master May 1, 2018

WillAyd deleted the tbl-arg-pass branch May 1, 2018 20:06

obilodeau mentioned this pull request Sep 5, 2018

to_csv() surrogates not allowed #22610

Closed
