Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow errors keyword for HDF IO Encoding Err Handling #20873

Merged
merged 8 commits into from
May 1, 2018

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Apr 30, 2018

@@ -705,7 +705,7 @@ def select(self, key, where=None, start=None, stop=None, columns=None,
def func(_start, _stop, _where):
return s.read(start=_start, stop=_stop,
where=_where,
columns=columns, **kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this **kwargs argument because it was getting mangled when calling read_index_node with arbitrary keyword arguments in read_hdf. I think it was a mistake to be included originally

@codecov
Copy link

codecov bot commented Apr 30, 2018

Codecov Report

Merging #20873 into master will increase coverage by <.01%.
The diff coverage is 93.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20873      +/-   ##
==========================================
+ Coverage   91.78%   91.78%   +<.01%     
==========================================
  Files         153      153              
  Lines       49341    49319      -22     
==========================================
- Hits        45287    45267      -20     
+ Misses       4054     4052       -2
Flag Coverage Δ
#multiple 90.17% <89.65%> (-0.01%) ⬇️
#single 41.89% <93.1%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/io/pytables.py 92.43% <93.1%> (+0.01%) ⬆️
pandas/core/indexes/datetimelike.py 96.7% <0%> (-0.1%) ⬇️
pandas/core/indexes/period.py 92.61% <0%> (-0.07%) ⬇️
pandas/core/arrays/categorical.py 95.57% <0%> (-0.05%) ⬇️
pandas/core/series.py 93.99% <0%> (-0.04%) ⬇️
pandas/core/indexes/base.py 96.63% <0%> (-0.01%) ⬇️
pandas/core/resample.py 96.06% <0%> (-0.01%) ⬇️
pandas/io/formats/latex.py 100% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.76% <0%> (+0.03%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 28edd06...9a13234. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2018

does anythng break if you just always pass surrogatepass when encoding? any downsides to that?

@jreback jreback added Unicode Unicode strings IO HDF5 read_hdf, HDFStore labels Apr 30, 2018
@WillAyd
Copy link
Member Author

WillAyd commented Apr 30, 2018

I think the biggest downside is that it's idiomatic in Python3 to have strict encoding when dealing with files, so if we did that here we'd introduce inconsistency with codecs handling in this project and with what I'd argue to be the larger Python ecosystem.

ref https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler

@jreback jreback added this to the 0.23.0 milestone May 1, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor doc comments

@@ -4579,6 +4598,7 @@ def _unconvert_string_array(data, nan_rep=None, encoding=None):
data : fixed length string dtyped array
nan_rep : the storage repr of NaN, optional
encoding : the encoding of the data, optional
errors : handler for encoding errors, default 'strict'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show options and/or point to the python ref for these

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added links to the open docs from NDFrame.to_hdf and pd.read_hdf.

@@ -1946,6 +1946,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
If applying compression use the fletcher32 checksum.
dropna : bool, default False
If true, ALL nan rows will not be written to store.
errors : str, default 'strict'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I know you are adding a few things for the RC so don't need to change anything here, but do we typically document things in the API like this? Wondering if we shouldn't make all of the documented features actual keyword arguments in the call signature rather than tucking them away in kwargs.

FWIW if we have errors here we'd probably want to add encoding as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature should be changed from kwargs to reflect the actual signature. I thought we had an issue for it, but didn't find one. Opened #20903

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll take a stab at that one later

@TomAugspurger TomAugspurger mentioned this pull request May 1, 2018
71 tasks
@TomAugspurger
Copy link
Contributor

Appveyor failure is fixed in #20906

I'm going to merge that before merging this, so that the merge commit is properly tested.

@TomAugspurger TomAugspurger merged commit ade293d into pandas-dev:master May 1, 2018
@TomAugspurger
Copy link
Contributor

Thanks @WillAyd :)

@WillAyd WillAyd deleted the tbl-arg-pass branch May 1, 2018 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"to_hdf()" with "format='table'" ignores encoder "errors" argument.
3 participants