-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: add dtype= option to python parser #14295
API: add dtype= option to python parser #14295
Conversation
Current coverage is 85.22% (diff: 100%)@@ master #14295 diff @@
==========================================
Files 143 143
Lines 50804 50833 +29
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43292 43324 +32
+ Misses 7512 7509 -3
Partials 0 0
|
result[c] = cvals | ||
if verbose and na_count: | ||
print('Filled %d NA values in column %s' % (na_count, str(c))) | ||
return result | ||
|
||
def _convert_types(self, values, na_values, try_num_bool=True): | ||
def _infer_types(self, values, na_values, try_num_bool=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, can you add a docstring here?
not interpret dtype. | ||
Use `str` or `object` to preserve and not interpret dtype. | ||
If converters are specified, they will be applied AFTER | ||
dtype conversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that exercises this assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice to me! (added two minor comments)
values, conv_f, na_values, | ||
col_na_values, col_na_fvalues) | ||
else: | ||
try_num_bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just this:
try_num_bool = cast_type and is_object_dtype(cast_type)
@@ -1355,6 +1392,23 @@ def _convert_types(self, values, na_values, try_num_bool=True): | |||
|
|||
return result, na_count | |||
|
|||
def _cast_types(self, values, cast_type, column): | |||
""" cast column to type specified in dtypes= param """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I know this isn't done too much for internal functions,
But we should do it more! (therefore my comment above) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (eg things you haven been working on in recent PRs and so you know better) is always very welcome :-)
de4660d
to
ab7e1e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, looks good to me!
|
||
.. ipython:: python | ||
|
||
from io import StringIO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not work on python 2. Should check how it is done in other places in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from pandas.compat import StringIO
is imported in a suppressed ipython code block at the top of the file, so you can just use StringIO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the io
module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)
If converters are specified, they will be applied INSTEAD | ||
of dtype conversion. | ||
|
||
.. versionadded:: 0.20.0 support for the Python parser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would leave this out of the docstring (this is already so long ..), but not strong feelings, your take
8153feb
to
035c921
Compare
I think I've got all the feedback so far worked in - any more comments? @jorisvandenbossche, @gfyoung, @jreback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well
Specifying ``dtype`` with ``engine`` other than 'c' raises a | ||
``ValueError``. | ||
.. versionadded:: 0.20.0 support for the Python parser. | ||
The ``dtype`` option is supported by the 'python' engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a blank line here (to avoid warnings)
@@ -32,6 +32,14 @@ Other enhancements | |||
|
|||
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would make a sub-section
if conv: | ||
if col_dtype is not None: | ||
warnings.warn(("Both a converter and dtype were specified " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test for this? (IOW, this was prob tested for c-engine, but now need to test for all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this was added here:e0d1606
7c703fe
to
47669d3
Compare
|
||
The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns | ||
is now supported with the ``'python'`` engine. See the :ref:`io docs <io.dtypes>` for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue references here
…into textreader-dtype Conflicts: pandas/io/tests/parser/dtypes.py
@chris-b1 Thanks! |
git diff upstream/master | flake8 --diff
Ultimately I'm working towards #8212 (types in excel parser), which should be pretty straightforward after this.
Right now the tests are moved from
c_parser_only.py
, may need to add some morecc @gfyoung