-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: inconsistent naming convention for read_excel column selection (#4988) #16488
Conversation
FYI, just discussing a bug related to usecols #15316, might be easiest to work that change into your refactoring, or can be separate, cc @stanleyguan |
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -35,6 +35,7 @@ Other Enhancements | |||
- ``RangeIndex.append`` now returns a ``RangeIndex`` object when possible (:issue:`16212`) | |||
- :func:`to_pickle` has gained a protocol parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__ | |||
- :func:`api.types.infer_dtype` now infers decimals. (:issue: `15690`) | |||
- :func:`read_excel` now allows a column character list (E.G. ['A', 'C', 'D']) with the ``usecols`` parameter (:issue:`4988`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are listed this in deprecations so this is unecessary
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -60,6 +61,7 @@ Other API Changes | |||
Deprecations | |||
~~~~~~~~~~~~ | |||
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with to_excel() (:issue:`10559`). | |||
- :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with other read_ functions (:issue:`4988`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.read_*
functions
* If string then indicates comma separated list of column names and | ||
column ranges (e.g. "A:E" or "A,C,E:F") | ||
parse_cols : int or list, default None | ||
.. deprecated:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, didn't know sphinx had this deprecated tag. @TomAugspurger @jorisvandenbossche should maybe change other deprecations to use this if it looks nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appears as:
parse_cols : int or list, default None
Deprecated since version 0.21.0: Use usecols instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.read_excel.html Has an example from sheetname. Looks ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger if we like this (and lgtm), let's make an issue to updated all DEPRECATED in codebase with this directive
pandas/tests/io/test_excel.py
Outdated
dfref = self.get_csv_refdf('test1') | ||
dfref = dfref.reindex(columns=['A', 'B', 'C']) | ||
df1 = self.get_exceldf('test1', 'Sheet1', index_col=0, usecols=3) | ||
df2 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you need to change ALL tests to use the new one (usecols), except for a single test to actually hit the deprecation.
pandas/tests/io/test_excel.py
Outdated
|
||
tm.assert_frame_equal(df1, dfref, check_names=False) | ||
tm.assert_frame_equal(df2, dfref, check_names=False) # backward compat | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these new tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, new tests for the new functionality.
Codecov Report
@@ Coverage Diff @@
## master #16488 +/- ##
=========================================
Coverage ? 90.4%
=========================================
Files ? 161
Lines ? 51038
Branches ? 0
=========================================
Hits ? 46139
Misses ? 4899
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16488 +/- ##
==========================================
- Coverage 90.79% 90.43% -0.37%
==========================================
Files 161 161
Lines 51063 51046 -17
==========================================
- Hits 46363 46162 -201
- Misses 4700 4884 +184
Continue to review full report at Codecov.
|
* If string then indicates comma separated list of column names and | ||
column ranges (e.g. "A:E" or "A,C,E:F") | ||
parse_cols : int or list, default None | ||
.. deprecated:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger if we like this (and lgtm), let's make an issue to updated all DEPRECATED in codebase with this directive
pandas/io/excel.py
Outdated
@@ -312,11 +327,6 @@ def _range2cols(areas): | |||
>>> _range2cols('A,C,Z:AB') | |||
[0, 2, 25, 26, 27] | |||
""" | |||
def _excel2num(x): | |||
"Convert Excel column name like 'AB' to 0-based column index" | |||
return reduce(lambda s, a: s * 26 + ord(a) - ord('A') + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you are changing this code? does this involve the deprecation somehow? if you are cleaning/fixing, pls do in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code changed to allow passing list of strings, similar to other read_*
functions, and keep the read_excel
"superpower" to pass a string that is parsed as mentioned in the original issue (#4988, point 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that's fine, but needs to be in a separate PR from the deprecation change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, opened #16510 for this functionality. will re-submit with only the argument change (parse_cols -> usecols).
|
||
def test_parse_cols_str(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave the original tests structure (sure you can change the name to conform), but don't change the tests (in THIS PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are back to original but with changed function & kwarg names.
@abarber4gh can you rebase, now that the excel tests are running? |
- removed usecols mention in Other Enhancments section, remains in Deprecations. - removed test_parse_* test methods in favor of test_usecols_* methods. - changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.
…n selection (#4988) update API to use ‘usecols’ instead of ‘parse_cols’. still functionally the same as ‘parse_col’, added test cases for ‘usecols’, added assert_produces_warning(FutureWarning) to other test cases that use ‘parse_cols’. refactor column use column parsing to only occur once per sheet. updated whats new with deprecated parse_col argument and other enhancements to usecols functionality. update documentation to show new usecols functionality in read_excel().
add `check_stacklevel=False` to `test_excel_oldindex_format()`
Codecov Report
@@ Coverage Diff @@
## master #16488 +/- ##
==========================================
+ Coverage 90.79% 90.93% +0.14%
==========================================
Files 161 161
Lines 51063 49267 -1796
==========================================
- Hits 46363 44802 -1561
+ Misses 4700 4465 -235
Continue to review full report at Codecov.
|
Replaces most uses of implicit global state from matplotlib in test_datetimelike.py. This was potentially causing random failures where a figure expected to be on a new, blank figure would instead plot on an existing axes (that's the guess at least).
* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation
Splits extra information about the license and copyright holders to AUTHORS.md.
* COMPAT: numpy 1.13 test compat * CI: fix doc build to 1.12
- removed usecols mention in Other Enhancments section, remains in Deprecations. - removed test_parse_* test methods in favor of test_usecols_* methods. - changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.
…n selection (#4988) update API to use ‘usecols’ instead of ‘parse_cols’. still functionally the same as ‘parse_col’, added test cases for ‘usecols’, added assert_produces_warning(FutureWarning) to other test cases that use ‘parse_cols’. refactor column use column parsing to only occur once per sheet. updated whats new with deprecated parse_col argument and other enhancements to usecols functionality. update documentation to show new usecols functionality in read_excel().
add `check_stacklevel=False` to `test_excel_oldindex_format()`
…o issue#4988 * 'issue#4988' of https://github.com/abarber4gh/pandas: add `deprecate_kwarg` from `_decorators` add `check_stacklevel=False` to `test_excel_oldindex_format()` removed excess blank line. change parse_cols to usecols change tests keyword from parse_cols to usecol. no message ENH: inconsistent naming convention for read_csv and read_excel column selection (#4988) implement changes request in PR#16488 - removed usecols mention in Other Enhancments section, remains in Deprecations. - removed test_parse_* test methods in favor of test_usecols_* methods. - changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning. rebased
can you rebase / update according to comments |
closing as stale. pls ping to reopen if you want to continue. |
update API to use ‘usecols’ instead of ‘parse_cols’. still functionally the same as ‘parse_col’,
added test cases for ‘usecols’, added assert_produces_warning(FutureWarning) to other test
cases that use ‘parse_cols’.
refactor column use column parsing to only occur once per sheet.
updated whats new with deprecated parse_col argument.
closes ENH: inconsistent naming convention for read_csv and read_excel column selection #4988
tests added / passed
passes
git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry
packers.* BENCHMARKS NOT SIGNIFICANTLY CHANGED.