-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH13347 ENH Allow true_values and false_values options in read_excel #14002
GH13347 ENH Allow true_values and false_values options in read_excel #14002
Conversation
@@ -373,6 +388,12 @@ def _parse_cell(cell_contents, cell_typ): | |||
val = int(cell_contents) | |||
if val == cell_contents: | |||
cell_contents = val | |||
elif true_values and cell_contents in true_values: | |||
# GH13347 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are going to need to make a set of these if they r not none
eg this will break with a scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I'm clear you are asking to change it from:
elif true_values and cell_contents in true_values:
to
elif true_values and cell_contents in set(true_values):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can try but that will fail as well
if true_values=5 or whatever
Current coverage is 85.24% (diff: 100%)@@ master #14002 diff @@
==========================================
Files 140 140
Lines 50563 50563
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 43103 43102 -1
- Misses 7460 7461 +1
Partials 0 0
|
@WillAyd ok, the issue is for some other things like this we accept scalar and lists. We need validation on this (in the parser code) to turn a scalar to a list (and a test for this). |
Thanks Jeff. I did however notice that the keyword arguments in read_csv don't do anything to turn scalars into a list. For instance, if I had a line in test.csv that was simply "foo,bar" then running the below would not convert the record foo to a bool.
I have to specify the argument as:
to make it work. I am thinking of two options:
The former would obviously fix the problem at hand but make the two read functions behave slightly differently. The second one would keep them consistent but require more work or an additional change to add the validation you are describing. Let me know which route you prefer. |
they should behave the same. You can for now mimic read_csv and we can come back in a later issue and fix this (as its a bug in read_csv). Or could do it all here. |
da3367f
to
e3d0849
Compare
@@ -412,6 +412,7 @@ Other enhancements | |||
|
|||
- Added documentation to :ref:`I/O<io.dtypes>` regarding the perils of reading in columns with mixed dtypes and how to handle it (:issue:`13746`) | |||
- Raise ImportError for in the sql functions when sqlalchemy is not installed and a connection string is used (:issue:`11920`). | |||
- :func:`read_excel` now supports the true_values and false_values keyword arguments (:issue: `13347`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double back ticks around true_values, false_values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no space (:issue:
13347)
e3d0849
to
49be012
Compare
Thanks for the callout on mixing strings / floats, etc... into the testing scenario. I realized my code isn't working when you mix in floats into the file and seems to be replacing them with np.nan. I'll dig further back in and try to see what's going on |
can you rebase / update? |
closes pandas-dev#13347 Author: willayd <william.ayd@gmail.com> Squashes the following commits: 242a6a6 ENH: Allow true_values and fals_values options in read_excel
49be012
to
15ec67f
Compare
Rebase should now be complete |
@WillAyd Thanks! |
@WillAyd, note that 'foo' is not a scalar, but a string, which is a list of characters. I think your code converts 'f' and 'o' to booleans |
@adrivsh you would have to show an example |
The example is import pandas as pd The result is: Unnamed: 0 0 What i am saying is that python duck-types strings as list of characters, so that false_values= "foo" and false_values = ["f","o","o"] will produce the same result, unless some code prevents that |
Interestingly, import pandas as pd does not transform "f"s and "o"s to False, so I actually have no idea what i am talking about. |
@adrivsh I think that is possibly correct, but is a bit a pitfall (and is not documented very well). The string is regarded as a list of characters, if you pass it a single element list it works as expected:
@adrivsh do you want to open a new issue for that? |
* commit 'v0.19.0rc1-25-ga7469cf': (471 commits) ENH: Add divmod to series and index. (pandas-dev#14208) Fix generator tests to run (pandas-dev#14245) BUG: GH13629 Binned groupby median function with empty bins (pandas-dev#14225) TST/TEMP: fix pyqt to 4.x for plotting tests (pandas-dev#14240) DOC: added example to Series.map showing use of na_action parameter (GH14231) DOC: split docstring into multiple lines in excel.py (pandas-dev#14073) MAINT: Use __module__ in _DeprecatedModule. (pandas-dev#14181) ENH: Allow true_values and false_values options in read_excel (pandas-dev#14002) DOC: fix incorrect example in unstack docstring (GH14206) (pandas-dev#14211) BUG: iloc fails with non lex-sorted MultiIndex pandas-dev#13797 BUG: add check for infinity in __call__ of EngFormatter In gbq.to_gbq allow the DataFrame column order to differ from schema BLD: require cython if tempita is needed DOC: add source links to api docs (pandas-dev#14200) BUG: compat with Stata ver 111 Fix: F999 dictionary key '2000q4' repeated with different values (pandas-dev#14198) BLD: Test for Python 3.5 with C locale BUG: DatetimeTZBlock can't assign values near dst boundary BUG: union_categorical with Series and cat idx BUG: fix str.contains for series containing only nan values ...
git diff upstream/master | flake8 --diff