-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: NA-values in ctors with string dtype #21366
Conversation
```python In [1]: import pandas as pd In [2]: pd.Series([1, 2, None], dtype='str')[2] # None ``` Closes pandas-dev#21083
@@ -164,22 +183,24 @@ def test_constructor_list_like(self): | |||
|
|||
@pytest.mark.parametrize('input_vals', [ | |||
([1, 2]), | |||
([1.0, 2.0, np.nan]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remove this case, as it isn't correct. We aren't consistent between Series(... dtype=str)
and .astype(str)
, which is maybe unfortunate...
In [4]: pd.Series([1, 2, None], dtype='str').tolist()
Out[4]: ['1', '2', None]
In [5]: pd.Series([1, 2, None], dtype='str').astype('str').tolist()
Out[5]: ['1', '2', 'None']
when bool(dtype) was False
Did you also add a test case for #21270 ? |
assert_series_equal(result, expected) | ||
def test_constructor_list_str_na(self, string_dtype): | ||
result = Series([1.0, 2.0, np.nan], dtype=string_dtype) | ||
expected = Series(['1.0', '2.0', None], dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the NaN be preserved? (so expected Series(['1.0', '2.0', np.nan], dtype=object)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm apparently assert_series_equal considers those equal
In [11]: pd.Series([1.0, 2.0, np.nan], dtype=str)[2]
Out[11]: nan
I'll be explicit in the test.
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -11,6 +11,8 @@ and bug fixes. We recommend that all users upgrade to this version. | |||
:backlinks: none | |||
|
|||
|
|||
.. _whatsnew_0231.enhancements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover from the rebase I suppose
pandas/core/series.py
Outdated
@@ -4054,7 +4054,21 @@ def _try_cast(arr, take_fast_path): | |||
isinstance(subarr, np.ndarray))): | |||
subarr = construct_1d_object_array_from_listlike(subarr) | |||
elif not is_extension_type(subarr): | |||
subarr = np.array(subarr, dtype=dtype, copy=copy) | |||
subarr2 = np.array(subarr, dtype=dtype, copy=copy) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not put any more code here! these routines are already way overloaded.
also I believe you can simply do this by:
(dtype, arr) = cast.infer_dtype_from_array(subarr)
subarray = np.array(arr, dtype=dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype
is explicitly passed by the user here. We using that approach wouldn't cast [1.0, 2.0, None]
to ['1.0', '2.0', None]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then pls make a routine in cast, this is not the right place for this. we have so many casting routines pls modify / use existing ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using that approach wouldn't cast [1.0, 2.0, None] to ['1.0', '2.0', None].
That is actually what happened in 0.22:
In [30]: pd.Series([1.0, 2.0, None], dtype=str).values
Out[30]: array([1.0, 2.0, None], dtype=object)
(but the new behaviour to convert the numbers to strings is certainly an improvement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... OK hold off on merging then. Ideally this would just be a bugfix for that regression :/
Won't have time to look till ~3 hours from now.
Codecov Report
@@ Coverage Diff @@
## master #21366 +/- ##
==========================================
- Coverage 91.89% 91.85% -0.04%
==========================================
Files 153 153
Lines 49579 49569 -10
==========================================
- Hits 45559 45532 -27
- Misses 4020 4037 +17
Continue to review full report at Codecov.
|
@@ -4074,7 +4075,8 @@ def _try_cast(arr, take_fast_path): | |||
isinstance(subarr, np.ndarray))): | |||
subarr = construct_1d_object_array_from_listlike(subarr) | |||
elif not is_extension_type(subarr): | |||
subarr = np.array(subarr, dtype=dtype, copy=copy) | |||
subarr = construct_1d_ndarray_preserving_na(subarr, dtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback refactored to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k cool
thanks @TomAugspurger we should prob create an issue about refactor / clean dtypes.cast |
(cherry picked from commit 636dd01)
(cherry picked from commit 636dd01)
Closes #21083