Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: NA-values in ctors with string dtype #21366

Merged
merged 7 commits into from
Jun 8, 2018

Conversation

TomAugspurger
Copy link
Contributor

In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None

Closes #21083

```python
In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None

```

Closes pandas-dev#21083
@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 7, 2018
@TomAugspurger TomAugspurger added this to the 0.23.1 milestone Jun 7, 2018
@@ -164,22 +183,24 @@ def test_constructor_list_like(self):

@pytest.mark.parametrize('input_vals', [
([1, 2]),
([1.0, 2.0, np.nan]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to remove this case, as it isn't correct. We aren't consistent between Series(... dtype=str) and .astype(str), which is maybe unfortunate...

In [4]: pd.Series([1, 2, None], dtype='str').tolist()
Out[4]: ['1', '2', None]

In [5]: pd.Series([1, 2, None], dtype='str').astype('str').tolist()
Out[5]: ['1', '2', 'None']

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 7, 2018

Did you also add a test case for #21270 ?

assert_series_equal(result, expected)
def test_constructor_list_str_na(self, string_dtype):
result = Series([1.0, 2.0, np.nan], dtype=string_dtype)
expected = Series(['1.0', '2.0', None], dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the NaN be preserved? (so expected Series(['1.0', '2.0', np.nan], dtype=object))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm apparently assert_series_equal considers those equal

In [11]: pd.Series([1.0, 2.0, np.nan], dtype=str)[2]
Out[11]: nan

I'll be explicit in the test.

@@ -11,6 +11,8 @@ and bug fixes. We recommend that all users upgrade to this version.
:backlinks: none


.. _whatsnew_0231.enhancements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from the rebase I suppose

@@ -4054,7 +4054,21 @@ def _try_cast(arr, take_fast_path):
isinstance(subarr, np.ndarray))):
subarr = construct_1d_object_array_from_listlike(subarr)
elif not is_extension_type(subarr):
subarr = np.array(subarr, dtype=dtype, copy=copy)
subarr2 = np.array(subarr, dtype=dtype, copy=copy)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not put any more code here! these routines are already way overloaded.

also I believe you can simply do this by:

(dtype, arr) = cast.infer_dtype_from_array(subarr)
subarray = np.array(arr, dtype=dtype)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype is explicitly passed by the user here. We using that approach wouldn't cast [1.0, 2.0, None] to ['1.0', '2.0', None].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then pls make a routine in cast, this is not the right place for this. we have so many casting routines pls modify / use existing ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using that approach wouldn't cast [1.0, 2.0, None] to ['1.0', '2.0', None].

That is actually what happened in 0.22:

In [30]: pd.Series([1.0, 2.0, None], dtype=str).values
Out[30]: array([1.0, 2.0, None], dtype=object)

(but the new behaviour to convert the numbers to strings is certainly an improvement)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... OK hold off on merging then. Ideally this would just be a bugfix for that regression :/

Won't have time to look till ~3 hours from now.

@codecov
Copy link

codecov bot commented Jun 8, 2018

Codecov Report

Merging #21366 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21366      +/-   ##
==========================================
- Coverage   91.89%   91.85%   -0.04%     
==========================================
  Files         153      153              
  Lines       49579    49569      -10     
==========================================
- Hits        45559    45532      -27     
- Misses       4020     4037      +17
Flag Coverage Δ
#multiple 90.25% <100%> (-0.04%) ⬇️
#single 41.87% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/series.py 94.15% <100%> (-0.03%) ⬇️
pandas/plotting/_core.py 82.39% <0%> (-1.15%) ⬇️
pandas/core/dtypes/missing.py 91.95% <0%> (-0.58%) ⬇️
pandas/core/reshape/pivot.py 96.97% <0%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f6ea67...4de6f7a. Read the comment docs.

@@ -4074,7 +4075,8 @@ def _try_cast(arr, take_fast_path):
isinstance(subarr, np.ndarray))):
subarr = construct_1d_object_array_from_listlike(subarr)
elif not is_extension_type(subarr):
subarr = np.array(subarr, dtype=dtype, copy=copy)
subarr = construct_1d_ndarray_preserving_na(subarr, dtype,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback refactored to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k cool

@jreback jreback merged commit 636dd01 into pandas-dev:master Jun 8, 2018
@jreback
Copy link
Contributor

jreback commented Jun 8, 2018

thanks @TomAugspurger

we should prob create an issue about refactor / clean dtypes.cast

@TomAugspurger TomAugspurger deleted the dtype-str-cast branch June 8, 2018 18:44
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018
TomAugspurger added a commit that referenced this pull request Jun 12, 2018
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants