REGR: NA-values in ctors with string dtype #21366

TomAugspurger · 2018-06-07T18:59:58Z

In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None

Closes #21083

TomAugspurger · 2018-06-07T19:01:38Z

pandas/tests/series/test_constructors.py

@@ -164,22 +183,24 @@ def test_constructor_list_like(self):

    @pytest.mark.parametrize('input_vals', [
        ([1, 2]),
-        ([1.0, 2.0, np.nan]),


I had to remove this case, as it isn't correct. We aren't consistent between Series(... dtype=str) and .astype(str), which is maybe unfortunate...

In [4]: pd.Series([1, 2, None], dtype='str').tolist()
Out[4]: ['1', '2', None]

In [5]: pd.Series([1, 2, None], dtype='str').astype('str').tolist()
Out[5]: ['1', '2', 'None']
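The `'None'` in the second output is simply what Python's `str()` returns for `None`. As a plain-Python illustration (a model of the two outcomes, not pandas code), a blanket string conversion loses the missing value, while a conversion that skips missing entries keeps it:

```python
values = ["1", "2", None]

# Blanket conversion: every element goes through str(), including None.
blanket = [str(v) for v in values]

# NA-preserving conversion: None is passed through unchanged.
preserving = [v if v is None else str(v) for v in values]

print(blanket)     # ['1', '2', 'None']
print(preserving)  # ['1', '2', None]
```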

jorisvandenbossche · 2018-06-07T21:16:09Z

Did you also add a test case for #21270 ?

jorisvandenbossche · 2018-06-07T21:18:47Z

pandas/tests/series/test_constructors.py

-            assert_series_equal(result, expected)
+    def test_constructor_list_str_na(self, string_dtype):
+        result = Series([1.0, 2.0, np.nan], dtype=string_dtype)
+        expected = Series(['1.0', '2.0', None], dtype=object)


Shouldn't the NaN be preserved? (so expected Series(['1.0', '2.0', np.nan], dtype=object))

Mmm apparently assert_series_equal considers those equal

In [11]: pd.Series([1.0, 2.0, np.nan], dtype=str)[2]
Out[11]: nan

I'll be explicit in the test.
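To illustrate why an explicit check is needed, here is a small plain-Python sketch of a lenient comparison that treats every missing value as equal, next to a strict one that tells `None` and NaN apart. The helper names (`is_missing`, `loosely_equal`, `strictly_equal`) are hypothetical and this is not how `assert_series_equal` is implemented; it only models the behaviour observed above.

```python
import math


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def loosely_equal(left, right):
    """Compare two lists, treating any missing value as equal to any other."""
    return len(left) == len(right) and all(
        (is_missing(a) and is_missing(b)) or a == b
        for a, b in zip(left, right)
    )


def strictly_equal(left, right):
    """Compare two lists; None only matches None, NaN only matches NaN."""

    def same(a, b):
        if is_missing(a) or is_missing(b):
            both_none = a is None and b is None
            both_nan = (
                isinstance(a, float) and isinstance(b, float)
                and math.isnan(a) and math.isnan(b)
            )
            return both_none or both_nan
        return a == b

    return len(left) == len(right) and all(
        same(a, b) for a, b in zip(left, right)
    )


result = ["1.0", "2.0", float("nan")]
expected = ["1.0", "2.0", None]
print(loosely_equal(result, expected))   # True
print(strictly_equal(result, expected))  # False
```

With the lenient check, a test expecting `None` would pass even though the constructor returned NaN, which is why the test asserts on the missing value directly.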

jorisvandenbossche · 2018-06-07T22:11:24Z

doc/source/whatsnew/v0.23.1.txt

@@ -11,6 +11,8 @@ and bug fixes. We recommend that all users upgrade to this version.
    :backlinks: none


+.. _whatsnew_0231.enhancements:


leftover from the rebase I suppose

jreback · 2018-06-08T11:15:21Z

pandas/core/series.py

@@ -4054,7 +4054,21 @@ def _try_cast(arr, take_fast_path):
                                           isinstance(subarr, np.ndarray))):
                subarr = construct_1d_object_array_from_listlike(subarr)
            elif not is_extension_type(subarr):
-                subarr = np.array(subarr, dtype=dtype, copy=copy)
+                subarr2 = np.array(subarr, dtype=dtype, copy=copy)
+


do not put any more code here! these routines are already way overloaded.

also I believe you can simply do this by:

(dtype, arr) = cast.infer_dtype_from_array(subarr)
subarray = np.array(arr, dtype=dtype)

dtype is explicitly passed by the user here. Using that approach wouldn't cast [1.0, 2.0, None] to ['1.0', '2.0', None].

then pls make a routine in cast, this is not the right place for this. we have so many casting routines pls modify / use existing ones.

Using that approach wouldn't cast [1.0, 2.0, None] to ['1.0', '2.0', None].

That is actually what happened in 0.22:

In [30]: pd.Series([1.0, 2.0, None], dtype=str).values
Out[30]: array([1.0, 2.0, None], dtype=object)

(but the new behaviour to convert the numbers to strings is certainly an improvement)

Ahh... OK hold off on merging then. Ideally this would just be a bugfix for that regression :/

Won't have time to look till ~3 hours from now.

codecov · 2018-06-08T13:06:53Z

Codecov Report

Merging #21366 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21366      +/-   ##
==========================================
- Coverage   91.89%   91.85%   -0.04%     
==========================================
  Files         153      153              
  Lines       49579    49569      -10     
==========================================
- Hits        45559    45532      -27     
- Misses       4020     4037      +17

| Flag | Coverage Δ |
|---|---|
| #multiple | `90.25% <100%> (-0.04%)` ⬇️ |
| #single | `41.87% <100%> (+0.01%)` ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/series.py | `94.15% <100%> (-0.03%)` ⬇️ |
| pandas/plotting/_core.py | `82.39% <0%> (-1.15%)` ⬇️ |
| pandas/core/dtypes/missing.py | `91.95% <0%> (-0.58%)` ⬇️ |
| pandas/core/reshape/pivot.py | `96.97% <0%> (-0.06%)` ⬇️ |

Last update 7f6ea67...4de6f7a.

TomAugspurger · 2018-06-08T13:07:41Z

pandas/core/series.py

@@ -4074,7 +4075,8 @@ def _try_cast(arr, take_fast_path):
                                           isinstance(subarr, np.ndarray))):
                subarr = construct_1d_object_array_from_listlike(subarr)
            elif not is_extension_type(subarr):
-                subarr = np.array(subarr, dtype=dtype, copy=copy)
+                subarr = construct_1d_ndarray_preserving_na(subarr, dtype,


@jreback refactored to this.
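The body of the helper referenced in the diff is not shown in this thread. As a rough sketch of the idea only, assuming NumPy is available and using a hypothetical function name (`cast_preserving_na`), one way to cast to a string dtype without stringifying missing values is to cast first and then copy the original `None`/NaN entries back into an object array:

```python
import numpy as np


def cast_preserving_na(values, dtype=None):
    """Cast a flat list to ``dtype`` while leaving None/NaN entries untouched.

    Hypothetical helper for illustration only; not the pandas routine.
    """
    result = np.array(values, dtype=dtype)
    if dtype is not None and np.dtype(dtype).kind == "U":
        original = np.array(values, dtype=object)
        # True where the input held None or a float NaN (NaN != NaN).
        na_mask = np.array(
            [v is None or (isinstance(v, float) and v != v) for v in values],
            dtype=bool,
        )
        # Switch to object dtype so non-string values can be stored again.
        result = result.astype(object)
        result[na_mask] = original[na_mask]
    return result


# A plain NumPy cast turns None into the text 'None'.
plain = np.array([1.0, 2.0, None], dtype=str)

# The sketch converts the numbers but keeps the missing value as it was.
fixed = cast_preserving_na([1.0, 2.0, None], dtype=str)
```

The result is an object array, which matches the `dtype=object` expectation in the test above; the sketch only handles flat lists and unicode string dtypes.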

jreback · 2018-06-08T16:27:33Z

thanks @TomAugspurger

we should prob create an issue about refactor / clean dtypes.cast

REGR: NA-values in ctors with string dtype

d07b238

TomAugspurger added the "Dtype Conversions" label (Unexpected or buggy dtype conversions) Jun 7, 2018

TomAugspurger added this to the 0.23.1 milestone Jun 7, 2018

TomAugspurger commented Jun 7, 2018

TomAugspurger added 2 commits June 7, 2018 15:49

Compat for old numpy

d2585e3

when bool(dtype) was False

Additional tests

bcc993c

jorisvandenbossche reviewed Jun 7, 2018

TomAugspurger added 2 commits June 7, 2018 16:39

Additional test fixups

0d7c853

Merge remote-tracking branch 'upstream/master' into dtype-str-cast

a94d399

jorisvandenbossche reviewed Jun 7, 2018

jorisvandenbossche approved these changes Jun 7, 2018

jreback requested changes Jun 8, 2018

Refactored

3d81d5d

Merge remote-tracking branch 'upstream/master' into dtype-str-cast

4de6f7a

TomAugspurger commented Jun 8, 2018

jreback approved these changes Jun 8, 2018

jreback merged commit 636dd01 into pandas-dev:master Jun 8, 2018

TomAugspurger deleted the dtype-str-cast branch June 8, 2018 18:44

TomAugspurger added the Needs Backport label Jun 12, 2018

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018

REGR: NA-values in ctors with string dtype (pandas-dev#21366)

c92d2f9

(cherry picked from commit 636dd01)

TomAugspurger added a commit that referenced this pull request Jun 12, 2018

REGR: NA-values in ctors with string dtype (#21366)

e841daa

(cherry picked from commit 636dd01)

TomAugspurger removed the Needs Backport label Jun 12, 2018

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

REGR: NA-values in ctors with string dtype (pandas-dev#21366)

50aec3f

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

REGR: NA-values in ctors with string dtype (pandas-dev#21366)

b34bf1f
