BUG: fix DataFrame constructor w named Series #9237

TomAugspurger · 2015-01-13T02:03:58Z

Problem was passing Series w/ a name to DataFrame w/ the columns kwarg.

Before:

In [55]: x = pd.Series(range(5), name=1)

In [56]: y = pd.Series(range(5), name=0)

In [57]: pd.DataFrame(x, columns=[0])
Out[57]:
Empty DataFrame
Columns: [0]
Index: []

In [58]: pd.DataFrame(x, columns=[1])
Out[58]:
   1
0  0
1  1
2  2
3  3
4  4

In [59]: pd.DataFrame(y, columns=[0])
Out[59]:
   0
0  0
1  1
2  2
3  3
4  4

In [60]: pd.DataFrame(y, columns=[1])
Out[60]:
   1
0  0
1  1
2  2
3  3
4  4

after

In [1]: x = pd.Series(range(5), name=1)

In [2]: y = pd.Series(range(5), name=0)

In [4]: pd.DataFrame(x, columns=[0])
Out[4]: 
   0
0  0
1  1
2  2
3  3
4  4

In [5]: pd.DataFrame(y, columns=[1])
Out[5]: 
   1
0  0
1  1
2  2
3  3
4  4

There were two intertwined problems

we checked if getattr(data, 'name', None):, which returned False when data.name was Falseish (like 0). I now compare it directly against None.
If data has a name and the columns kwarg is specified, the constructor returned an Empty DataFrame w/ the column specified in columns. Now, we do what's documented:

The result will be a DataFrame with the same index as the input Series, and with one column whose name is the original name of the Series (only if no other column name provided).

jreback · 2015-01-13T02:05:54Z

this is an api change

i will have to look at this

TomAugspurger · 2015-01-13T02:06:59Z

Yep, I wasn't sure where to document it in whatsnew. I can expand it.

I think the docs are pretty clear that this was the intended behavior..

jorisvandenbossche · 2015-01-13T08:07:49Z

How I always understood it, it is the y case (pd.DataFrame(y, columns=[1]) not giving an empty dataframe) that was buggy and not the x case (pd.DataFrame(x, columns=[0]) giving an empty frame). It was sometimes odd or surpising behaviour, but giving a series/frame to DataFrame is regarded as a reindex.

I wanted to say that this is in line with how a DataFrame itself is treated when given to DataFrame, but it appears this is still somewhat different (not giving an empty frame, but one with NaNs):

In [42]: x = pd.Series(range(5), name=1)

In [43]: pd.DataFrame(x.to_frame(), columns=[0])
Out[43]: 
    0
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN

jreback · 2015-01-13T11:18:20Z

see more discusion on #7893

this AFAIK was a longstanding behavior which I think I broke in 0.12-13 . I agree with @jorisvandenbossche here

This should return a column of NaN if the passed name in the columns does not match, e.g. its a reindex. (the bug is its returning an empty frame). In fact the undelrying code should do exactly to_frame() as this handles the edge cases.

jreback · 2015-01-13T11:20:55Z

I think we should have a mini section in basics.rst to show conversions of series -> dataframe (and obviously in the 0.16.0 whatsnew)

TomAugspurger · 2015-01-13T13:08:51Z

OK, so I'll repurpose this PR to make to_frame() and the constructor consistent (both return a column of NaNs) and rewrite the docs.

jorisvandenbossche · 2015-01-13T13:16:34Z

Just for the docs, if you want to give a series as input to DataFrame, but do want to overwrite the series name (which is a valid use case), what is the recommended way to do? As first changing the series name, or afterwards renaming the dataframe is somewhat cumbersome.
I suppose giving it as a dict:

In [59]: s = pd.Series([1,2,3], index=[10,11,12], name='test')

In [60]: pd.DataFrame(s)
Out[60]: 
    test
10     1
11     2
12     3

In [61]: pd.DataFrame(s, columns=['col'])
Out[61]: 
Empty DataFrame
Columns: [col]
Index: []

In [62]: pd.DataFrame({'col': s})
Out[62]: 
    col
10    1
11    2
12    3

jorisvandenbossche · 2015-01-13T13:17:18Z

OK, but of course for just this, using to_frame is the simplest:

In [63]: s.to_frame(name='col')
Out[63]: 
    col
10    1
11    2
12    3

jreback · 2015-01-18T20:49:02Z

@TomAugspurger can you update the top section. I don't think this is correct. The columns argument will cause a reindex of the input (after the column name is set by name). So if they different you would get a column of nans. What I think is here is inconsistent with that.

jreback · 2015-03-05T23:34:40Z

@TomAugspurger can you rebase this. and confirm (just change the top section) of the existing and new behavior (may also need an example in the api breaking section of the same).

jreback · 2015-03-07T23:13:46Z

@TomAugspurger ?

TomAugspurger · 2015-03-08T04:16:06Z

@jreback so the expected behavior is

In [24]: y = pd.Series(range(5), name=0)

In [25]: pd.DataFrame(y, columns=[1])
Out[25]: 
   1
0  NaN
1  NaN
2  NaN
3  NaN
4  NaN

?

TomAugspurger · 2015-03-08T04:19:43Z

IMO, it should be identical to what you get from a dict

In [29]: pd.DataFrame({y.name: y.values}, columns=[1])  # y.name is 0
Out[29]: 
Empty DataFrame
Columns: [1]
Index: []

So that should also be NaNs? Or no since I didn't provide an index?

jreback · 2015-03-08T14:14:36Z

@TomAugspurger

I think [24/25] is the correct return. (0.15.2 does what you have for 24/25). [29] is just wrong (same in master).

The canonical way to do this is: y.to_frame(name=1) which in effect is: DataFrame({ name : y }), where name is y.name or the new_name if provided.

This has to be a reindex, e.g. construct the dict using the current name THEN reindex. (its done like this for a dictionary construction, but it is not entirely correct; the [1] column in your example should still exist, eg. [24/25] and NOT [29])

Imagine this scenario:

In [1]: DataFrame({'y1' : Series(range(5),name='foo'), 'y2' : Series(range(5),name='bar') }, columns=['y2'])
Out[1]: 
   y2
0   0
1   1
2   2
3   3
4   4

So the columns are selecting (e.g. reindexing) on the passed dictionary keys, if you JUST pass a single Series, then it must be { s.name : s } so the same thing should happen

TL;DR; but makes sense?

TomAugspurger · 2015-03-08T14:21:29Z

Yep, it definitely is a reindex. We'll go with [24]/[25].

TomAugspurger · 2015-03-08T14:27:54Z

And I'm comfortable saying that

In [16]: pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, columns=["C"])

Returning an Empty DataFrame instead of NaNs was a bug (so this isn't an API change)

jreback · 2015-03-08T14:28:46Z

doc/source/whatsnew/v0.16.0.txt

+
+
+- Fixed bug with DataFrame constructor when passed a Series with a
+name and the `columns` keyword argument.


add the issue number here

jreback · 2015-03-08T14:29:21Z

@TomAugspurger agree with your [16]. thxs

TomAugspurger · 2015-03-08T14:45:42Z

This is breaking some tests unfortunately. When constructing from a dict of {col: Series} with differing indicies and specifying columns. Here's what I do now

In [2]: s1 = pd.Series([1, 2, 3], name='a')

In [3]: s2 = pd.Series([4, 5, 6], name='b', index=[2, 3, 4])

In [5]: pd.DataFrame({'a': s1, 'b': s2}, columns=['a'])
Out[5]: 
    a
0   1
1   2
2   3
3 NaN
4 NaN

Before it was

In [5 master]: pd.DataFrame({'a': s1, 'b': s2}, columns=['a'])
Out[5 master]: 
    a
0   1
1   2
2   3

Since we filtered out b before determining what the index was. I think it's good that we clarifying all these constructors and make them consistent, but it's feeling pretty disrupting.

jreback · 2015-03-08T14:51:08Z

@TomAugspurger hmm, I see what you mean. I don't think this should change though. We are only reindexing on the columns (as opposed to creating the frame first from the ENTIRE dict, then reindexing)

Hmm I guess that is why the [29] is the way it is.

Ok, why don't you investigate and see what makes the most sense.

TomAugspurger · 2015-03-08T14:57:12Z

Is it cool to push this all off to 0.17? I don't have much time today and I need to squash a weird bug in the plot accessors. Is there anything that is obviously broken and should be fixed today? I'd say that this is obviously wrong.

In [1]: y = pd.Series(range(5), name=0)

In [2]: z = pd.Series(range(5), name=1)

In [3]: pd.DataFrame(y, columns=[2])
Out[3]: 
   2
0  0
1  1
2  2
3  3
4  4

In [4]: pd.DataFrame(z, columns=[2])
Out[4]: 
Empty DataFrame
Columns: [2]
Index: []

I jsut don't want to make anything harder down the road.

jreback · 2015-03-08T14:58:34Z

sure let's push thx!

Pushing for now [ci skip]

jreback · 2015-05-09T16:10:42Z

@TomAugspurger ought to re-examine this for 0.17.0. pls rebase when you have a chance

jreback · 2015-07-28T22:03:10Z

@TomAugspurger what's status?

TomAugspurger · 2015-07-28T22:11:06Z

Just reread through. I'll have something by the end of next week.

jreback · 2015-08-26T01:44:43Z

@TomAugspurger pushing, but if you get to it next week or 2 can put in.

jreback · 2015-10-18T14:01:46Z

@TomAugspurger status?

TomAugspurger · 2015-10-19T12:24:52Z

Won't really have time to work on this in the short term. Better to close or leave open?

jreback · 2015-10-19T13:10:55Z

let's close for now.

jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 13, 2015

jreback added this to the 0.16.0 milestone Jan 13, 2015

jreback reviewed Mar 8, 2015
View reviewed changes

jreback modified the milestones: Next Major Release, 0.16.0 Mar 8, 2015

BUG: fix DataFrame constructor w named Series

12d39c6

Pushing for now [ci skip]

TomAugspurger force-pushed the DataFrame-from-Series-names branch from d4e02d9 to 12d39c6 Compare March 8, 2015 15:02

jreback mentioned this pull request Apr 20, 2015

DataFrame constructor produces NaN values when passing df as data param and an index containing values that are not present in passed in df #9951

Closed

TomAugspurger mentioned this pull request Apr 28, 2015

BUG: Inconsistency in converting Series to Dataframe when columns specified #10006

Closed

jorisvandenbossche modified the milestones: 0.17.0, Next Major Release Jun 2, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Aug 26, 2015

jreback closed this Oct 19, 2015

jorisvandenbossche mentioned this pull request Jun 10, 2016

DataFrame(recarray, columns=MultiIndex) disregards input data, gives empty DataFrame #13415

Closed

jorisvandenbossche added the Closed PR label Jul 21, 2016

jorisvandenbossche modified the milestones: No action, Next Major Release Jul 21, 2016

jorisvandenbossche mentioned this pull request Aug 19, 2016

Index.unique() should always return an Index object of the same type #13395

Closed

TomAugspurger deleted the DataFrame-from-Series-names branch April 5, 2017 02:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: fix DataFrame constructor w named Series #9237

BUG: fix DataFrame constructor w named Series #9237

TomAugspurger commented Jan 13, 2015

jreback commented Jan 13, 2015

TomAugspurger commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jreback commented Jan 13, 2015

jreback commented Jan 13, 2015

TomAugspurger commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jreback commented Jan 18, 2015

jreback commented Mar 5, 2015

jreback commented Mar 7, 2015

TomAugspurger commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback Mar 8, 2015

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

jreback commented May 9, 2015

jreback commented Jul 28, 2015

TomAugspurger commented Jul 28, 2015

jreback commented Aug 26, 2015

jreback commented Oct 18, 2015

TomAugspurger commented Oct 19, 2015

jreback commented Oct 19, 2015



		- Fixed bug with DataFrame constructor when passed a Series with a
		name and the `columns` keyword argument.

BUG: fix DataFrame constructor w named Series #9237

BUG: fix DataFrame constructor w named Series #9237

Conversation

TomAugspurger commented Jan 13, 2015

jreback commented Jan 13, 2015

TomAugspurger commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jreback commented Jan 13, 2015

jreback commented Jan 13, 2015

TomAugspurger commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jorisvandenbossche commented Jan 13, 2015

jreback commented Jan 18, 2015

jreback commented Mar 5, 2015

jreback commented Mar 7, 2015

TomAugspurger commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback Mar 8, 2015

Choose a reason for hiding this comment

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

TomAugspurger commented Mar 8, 2015

jreback commented Mar 8, 2015

jreback commented May 9, 2015

jreback commented Jul 28, 2015

TomAugspurger commented Jul 28, 2015

jreback commented Aug 26, 2015

jreback commented Oct 18, 2015

TomAugspurger commented Oct 19, 2015

jreback commented Oct 19, 2015