BUG: Error when creating sparse dataframe with nan column label #8822
4e10a64 to 66a4714
@jreback, I totally agree with you, except that this is not forbidden in DataFrame right now: you can create a DataFrame with nan columns, but not a SparseDataFrame. However, I am working on a … I would argue that if this shouldn't be done, it should be explicitly forbidden, e.g. so that people don't write tests that create DataFrames with nan column names.
It can be done, just not the way you are doing it. You can only do it from a dict (or by setting the columns explicitly). It's implicitly allowed, even though indexing fails if there is more than one nan in an index. I know there are some tests / behavior which allow this. It's a problem. Not sure how much work it would be to disallow it. I agree with your sentiments, so I'm not allowing expansion of this (I have nixed a couple of PRs that tried to do this). It is a can of worms.
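A minimal sketch of the two routes mentioned above (a dict keyed by the nan label, or setting the columns explicitly after construction). It assumes a recent pandas release; the behavior of the 2014-era pandas discussed in this thread may differ.

```python
import numpy as np
import pandas as pd

# Route 1: build the frame from a dict whose key is the nan label.
from_dict = pd.DataFrame({np.nan: [1.0, 2.0]})

# Route 2: build the frame first, then set the column labels explicitly.
explicit = pd.DataFrame([[1.0], [2.0]])
explicit.columns = [np.nan]

# In both cases the single column label is a missing value.
print(pd.isna(from_dict.columns[0]))  # True
print(pd.isna(explicit.columns[0]))   # True
```

Positional access such as `iloc` sidesteps any question of how a nan label is matched during label-based lookup.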
OK, I will soon submit a pull request with a test that fails because of this. Maybe we can discuss it further there. The issue can probably be avoided with a simple check for this edge case.
ok, sure
@jreback This might be necessary after all, for the following fairly ordinary case:
that's an ok case
35ba20e to e9ed3d8
@jreback Not sure how to get sparse
the usual way to do this is to stringify the nans, e.g. use
But then if one of the values is 'nan' (in string form), np.nan's and 'nan's would get confused. I think a … would be more robust.
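The collision described above is easy to reproduce. This sketch assumes the labels are stringified with Python's built-in `str()`; the exact snippet suggested in the thread is not preserved here.

```python
import numpy as np

labels = [np.nan, "nan", "a"]

# Stringifying every label turns np.nan into the text 'nan' ...
stringified = [str(label) for label in labels]
print(stringified)  # ['nan', 'nan', 'a']

# ... so a dict keyed on the strings silently merges the two distinct labels.
positions = {key: i for i, key in enumerate(stringified)}
print(positions)  # {'nan': 1, 'a': 2}
```

Three labels go in but only two keys come out: the position recorded for np.nan is overwritten by the one for the string 'nan'.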
@jreback I'm wondering how to proceed with this. The basic issue is that without this PR, the following raises an error:
What are your thoughts? Keep as is? Or merge, but use dicts instead of a dataframe to store column information (and if so, is that because dicts are more lightweight)?
@artemyk I am excited you are working on this, as we need someone really interested in sparse! I haven't had a chance to look at how best to do this. Will get back to you next week.
@jreback OK, sounds good!
@artemyk can you rebase this? Then I'll take a look.
5d04093 to c776988
@jreback Rebased
@jreback ?
```
@@ -1663,6 +1663,11 @@ def test_as_blocks(self):
        self.assertEqual(list(df_blocks.keys()), ['float64'])
        assert_frame_equal(df_blocks['float64'], df)

    def test_nan_columnname(self):
        nan_colname = DataFrame(Series(1.0, index=[0]), columns=[nan])
```
Add the issue as a comment here.
Please add a release note (use this PR number as the issue number).
c776988 to b0e5ee3
Support for nan columns
Fix
Trigger Travis CI
jreback fixes
Release note update
b0e5ee3 to 7879205
@jreback Ready to merge, I think.
BUG: Error when creating sparse dataframe with nan column label
thanks!
Right now the following raises an exception:
This is because sparse dataframes use a dictionary to store information about columns, with the column label as the key. nan's do not compare equal to themselves, so a nan label generally cannot be found again by dictionary lookup (unless the lookup happens to use the very same nan object). This PR avoids the issue by using a dataframe to store this information.
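A small illustration of the dictionary problem, followed by one label-based lookup that tolerates nan. The pandas `Index.get_loc` part is an assumption about a current pandas release and is not the code from this PR, which stores the column information in a dataframe instead; `column_info` is a hypothetical stand-in for the sparse frame's per-column dict.

```python
import numpy as np
import pandas as pd

# Hypothetical per-column metadata keyed by the column label, as a plain dict.
column_info = {np.nan: "info for the nan column", 1.0: "info for column 1.0"}

# A nan that is a different object from np.nan, e.g. one read back out of an array.
other_nan = np.array([np.nan])[0]

print(np.nan in column_info)     # True: same object, and dicts check identity first
print(other_nan in column_info)  # False: a different object, and nan != nan
print(other_nan == other_nan)    # False

# A float pandas Index treats nan labels as matching during lookup.
labels = pd.Index([np.nan, 1.0])
print(labels.get_loc(other_nan))  # 0
```

The index lookup succeeds because pandas' float hash tables treat nan keys as equal to each other, which a plain dict does not do for distinct nan objects.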