-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: SparseSeries init from dict fixes #16906
Conversation
Tests copied/adapted from |
Codecov Report
@@ Coverage Diff @@
## master #16906 +/- ##
==========================================
- Coverage 90.99% 90.97% -0.02%
==========================================
Files 161 161
Lines 49293 49292 -1
==========================================
- Hits 44854 44844 -10
- Misses 4439 4448 +9
Continue to review full report at Codecov.
|
pandas/tests/sparse/test_series.py
Outdated
|
||
data = A(('col%s' % i, np.random.random()) for i in range(12)) | ||
s = SparseSeries(data) | ||
tm.assert_numpy_array_equal(s.values.values, np.array(list(data.values()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use assert_sp_series_equal (you can pass check_list=False) and then add an assert about the column ordering
pandas/tests/sparse/test_series.py
Outdated
return dict(zip((constructor(x) for x in dates_as_str), values)) | ||
|
||
data_datetime64 = create_data(np.datetime64) | ||
data_datetime = create_data(lambda x: datetime.strptime(x, '%Y-%m-%d')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize this test
pandas/tests/sparse/test_series.py
Outdated
expected = SparseSeries([x[1] for x in _d], | ||
index=pd.Index([x[0] for x in _d], | ||
tupleize_cols=False)) | ||
ser = ser.reindex(index=expected.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result=
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -179,6 +179,7 @@ Sparse | |||
^^^^^^ | |||
|
|||
|
|||
- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index`` (:issue:`16905`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index=
kwarg
pandas/tests/sparse/test_series.py
Outdated
|
||
|
||
def test_constructor_dict(): | ||
d = {'a': 0., 'b': 1., 'c': 2.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might be able to move some of these into from pandas.tests.series.test_api import SharedWithSparse
whech we already import (rather than directly copying them).
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -179,6 +179,7 @@ Sparse | |||
^^^^^^ | |||
|
|||
|
|||
- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index=`` kwarg (:issue:`16905`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't matter whether index
is passed in, why mention it in the description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a reference to the two issues fixed. With, the result was invalid; without, it crashed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, though ultimately instantiating from dict
just didn't work at all though, which could be considered a single bug (also you're only referencing one issue here). Note that without further context, people won't be aware of that difference (whether it was incorrect or crashed), so it is preferable to be concise.
@kernc lmk if you are able to share some test code with Series, can always do a followup. |
Hello @kernc! Thanks for updating the PR.
Comment last updated on July 17, 2017 at 21:25 Hours UTC |
5dcec57
to
991b99a
Compare
Was able to share some test code with Series with as little as possible effort. Not sure if OK, though. |
pandas/tests/series/test_api.py
Outdated
|
||
result = self.Series(d, index=['b', 'c', 'd', 'a']) | ||
expected = self.Series([1, 2, np.nan, 0], index=['b', 'c', 'd', 'a']) | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't check sparseness. I might modify assert_series_equal to dispatch to assert_sp_series_equal if both are SparseSeries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of my comment below, I'd rather that we keep sparse equality and Series
equality checks separate. Perhaps if we could write a function like:
def _check_series_equal(self, left, right):
...
that dispatches to tm.assert_series_equal
OR tm.assert_sp_series_equal
depending on test class. It would seem a little clearer implementation-wise.
I'm a little hesitant about this code-sharing because the readability decreased IMO. |
How else would you share code without |
@kernc this looks fine. ping on green. |
that was failing due to introduced dispatch to assert_sp_series_equal being too strict.
I personally find the name a little confusing because I only think the A name like |
d = {'a': 0., 'b': 1., 'c': 2.} | ||
result = self.series_klass(d) | ||
expected = self.series_klass(d, index=sorted(d.keys())) | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think GitHub hid away my comment about this, but I think we should not interlace assert_series_equal
and assert_sp_series_equal
. It reduces modularity, and "assert_series_equal" is confusing for SparseSeries
I find.
I propose that we do the following and define this method:
def assert_series_klass_equal(result, expected):
klass_name = self.series_klass.__name__
if klass_name == "Series":
tm.assert_series_equal(result, expected)
elif klass_name == "SparseSeries":
tm.assert_sp_series_equal(result, expected)
else:
raise ValueError("Invalid 'series_klass' : {name}".format(name=klass_name))
That way you also don't need to modify assert_series_equal
. You can then call this method without worrying what type of Series
you are comparing.
Of course. |
Continued in #17050. |
git diff upstream/master --name-only -- '*.py' | flake8 --diff
(On Windows,git diff upstream/master -u -- "*.py" | flake8 --diff
might work as an alternative.)