Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Comparison between two PeriodIndexes doesn't validate length (GH #23078) #23896

Closed
wants to merge 6 commits into from

Conversation

makbigc
Copy link
Contributor

@makbigc makbigc commented Nov 25, 2018

@pep8speaks
Copy link

pep8speaks commented Nov 25, 2018

Hello @makbigc! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 25, 2018 at 11:36 Hours UTC

@codecov
Copy link

codecov bot commented Nov 25, 2018

Codecov Report

Merging #23896 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23896      +/-   ##
==========================================
+ Coverage    92.3%   92.31%   +<.01%     
==========================================
  Files         163      163              
  Lines       51950    51954       +4     
==========================================
+ Hits        47953    47961       +8     
+ Misses       3997     3993       -4
Flag Coverage Δ
#multiple 90.72% <100%> (ø) ⬆️
#single 42.99% <25%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 97.59% <100%> (+0.04%) ⬆️
pandas/core/arrays/period.py 98.48% <100%> (ø) ⬆️
pandas/util/testing.py 87.84% <0%> (+0.09%) ⬆️
pandas/core/arrays/datetimelike.py 96.57% <0%> (+0.21%) ⬆️
pandas/core/arrays/integer.py 96.23% <0%> (+0.68%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 159772d...ac94453. Read the comment docs.

@makbigc makbigc changed the title Bug fix (GH #23078) [Bug] Comparison between two PeriodIndexes doesn't validate length (GH #23078) Nov 25, 2018
@TomAugspurger TomAugspurger added the Period Period data type label Nov 25, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Nov 25, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this a general check in the base class instead

@@ -73,6 +73,9 @@ def wrapper(self, other):
msg = DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)

if other.ndim > 0 and len(self) != len(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should fail if ndim of other != 1

we don't generally check this for other arrays (though we should). So I would maybe add this check on the base array class and just call it here, _validate_equal(self, other) or something

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By base class, do you mean ExtensionOpsMixin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I would call it something like _validate_shape or _check_shape.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think we have an implicit guarantee on 1-D ness now (or at the very least the pandas extension types do).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related #23853

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 26, 2018

@makbigc when you update for @jreback's feedback, can you add a couple new base tests to BaseComparisonOpsTests and BaseArithmeticOpsTests ensure that a ValueError is raised when comparing / operating on two EAs with different lengths? These are in tests/extension/base/ops.py.

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 26, 2018
@jreback jreback mentioned this pull request Nov 26, 2018
4 tasks
@jreback jreback removed this from the 0.24.0 milestone Nov 29, 2018
@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

@makbigc can you merge master and update

@makbigc makbigc force-pushed the bug-fix-23078 branch 2 times, most recently from 9b0b723 to 47bf1bf Compare December 23, 2018 07:58
@jreback
Copy link
Contributor

jreback commented Dec 23, 2018

this overlaps with #24397 @jbrockmendel can you comment here

@@ -299,6 +299,12 @@ def _check_divmod_op(self, s, op, other, exc=NotImplementedError):
def test_error(self):
pass

def test_arith_diff_lengths(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these tests passed because they would fail? if so, mark with @pytest.mark.xfail.

@@ -119,6 +119,12 @@ def test_direct_arith_with_series_returns_not_implemented(self, data):
"{} does not implement add".format(data.__class__.__name__)
)

def test_arith_diff_lengths(self, data, all_arithmetic_operators):
op = self.get_op_from_name(all_arithmetic_operators)
other = data[:3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test actually wouldn't catch the behavior in #23078, which is caused specifically by comparisons of PeriodIndex with length-1 objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue in #23078 was treated in the function test_comp_op (pandas/tests/indexes/period/test_period.py).

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you adding tests to each of the EA subclasses? the base test should work for each of these.

@@ -1319,6 +1319,7 @@ Datetimelike
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would incorrectly lose the index's ``freq`` attribute (:issue:`21282`)
- Clarified error message produced when passing an incorrect ``freq`` argument to :class:`DatetimeIndex` with ``NaT`` as the first entry in the passed data (:issue:`11587`)
- Bug in :func:`to_datetime` where ``box`` and ``utc`` arguments were ignored when passing a :class:`DataFrame` or ``dict`` of unit mappings (:issue:`23760`)
- Bug in :class:`PeriodIndex` when comparing indexes of different lengths, ValueError is not raised (:issue:`23078`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double backticks around ValueError

def test_arith_diff_lengths(self, data, all_arithmetic_operators):
op = self.get_op_from_name(all_arithmetic_operators)
other = data[:3]
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you match the message as well

def test_compare_diff_lengths(self, data, all_compare_operators):
op = self.get_op_from_name(all_compare_operators)
other = data[:3]
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match the message

@@ -324,6 +331,12 @@ def test_compare_array(self, data, all_compare_operators):
for i in alter]
self._compare_other(s, data, op_name, other)

# TODO:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these xfail? make the change as needed in the decimal code

@@ -291,9 +291,13 @@ def _check_divmod_op(self, s, op, other, exc=NotImplementedError):
s, op, other, exc=TypeError
)

def test_arith_diff_lengths(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these added?

@makbigc
Copy link
Contributor Author

makbigc commented Dec 26, 2018

Should I close this PR? Because #23078 was already resolved by #24397. The length check is done in the wrapper of _period_array_cmp. This is not need to add _validate_shape additionally.

The generic base test cannot handle all six EA subclasses:

  • Integer array: deprecationwarning raised by numpy
  • category: only eq and ne comparison are implemented
  • json: no json comparison

The raise of ValueError for DecimalArray could be implemented in a new PR.
@jreback @TomAugspurger Thank you for your review.

@makbigc makbigc closed this Dec 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

comparison between two PeriodIndexes doesn't validate length
6 participants