BUG: issues with hash-function for Float64HashTable (GH21866) #21904
Test failure appears unrelated. Can you push again to re-run it?
0d7fe27
to
2cec96e
Compare
Codecov Report
@@ Coverage Diff @@
## master #21904 +/- ##
=======================================
Coverage 91.99% 91.99%
=======================================
Files 167 167
Lines 50578 50578
=======================================
Hits 46530 46530
Misses 4048 4048
Continue to review full report at Codecov.
2cec96e
to
94b7087
Compare
can you run an asv and report any anomalies
does this have any user level visible effects?
can you add a whatsnew note (0.24.0), depending on your answer above either API changes or other Enhancements section.
@jbrockmendel No, it was a problem in one of the added test cases: somewhat naively, 16GB of memory was reserved (but not committed!), yet different operating systems with different resources obviously react differently to such a request. It is a little strange that the test run just died and didn't recover from that... I removed these silly tests and now it looks better (at least it is clear what goes wrong).
#define kh_float64_hash_func(key) (khint32_t)((asint64(key))>>33^(asint64(key))^(asint64(key))<<11)

// correct for all inputs but not -0.0 and NaNs
#define kh_float64_hash_func_0_NAN(key) (khint32_t)((asint64(key))>>33^(asint64(key))^(asint64(key))<<11)
can you add a blank between cases
@jreback Sorry for the silly question: Do you expect me to add a new commit with the improvements to the branch and you will fixup it when merging or should I amend the current commit?
generally push new commits.
needs to add a whatsnew in any event
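As a side note on the code comment "correct for all inputs but not -0.0 and NaNs": a hash computed from the raw bits of a double can disagree with floating-point equality. The sketch below only inspects bit patterns with Python's standard library; `float_bits` is a hypothetical helper for illustration, not part of pandas or khash.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer.

    Hypothetical helper for illustration only.
    """
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# 0.0 and -0.0 compare equal, but their bit patterns differ in the sign bit,
# so any hash built purely from the bits may put them in different buckets.
print(0.0 == -0.0)              # True
print(hex(float_bits(0.0)))     # 0x0
print(hex(float_bits(-0.0)))    # 0x8000000000000000

# NaN is the opposite case: it never compares equal to itself, and many
# distinct bit patterns are all NaN.
nan = float("nan")
print(nan != nan)               # True
```

This is why a table keyed on doubles needs both the equality function and the hash function to agree on these special values.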
// correct for all
#define kh_float64_hash_func(key) ((key) != (key) ? \
    kh_float64_hash_func_NAN(Py_NAN) : \
Not sure about Py_NAN:
- Must the case of Py_NO_NAN be taken into account?
- There is PANDAS_NAN, but here Py_NAN didn't require additional includes.
PS: NAN from math.h isn't defined for some platforms.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -235,7 +235,7 @@ Other API Changes
a ``KeyError`` (:issue:`21678`).
- Invalid construction of ``IntervalDtype`` will now always raise a ``TypeError`` rather than a ``ValueError`` if the subdtype is invalid (:issue:`21185`)
- Trying to reindex a ``DataFrame`` with a non unique ``MultiIndex`` now raises a ``ValueError`` instead of an ``Exception`` (:issue:`21770`)
-
- :class:`Float64HashTable` handles zeros/signed zeros and all flavors of NaNs consistently: it is no longer possible to have both, zero and signed-zero, as keys at the same time in a table, also there can be at most one NaN-key in a table (:issue:`21866`)
this is not public, my question below was whether this has a public change?
It fixes the bug #21866, i.e. produces right results for some esoteric corner cases. This change of behavior can be observed by the end-user, but is this then a public change worth mentioning?
is this the public case? you can list this, but just make it related to the effect on .unique()
also add a test w.r.t. unique
4bf5983
to
0f77145
Compare
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -84,6 +84,7 @@ Other Enhancements
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep`` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
- :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`)
- :class:`Float64HashTable` handles zeros/signed zeros and all flavors of NaNs consistently: it is no longer possible to have both, zero and signed-zero, as keys at the same time in a table, also there can be at most one NaN-key in a table (:issue:`21866`)
right, can you reword to just focus on .unique()
doc comments. rebase & ping on green.
pandas/tests/test_algos.py
Outdated
@@ -500,6 +501,23 @@ def test_obj_none_preservation(self):

        tm.assert_numpy_array_equal(result, expected, strict_nan=True)

    def test_signed_zero(self):
        a = np.array([-0.0, 0.0])
can you add the issue number here as a comment (and on test below)
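To make the behavior exercised by the signed-zero test concrete, here is a rough pure-Python sketch of an order-preserving unique in which 0.0 and -0.0 collapse into one entry and all NaNs collapse into one entry. It is an illustration under stated assumptions, not the pandas implementation: `unique_floats` is a hypothetical helper, and it relies on Python's own set semantics rather than on Float64HashTable.

```python
import math


def unique_floats(values):
    """Order-preserving unique over floats (hypothetical helper).

    Treats 0.0 and -0.0 as the same key and all NaNs as the same key,
    keeping the first value seen for each key.
    """
    seen = set()
    result = []
    for v in values:
        # All NaNs share one sentinel key; 0.0 and -0.0 already hash and
        # compare equal as Python floats, so they share a key as well.
        key = "nan" if math.isnan(v) else v
        if key not in seen:
            seen.add(key)
            result.append(v)
    return result


zeros = unique_floats([-0.0, 0.0])
print(len(zeros))                       # 1
print(math.copysign(1.0, zeros[0]))     # -1.0 (the first-seen -0.0 is kept)

nans = unique_floats([float("nan"), float("nan"), 1.0])
print(len(nans))                        # 2
```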
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -84,6 +84,7 @@ Other Enhancements
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep`` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
- :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`)
- :func:`unique` handles signed zeros consistently: it is no longer possible to have both, 0.0 and -0.0, in the same resulting array (:issue:`21866`)
move to bug fix / Numeric section.
The following issues
1) hash(0.0) != hash(-0.0)
2) hash(x) != hash(y) for different x,y which are nans
are solved by setting:
1) hash(-0.0):=hash(0.0)
2) hash(x):=hash(np.nan) for every x which is nan
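The two rules in this commit message can be sketched in Python as follows. This is a minimal emulation under stated assumptions, not the C code from the PR: `as_int64`, `mix`, `raw_hash`, and `fixed_hash` are hypothetical names, the mixing step mirrors the `>>33 ^ key ^ <<11` expression from the macro with the result truncated to 32 bits, and `CANONICAL_NAN_BITS` is an assumed quiet-NaN bit pattern standing in for `Py_NAN`.

```python
import struct

MASK32 = 0xFFFFFFFF
# Assumption: one fixed quiet-NaN bit pattern plays the role of Py_NAN.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def as_int64(x: float) -> int:
    """Reinterpret the bits of a double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def mix(k: int) -> int:
    """(k >> 33) ^ k ^ (k << 11), truncated to 32 bits like the khint32_t cast."""
    return ((k >> 33) ^ k ^ (k << 11)) & MASK32


def raw_hash(x: float) -> int:
    """Bit-based hash with no special cases (the pre-fix behavior)."""
    return mix(as_int64(x))


def fixed_hash(x: float) -> int:
    """Bit-based hash with the two normalizations from the commit message."""
    if x != x:                      # rule 2: every NaN hashes like one NaN
        return mix(CANONICAL_NAN_BITS)
    if x == 0.0:                    # rule 1: hash(-0.0) := hash(0.0)
        return mix(0)
    return raw_hash(x)


print(raw_hash(0.0) == raw_hash(-0.0))        # False
print(fixed_hash(0.0) == fixed_hash(-0.0))    # True

# Two different NaN bit patterns mix to different values without the fix.
print(mix(0x7FF8000000000000) == mix(0x7FF8000000000001))   # False
```

Ordinary values are untouched by the normalization, so `fixed_hash(1.5)` equals `raw_hash(1.5)`; only the keys whose equality disagrees with their bit patterns are redirected.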
11131c9
to
124b095
Compare
@jreback Done.
@jbrockmendel any comments? if ok pls merge.
@realead good job catching a subtle bug. Thanks for taking point on this.
It is more or less the clean-up after PR pandas-dev#21904 and PR pandas-dev#22207: the underlying hash map handles all cases correctly out of the box, so no special handling is needed.
…-dev#21904) * BUG: issues with hash-function for Float64HashTable (GH21866) The following issues 1) hash(0.0) != hash(-0.0) 2) hash(x) != hash(y) for different x,y which are nans are solved by setting: 1) hash(-0.0):=hash(0.0) 2) hash(x):=hash(np.nan) for every x which is nan * add the id of the issue to tests
git diff upstream/master -u -- "*.py" | flake8 --diff