NaN when hashed should have a random hash value #4510

dgryski · 2013-01-16T20:06:57Z

20:55:17 <dgryski> I saw the discussion about NaNs in the weekly meeting minutes.  Did anybody bring up Go's behaviour?  NaNs get a random hash value: http://research.swtch.com/randhash
20:56:07  <bstrie> dgryski: interesting
20:59:55  <bstrie> dgryski: would you mind filing an issue to add this to the table(s) in the stdlib? I don't know if it solves the general issue that was being discussed, but it sounds neat regardless

See http://research.swtch.com/randhash for more discussion.
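To make the behavior described in the linked article concrete, here is a minimal Rust sketch of a toy chained hash table that compares keys with IEEE 754 `==` and gives NaN a fresh random hash, similar in spirit to what the article describes for Go. `FloatMap` and `hash_f64` are hypothetical names for illustration, and drawing randomness from a new `std::collections::hash_map::RandomState` per call is a convenience assumption, not a recommended RNG.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hasher};

/// Hypothetical hash for the sketch: non-NaN floats hash their bit pattern
/// (with -0.0 folded into +0.0, since the two compare equal under `==`),
/// while every NaN gets a fresh pseudo-random value.
fn hash_f64(x: f64) -> u64 {
    if x.is_nan() {
        // Each RandomState::new() uses different keys, so finishing an
        // empty hasher built from it yields a value that varies per call.
        RandomState::new().build_hasher().finish()
    } else {
        let x = if x == 0.0 { 0.0 } else { x };
        let mut h = DefaultHasher::new(); // fixed keys: deterministic
        h.write_u64(x.to_bits());
        h.finish()
    }
}

/// Toy chained hash table keyed by f64, comparing keys with IEEE `==`.
struct FloatMap<V> {
    buckets: Vec<Vec<(f64, V)>>,
    len: usize,
}

impl<V> FloatMap<V> {
    fn new(n_buckets: usize) -> Self {
        FloatMap { buckets: (0..n_buckets).map(|_| Vec::new()).collect(), len: 0 }
    }

    fn bucket(&self, k: f64) -> usize {
        (hash_f64(k) % self.buckets.len() as u64) as usize
    }

    fn insert(&mut self, k: f64, v: V) {
        let b = self.bucket(k);
        // `e.0 == k` is never true when k is NaN, so a NaN key never overwrites.
        let pos = self.buckets[b].iter().position(|e| e.0 == k);
        match pos {
            Some(i) => self.buckets[b][i].1 = v,
            None => {
                self.buckets[b].push((k, v));
                self.len += 1;
            }
        }
    }

    fn get(&self, k: f64) -> Option<&V> {
        let b = self.bucket(k);
        self.buckets[b].iter().find(|e| e.0 == k).map(|e| &e.1)
    }
}

fn main() {
    let mut m = FloatMap::new(16);
    m.insert(1.5, "a");
    m.insert(1.5, "b"); // overwrites: 1.5 == 1.5
    for _ in 0..3 {
        m.insert(f64::NAN, "nan"); // each insert adds a new, unreachable entry
    }
    println!("len = {}", m.len); // 4
    println!("get(1.5) = {:?}", m.get(1.5));
    println!("get(NaN) = {:?}", m.get(f64::NAN));
}
```

The random hash only keeps the NaN entries from piling into one bucket; it does not make them retrievable, which is the trade-off debated further down the thread.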


graydon · 2013-01-24T04:52:36Z

I believe this is exactly addressing the problem that came up. Thanks for pointing it out! I'm happy to follow Russ' lead here.

nikomatsakis · 2013-01-25T03:59:17Z

I... am not sure about this! At least let's discuss it. I am trying to remember what precisely we concluded when discussing this after a recent meeting. @pcwalton ?

nikomatsakis · 2013-03-10T17:34:46Z

I am going to close this, since I believe the problem is adequately and better addressed by #5283.

graydon · 2013-03-11T01:45:56Z

No float-keyed hashtables?

thestinger · 2013-03-11T02:06:55Z

@graydon: I think there are ways to work around it, like implementing TotalOrd for floats but failing (or using a condition?) if a NaN shows up.

graydon · 2013-03-11T04:25:53Z

The underlying issue isn't the equality, it's the hashing itself. It's worth reading the linked article in detail if you haven't yet: the behavior it describes is intended to maintain the operational invariants of a map.

thestinger · 2013-03-11T04:58:30Z

@graydon: The hashing isn't the direct cause of the issue, though: since NaN != NaN, every NaN inserted into the hash table becomes a new entry. Randomizing the hash maintains the O(1) expected search time, but either preventing the insertion of NaN or not using the IEEE 754 equality rules (and saying NaN == NaN) would also solve it, and you wouldn't end up with keys you can't retrieve.
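The first alternative, preventing NaN from ever becoming a key, can be sketched in modern Rust as a wrapper that refuses NaN at construction and can therefore soundly implement `Eq` and `Hash`. `NonNanF64` is a hypothetical name (crates such as `ordered-float` offer a comparable `NotNan` type, but this is not its API). Because `-0.0 == +0.0`, the sketch normalizes zero before hashing so that equal keys hash equally.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type: an f64 that is guaranteed not to be NaN.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NonNanF64(f64);

impl NonNanF64 {
    /// Returns None for NaN, so NaN can never reach a map as a key.
    fn new(x: f64) -> Option<Self> {
        if x.is_nan() { None } else { Some(NonNanF64(x)) }
    }
}

// With NaN excluded, IEEE `==` is reflexive, so `Eq` is justified.
impl Eq for NonNanF64 {}

impl Hash for NonNanF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // -0.0 == +0.0, so both must produce the same hash.
        let x = if self.0 == 0.0 { 0.0 } else { self.0 };
        x.to_bits().hash(state);
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert(NonNanF64::new(-0.0).unwrap(), "zero");
    // +0.0 finds the entry stored under -0.0, because the two compare equal.
    println!("{:?}", m.get(&NonNanF64::new(0.0).unwrap()));
    // NaN is rejected up front instead of becoming an unreachable key.
    println!("{:?}", NonNanF64::new(f64::NAN));
}
```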

nikomatsakis · 2013-03-11T16:04:50Z

@graydon TotalOrd would have a defined ordering and equality for NaN (just as e.g. Java does; check out the rules for Double.equals(), Double.compareTo() and Double.hashCode()). So a float-keyed hashtable would work just fine.
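As a rough illustration of the Java rules referred to here, the following sketch defines a hypothetical `JavaDouble` wrapper that follows what Java documents for boxed `Double`: all NaNs are equal to one another (via a canonical bit pattern, as `doubleToLongBits` does), `-0.0` and `+0.0` are distinct, and NaN orders above positive infinity. The hash feeds the canonical bits to Rust's `Hasher` rather than reproducing Java's exact `hashCode` formula; using `f64::total_cmp` for the non-NaN case is a modern-Rust convenience, not something from this thread.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper following the rules Java documents for boxed Double:
/// every NaN equals every other NaN, while -0.0 and +0.0 are distinct.
#[derive(Clone, Copy, Debug)]
struct JavaDouble(f64);

impl JavaDouble {
    /// Like Java's doubleToLongBits: all NaNs collapse to one canonical pattern.
    fn key_bits(self) -> u64 {
        if self.0.is_nan() {
            0x7ff8_0000_0000_0000
        } else {
            self.0.to_bits()
        }
    }
}

impl PartialEq for JavaDouble {
    fn eq(&self, other: &Self) -> bool {
        self.key_bits() == other.key_bits()
    }
}
impl Eq for JavaDouble {}

impl Hash for JavaDouble {
    // Consistent with eq because both use key_bits.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.key_bits().hash(state);
    }
}

impl Ord for JavaDouble {
    // Like Double.compareTo: -0.0 < +0.0, and NaN sorts above +infinity.
    fn cmp(&self, other: &Self) -> Ordering {
        match (self.0.is_nan(), other.0.is_nan()) {
            (true, true) => Ordering::Equal,
            (true, false) => Ordering::Greater,
            (false, true) => Ordering::Less,
            (false, false) => self.0.total_cmp(&other.0),
        }
    }
}
impl PartialOrd for JavaDouble {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut h = HashMap::new();
    h.insert(JavaDouble(f64::NAN), "nan");
    h.insert(JavaDouble(f64::NAN), "nan again"); // overwrites: NaN equals NaN here
    h.insert(JavaDouble(0.0), "+0");
    h.insert(JavaDouble(-0.0), "-0"); // a separate key
    println!("{}", h.len()); // 3
    println!("{:?}", h.get(&JavaDouble(f64::NAN)));
}
```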

nikomatsakis · 2013-03-11T16:07:44Z

Note that there is no need to fail dynamically when a NaN is encountered. The behavior of TotalOrd (or just Ord, as pcwalton suggested) is simply different from the behavior of partial comparisons (i.e., the < and == operators) when it comes to NaN. This is what Java and Racket do, and it seems like a reasonable compromise. It is certainly better than randomly hashing NaN, which just fills up your hashtable instead of producing actually sensible behavior.
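The idea that a total order can sit alongside the partial `<`/`==` operators without failing on NaN can be seen in today's Rust standard library, which long after this discussion added `f64::total_cmp`. The sketch below leaves the ordinary operators untouched and uses a hypothetical `TotalF64` key wrapper, ordered by `total_cmp`, as a `BTreeMap` key; the wrapper is an illustration, not a standard type.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical key wrapper: equality and ordering come from f64::total_cmp,
/// so a NaN equals itself here even though IEEE `==` says it does not.
#[derive(Clone, Copy, Debug)]
struct TotalF64(f64);

impl PartialEq for TotalF64 {
    fn eq(&self, other: &Self) -> bool {
        self.0.total_cmp(&other.0) == Ordering::Equal
    }
}
impl Eq for TotalF64 {}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

fn main() {
    let nan = f64::NAN;
    // The partial comparisons keep their IEEE 754 meaning...
    println!("{} {:?}", nan == nan, nan.partial_cmp(&1.0)); // false None
    // ...while the container uses the total order and never fails.
    let mut m = BTreeMap::new();
    m.insert(TotalF64(nan), "first");
    m.insert(TotalF64(nan), "second"); // same key: overwrites
    m.insert(TotalF64(1.0), "one");
    println!("{:?}", m.get(&TotalF64(nan)));
}
```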

nikomatsakis · 2013-03-11T16:09:31Z

(Not that the situations in Java and Racket are directly analogous either. Well, Java basically is. Racket somewhat less so, and I don't claim a deep understanding of their solution, but @jbclements and I did some experimentation and found that there is some special treatment around NaN to deal with these inconsistencies.)

graydon · 2013-03-12T00:59:09Z

Ok, so after some discussion I'm reasonably OK with closing this, but I'll make notes here:

- IEEE 754 has two kinds of comparison: partial and total. The totalOrder predicate is defined to match the partial order but extends it to cover a few more cases (paraphrasing the document, since it's not a free spec):
  - totalOrder(-0, +0) == true and totalOrder(+0, -0) == false
  - totalOrder(-NaN, y) == true and totalOrder(x, +NaN) == true
  - signaling NaN sorts below quiet NaN, negative below positive, lesser payload below greater
  - other exponent comparison cases handle multiple-representation possibilities, hopefully only relevant in decimal floating point
- totalOrder is often either slow or simply unexpected to users when dealing with floats
  - In particular, we would like <, <=, ==, !=, >= and > to use the partial order (for speed alone?)
- For containers, a total order is required for the data structure API to work reasonably (e.g. inserting an element lets you retrieve it, inserting it twice overwrites it, rebalancing operations don't lose nodes, etc.)
- Other languages have ad-hoc tricks to "totalize" their comparators when inserting into hashtables and trees. Java overrides equality and comparison on the boxed Double type (but not on the primitive double), Haskell defines a compare operator that behaves differently on NaN than the relational operators but is used for containers, etc.
- In our case we're going to do the following:
  - define a nice TotalOrd trait
  - have it raise a condition when faced with partially-ordered values such as NaN: float::partial_order::cond.raise_default((a,b) || float::ieee754r::totalOrder(a,b))
  - a user who wants something other than the 754 totalOrder predicate can install one (something faster? unlikely when going via conditions; more likely they'll just want to fail)
  - otherwise they get that predicate, which might be slow but is at least total
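The totalOrder cases listed above can be spot-checked against `f64::total_cmp`, which Rust's standard library added much later and documents as implementing the IEEE 754 (2008 revision) totalOrder predicate. It is not the `float::ieee754r::totalOrder` function or the condition mechanism named in the plan, which belong to 2013-era Rust. NaNs with an explicit sign, quiet bit and payload are built with `f64::from_bits` so the checks don't depend on the bit pattern of `f64::NAN`.

```rust
use std::cmp::Ordering;

fn main() {
    let pos_qnan = f64::from_bits(0x7ff8_0000_0000_0000); // +quiet NaN
    let neg_qnan = f64::from_bits(0xfff8_0000_0000_0000); // -quiet NaN
    let pos_snan = f64::from_bits(0x7ff0_0000_0000_0001); // +signaling NaN (quiet bit clear)
    let pos_qnan_big = f64::from_bits(0x7ff8_0000_0000_0001); // +quiet NaN, larger payload

    // totalOrder(-0, +0) holds and totalOrder(+0, -0) does not.
    assert_eq!((-0.0f64).total_cmp(&0.0), Ordering::Less);
    // -NaN sorts below everything, +NaN above everything.
    assert_eq!(neg_qnan.total_cmp(&f64::NEG_INFINITY), Ordering::Less);
    assert_eq!(pos_qnan.total_cmp(&f64::INFINITY), Ordering::Greater);
    // Among positive NaNs: signaling below quiet, lesser payload below greater.
    assert_eq!(pos_snan.total_cmp(&pos_qnan), Ordering::Less);
    assert_eq!(pos_qnan.total_cmp(&pos_qnan_big), Ordering::Less);
    // The partial operators are unaffected.
    assert!(-0.0f64 == 0.0 && pos_qnan != pos_qnan);

    let mut v = vec![pos_qnan, 1.0, 0.0, neg_qnan, -0.0, f64::NEG_INFINITY];
    v.sort_by(|a, b| a.total_cmp(b));
    println!("{:?}", v);
}
```

Sorting with `total_cmp` places the negative NaN first and the positive NaN last, with `-0.0` just before `0.0`.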

This matches the precedent of other languages and, I think, the intent of the IEEE committee in adding totalOrder. It does not involve random hashing: the hash value arrived at by "just hashing the byte representation" should respect the same equality that totalOrder respects. At least, I think that's the intent of totalOrder (to sort representations more or less by their bit patterns). So I think we can just hash the bytes.
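The conclusion that hashing the raw bytes respects the equality induced by totalOrder can be spot-checked in modern Rust: two `f64` values compare `Equal` under `f64::total_cmp` exactly when their `to_bits()` patterns are identical, so a key that compares with `total_cmp` and hashes `to_bits()` keeps the invariant that equal keys have equal hashes. The check below samples a handful of hand-picked values; it is a sketch, not a proof, and `byte_hash` is a hypothetical helper.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash an f64 by its raw bit pattern ("just hash the bytes").
fn byte_hash(x: f64) -> u64 {
    let mut h = DefaultHasher::new();
    x.to_bits().hash(&mut h);
    h.finish()
}

fn main() {
    let samples = [
        0.0,
        -0.0,
        1.0,
        -1.0,
        f64::INFINITY,
        f64::NEG_INFINITY,
        f64::MIN_POSITIVE,
        f64::from_bits(0x7ff8_0000_0000_0000), // +quiet NaN
        f64::from_bits(0xfff8_0000_0000_0000), // -quiet NaN
        f64::from_bits(0x7ff8_0000_0000_0001), // +quiet NaN, other payload
    ];
    for &a in &samples {
        for &b in &samples {
            let total_eq = a.total_cmp(&b) == Ordering::Equal;
            // Equality under totalOrder coincides with bit equality...
            assert_eq!(total_eq, a.to_bits() == b.to_bits());
            // ...so totally-equal values always get the same byte hash.
            if total_eq {
                assert_eq!(byte_hash(a), byte_hash(b));
            }
        }
    }
    // Contrast with IEEE `==`, under which -0.0 == 0.0 despite different bytes.
    println!(
        "-0.0 == 0.0: {}, same bits: {}",
        -0.0f64 == 0.0,
        (-0.0f64).to_bits() == 0.0f64.to_bits()
    );
}
```

The last line is why byte hashing pairs naturally with totalOrder equality but not with IEEE `==`: under `==` the two zeros are equal yet hash differently.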

nikomatsakis · 2013-03-12T01:10:08Z

Just to add something I wrote on IRC: my one concern with using conditions is that the method used to hash/compare NaN should really be associated with the data structure, but conditions are dynamically scoped, so making things customizable via conditions opens the door to data structure incoherence.
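The coherence concern can be illustrated with a deliberately naive sketch in which the NaN comparison policy lives in a thread-local setting, standing in for a dynamically scoped condition handler. Rust's 2013 condition system no longer exists, and `NAN_EQUALS_NAN`, `keys_equal`, `set_policy` and `AssocMap` are all hypothetical. Because the policy is not part of the container, an entry inserted under one policy can become unreachable when a lookup runs under another.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical ambient (dynamically scoped) policy: does NaN equal NaN?
    static NAN_EQUALS_NAN: Cell<bool> = Cell::new(true);
}

fn set_policy(nan_equals_nan: bool) {
    NAN_EQUALS_NAN.with(|p| p.set(nan_equals_nan));
}

/// Key comparison that consults the ambient policy rather than the container.
fn keys_equal(a: f64, b: f64) -> bool {
    if a.is_nan() && b.is_nan() {
        NAN_EQUALS_NAN.with(|p| p.get())
    } else {
        a == b
    }
}

/// Toy association list keyed by f64.
struct AssocMap<V> {
    entries: Vec<(f64, V)>,
}

impl<V> AssocMap<V> {
    fn insert(&mut self, k: f64, v: V) {
        let pos = self.entries.iter().position(|e| keys_equal(e.0, k));
        match pos {
            Some(i) => self.entries[i].1 = v,
            None => self.entries.push((k, v)),
        }
    }

    fn get(&self, k: f64) -> Option<&V> {
        self.entries.iter().find(|e| keys_equal(e.0, k)).map(|e| &e.1)
    }
}

fn main() {
    let mut m = AssocMap { entries: Vec::new() };
    set_policy(true);
    m.insert(f64::NAN, "stored");
    println!("{:?}", m.get(f64::NAN)); // found under the insert-time policy
    set_policy(false); // some other code changes the ambient policy
    println!("{:?}", m.get(f64::NAN)); // same map, same key: now not found
}
```

Tying the comparison to the key type (as the `TotalOrd` trait would) or to the container instead keeps every operation on one structure using the same rule.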

thestinger mentioned this issue Mar 8, 2013 in std::hashmap should use TotalEq #5283 (closed)

nikomatsakis closed this as completed Mar 10, 2013

graydon reopened this Mar 11, 2013

graydon closed this as completed Mar 12, 2013

dredozubov mentioned this issue Nov 23, 2013 in enforce HashMap to use TotalEq #10619 (closed)
