Hashing integer ranges #12226

jakebolewski · 2015-07-20T14:34:56Z

Please close if I'm missing something obvious (it is early) but I thought that this was supposed to work.

julia> @which [1:10...] == 1:10
==(A::AbstractArray{T,N}, B::AbstractArray{T,N}) at abstractarray.jl:1013

julia> [1:10...] == 1:10
false

julia> hash([1:10...]), hash(1:10)
(0xf58b1de9b53af894,0x21eb1cd76b7f5f20)

yuyichao · 2015-07-20T14:40:41Z

From the definition of ==, this seems to be expected?

mbauman · 2015-07-20T14:43:47Z

This was changed in #6084 as a consequence of the discussion in #5778 (comment).

The trouble with having arrays and ranges equal is that they must hash equal, and computing the hash of a range like an array requires hashing every single element. If they compare unequal, then ranges can hash in O(1).
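A minimal sketch of that trade-off, written for current Julia rather than the 2015 code base. `hash_elementwise` and `hash_range_fields` are hypothetical names used only for illustration; neither is a Base function, and Base's real hashing differs in detail.

```julia
# Hypothetical helpers, not Base functions.

# Element-wise hash: visits every element, so the cost is O(length(a)).
function hash_elementwise(a::AbstractVector, h::UInt = zero(UInt))
    h = hash(length(a), h)
    for x in a
        h = hash(x, h)
    end
    return h
end

# Field-based hash for a unit range: only the endpoints are read, so O(1).
function hash_range_fields(r::UnitRange, h::UInt = zero(UInt))
    return hash(last(r), hash(first(r), h))
end

r = 1:10
v = collect(r)

# The element-wise hash agrees for a range and the equivalent array,
# because both yield the same sequence of elements.
@assert hash_elementwise(r) == hash_elementwise(v)

# The O(1) hash only sees the endpoints, so nothing ties it to the
# element-wise hash of an ordinary array with the same contents.
println(hash_range_fields(r) == hash_elementwise(v))
```

If ranges and arrays must compare equal, only something like the first function can be shared between them, which is the cost being described here.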

jakebolewski · 2015-07-20T14:52:00Z

Thanks.

jakebolewski · 2015-07-20T15:12:50Z

So I guess isequal is not part of the AbstractArray interface now?

mbauman · 2015-07-20T16:18:03Z

It is unfortunate. :-\

We use run-length encoding when hashing the elements so sparse matrices don't have the same issue. It might not be that much more work to check if the elements can be lifted to a StepRange, too, but it's totally non-composable and would take much more time to check for a lifted FloatRange or LinSpace. I'm not sure there's a winning choice here since equality and hashing are so fundamental and need to be fast, but many custom arrays don't store their elements directly and instead compute them on the fly.
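To illustrate why run-length encoding helps types that do not store every element, here is a rough sketch. `hash_rle` and `ConstVec` are hypothetical names introduced for this example; this is not Base's implementation, only an instance of the idea of hashing (value, run length) pairs.

```julia
# Hypothetical sketch of hashing the run-length encoding of a vector.
function hash_rle(a::AbstractVector, h::UInt = zero(UInt))
    h = hash(length(a), h)
    isempty(a) && return h
    prev = first(a)
    runlen = 0
    for x in a
        if isequal(x, prev)
            runlen += 1
        else
            # A run ended: mix in its value and its length.
            h = hash(runlen, hash(prev, h))
            prev = x
            runlen = 1
        end
    end
    # Mix in the final run.
    return hash(runlen, hash(prev, h))
end

# A hypothetical constant vector that computes its elements on the fly.
struct ConstVec{T} <: AbstractVector{T}
    value::T
    len::Int
end
Base.size(c::ConstVec) = (c.len,)
Base.getindex(c::ConstVec, i::Int) = c.value

# Fast path: a constant vector is a single run, so the same hash is O(1).
function hash_rle(c::ConstVec, h::UInt = zero(UInt))
    h = hash(c.len, h)
    c.len == 0 && return h
    return hash(c.len, hash(c.value, h))
end

@assert hash_rle(ConstVec(0.0, 1000)) == hash_rle(zeros(1000))
```

The fast path works because the type knows its own run structure. A range has no runs of equal values, so this scheme alone gives it no shortcut.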

StefanKarpinski · 2015-07-20T17:04:11Z

I wonder if we could do some kind of clever mathematical trick here, but nothing comes to mind. Maybe something along the lines of hashing the run-length encoding of the diff of a vector? That would be fast for integer ranges, but not necessarily for floating-point ranges since the intervals are often irregular.

mbauman · 2015-07-20T17:17:09Z

I don't think we need to bend over backwards to support float ranges. They're already approximations of a theoretical range, and checking float equality is fraught with trouble in any case. It'd be interesting to experiment with RLE of the diff + first element. That would be a net gain for custom types, too: linear and piece-wise linear arrays could add fast paths (in addition to constant and piece-wise constant).
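A sketch of what "RLE of the diff + first element" could look like for integer vectors, assuming current Julia (`AbstractRange`). `hash_diff_rle` is a hypothetical name. The sketch uses plain wrapping subtraction and makes no attempt to handle overflow or mixed integer widths.

```julia
# Hypothetical sketch: hash the length, the first element, and then the
# run-length encoding of the successive differences.
function hash_diff_rle(a::AbstractVector{<:Integer}, h::UInt = zero(UInt))
    h = hash(length(a), h)
    isempty(a) && return h
    prev = first(a)
    h = hash(prev, h)
    length(a) == 1 && return h
    stepval = zero(prev)
    runlen = 0
    for x in Iterators.drop(a, 1)
        d = x - prev  # wrapping subtraction; overflow is ignored here
        if runlen > 0 && d == stepval
            runlen += 1
        else
            if runlen > 0
                # A run of equal differences ended: mix it in.
                h = hash(runlen, hash(stepval, h))
            end
            stepval = d
            runlen = 1
        end
        prev = x
    end
    return hash(runlen, hash(stepval, h))
end

# Fast path for integer ranges: one run of length(r) - 1 equal steps,
# so no loop over the elements is needed.
function hash_diff_rle(r::AbstractRange{<:Integer}, h::UInt = zero(UInt))
    h = hash(length(r), h)
    isempty(r) && return h
    h = hash(first(r), h)
    length(r) == 1 && return h
    return hash(length(r) - 1, hash(step(r), h))
end

@assert hash_diff_rle(1:10) == hash_diff_rle(collect(1:10))
@assert hash_diff_rle(10:-2:1) == hash_diff_rle(collect(10:-2:1))
```

A piece-wise linear array type could add a similar fast path by emitting one (step, run length) pair per linear segment.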

simonster · 2016-05-14T22:53:10Z

One challenge for the diff RLE thing is overflow. How do we make hash([typemax(Int), typemin(Int)]) == hash(Int128[typemax(Int), typemin(Int)]) given that typemin(Int) - typemax(Int) == 1?
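The problem can be reproduced directly. The variable names `a` and `b` are only for illustration, and the `widen` line at the end is an observation about this one case, not a fix proposed in this thread.

```julia
a = [typemax(Int), typemin(Int)]
b = Int128[typemax(Int), typemin(Int)]

# The two vectors hold the same values, so they compare equal.
@assert a == b

# In Int arithmetic the difference wraps around to 1.
@assert a[2] - a[1] == 1

# In Int128 arithmetic the same difference is a large negative number.
@assert b[2] - b[1] < 0
@assert b[2] - b[1] != a[2] - a[1]

# Widening before subtracting makes the two differences agree in this case,
# at the cost of wider arithmetic for every element.
@assert widen(a[2]) - widen(a[1]) == b[2] - b[1]
```

So a diff-based hash that subtracts in each array's own element type would give these two equal vectors different inputs to the hash.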

jakebolewski closed this as completed Jul 20, 2015

mbauman mentioned this issue Sep 18, 2015: Arraypocalypse Now and Then JuliaLang/LinearAlgebra.jl#255 (Closed, 27 tasks)

simonster mentioned this issue Oct 12, 2015: 1:3 != collect(1:3) #13565 (Closed)

nalimilan mentioned this issue May 14, 2016: Get rid of special-casing of ranges in == and isequal() for AbstractArrays #16364 (Closed)

nalimilan mentioned this issue May 17, 2016: Make arrays and ranges hash and compare equal #16401 (Merged)
