Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing integer ranges #12226

Closed
jakebolewski opened this issue Jul 20, 2015 · 8 comments
Closed

Hashing integer ranges #12226

jakebolewski opened this issue Jul 20, 2015 · 8 comments

Comments

@jakebolewski
Copy link
Member

Please close if I'm missing something obvious (it is early) but I thought that this was supposed to work.

julia> @which [1:10...] == 1:10
==(A::AbstractArray{T,N}, B::AbstractArray{T,N}) at abstractarray.jl:1013

julia> [1:10...] == 1:10
false

julia> hash([1:10...]), hash(1:10)
(0xf58b1de9b53af894,0x21eb1cd76b7f5f20)
@yuyichao
Copy link
Contributor

From the definition of the == this seems to be expected?

@mbauman
Copy link
Member

mbauman commented Jul 20, 2015

This was changed in #6084 as a consequence of the discussion in #5778 (comment).

The trouble with having arrays and ranges equal is that they must hash equal, and computing the hash of a range like an array requires hashing every single element. If they compare unequal, then ranges can hash in O(1).

@jakebolewski
Copy link
Member Author

Thanks.

@jakebolewski
Copy link
Member Author

So I guess isequal is not part of the AbstractArray interface now?

@mbauman
Copy link
Member

mbauman commented Jul 20, 2015

It is unfortunate. :-\

We use run-length encoding when hashing the elements so sparse matrices don't have the same issue. It might not be that much more work to check if the elements can be lifted to a StepRange, too, but it's totally non-composable and would take much more time to check for a lifted FloatRange or LinSpace. I'm not sure there's a winning choice here since equality and hashing are so fundamental and need to be fast, but many custom arrays don't store their elements directly and instead compute them on the fly.

@StefanKarpinski
Copy link
Member

I wonder if we could do some kind of clever mathematical trick here, but nothing comes to mind. Maybe something along the lines of hashing the run-length encoding of the diff of a vector? That would be fast for integer ranges, but not necessarily for floating-point ranges since the intervals are often irregular.

@mbauman
Copy link
Member

mbauman commented Jul 20, 2015

I don't think we need to bend over backwards to support float ranges. They're already approximations of a theoretical range, and checking float equality is fraught with trouble in any case. It'd be interesting to experiment with RLE of the diff + first element. That would be a net gain for custom types, too: linear and piece-wise linear arrays could add fast paths (in addition to constant and piece-wise constant).

@simonster
Copy link
Member

One challenge for the diff RLE thing is overflow. How do we make hash([typemax(Int), typemin(Int)]) == hash(Int128[typemax(Int), typemin(Int)]) given that typemin(Int) - typemax(Int) == 1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants