-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Simpler array hashing #26022
Merged
Merged
RFC: Simpler array hashing #26022
Changes from 1 commit
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
a3c1024
Simplify hashing of AbstractArray
mbauman 5aae6ff
Fixup type instability due to boxes
mbauman f16326f
New design: interleave fibonacci skips with finding non-repeating ele…
mbauman 0d72578
Slow down the Fibonacci sequence
mbauman 0e97f65
Add tests
mbauman 42c8ea9
Use rem instead of mod
mbauman 3d62fba
Also add test for #26034
mbauman 5a99f4b
Fixup test case for 32-bit machines
mbauman File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So IIUC this finds the first entry after
keyidx
which differs from the last hashed element. That's a bit different from what I had in mind: I was thinking about finding the first element which differs from the one atkeyidx
. I'm not sure it's really better, but the idea was that since in a sparse array it's likely thatkeyidx
hits a structural zero, looking for the previous distinct element makes it likely you'll hash a non-zero entry. With your approach, if you hash a zero the first time, you will look for the previous non-zero entry the next time, but if you hit a zero entry the time after that, you'll happily hash it; so you'll end up hashing a zero half of the time, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly right. I think it makes the behavior a little more robust — otherwise the hashes of sparse arrays with a nonzero last element will more likely hash the same. I also think it's most likely that diagonals of sparse matrices are filled.
That said, this is now clearly not hashing enough elements. The test failures are from four-element arrays colliding. Gotta slow down the exponential a little bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure I follow. Could you develop?
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, why use
findprev
, which requires all this keys vs. linear indices dance, instead of a plain loop? I imagine one reason could be thatfindprev
could have a specialized method for sparse arrays which would skip empty ranges, but currently that's not the case. Is that what you have in mind?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misunderstood your comment at first — I thought you were suggesting only unique'ing against the very first element. Now I understand that you mean to access the value after each skip, and then hash the next element that's different from it.
There are two reasons to do the keys vs. linear indices dance: one is
findprev
, but the other is that I want to hash index-value pairs to add a bit more information about the structure of the array. And of course, we cannot introduce a hashing difference between IndexLinear and IndexCartesian arrays. The fact that we then also allow arrays to opt into optimizations viafindprev
is a nice bonus, especially since it's going to be a pain to completely re-write your ownhash
optimization that has exactly the same behavior.