Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Object.hash to hash raw bytes #134

Merged
merged 4 commits into from
Nov 7, 2024
Merged

Use Object.hash to hash raw bytes #134

merged 4 commits into from
Nov 7, 2024

Conversation

jakemac53
Copy link
Contributor

@jakemac53 jakemac53 commented Nov 6, 2024

Avoids decoding values during hashing just to re-encode them as bytes to hash, when stored as raw bytes. Ultimately this mostly just applies to Strings, but that is still impactful. We also avoid a lot of type checks, instead doing switches based on the known or encoded types.

For now I am just using Object.hash and Object.hashAll but we could explore alternatives later on as well, although it would mean passing a byte sink around.

This drops the total time for the large benchmark another 10% ish on my machine.

Copy link
Contributor

@davidmorgan davidmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of high level notes:

  • Needs some test coverage :)
  • Pretty sure Object.hash is not what we want in the end, 32 bit Jenkins hash is pretty small; doesn't need changing now though

@jakemac53
Copy link
Contributor Author

  • Needs some test coverage :)

Huh, I definitely wrote a test at some point previously covering this but I can't find it... 🤔 . I think I must have never landed it lol.

@jakemac53
Copy link
Contributor Author

jakemac53 commented Nov 7, 2024

Aaaand now I am on an adventure fixing growable maps 🤣 (apparently if they have growable maps as values they stack overflow 😄 , or something like that). turns out I was creating cyclic structures on accident

@jakemac53
Copy link
Contributor Author

  • Pretty sure Object.hash is not what we want in the end, 32 bit Jenkins hash is pretty small; doesn't need changing now though

Yeah it is pretty explicitly documented as "collisions are OK, just might make maps slow" 🤣 . So, we very likely do want to do something else here.

@jakemac53 jakemac53 merged commit d1d9555 into main Nov 7, 2024
47 checks passed
@jakemac53 jakemac53 deleted the raw-bytes-hash branch November 7, 2024 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants