Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hash implementations for immutable objects. #402

Closed
mvanaken opened this issue Nov 8, 2023 · 0 comments · Fixed by #403
Closed

Improve hash implementations for immutable objects. #402

mvanaken opened this issue Nov 8, 2023 · 0 comments · Fixed by #403
Assignees
Milestone

Comments

@mvanaken
Copy link
Contributor

mvanaken commented Nov 8, 2023

Some hash implementations in Metal are expensive and we should improve that to boost performance of Metal.

The general contract (JavaDoc) of hashCode() states:

  1. Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode() must consistently return the same value, provided no information used in equals comparisons on the object is modified. This value doesn’t need to stay consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, calling the hashCode() method on each of the two objects must produce the same value.
  3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables. As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

In addition, see e.g. University of California, Berkeley, a good hash function meets the following criteria:

  1. All objects that are .equals, must have the same hashCode. => equal to contract criteria 2
  2. hashCode values should be spread as evenly as possible over all ints.
  3. hashCode should be relatively quick to compute.
  4. hashCode must be deterministic (not random). => equal to contract criteria 1

Criteria 1 and 7 are equal, criteria 2 and 4 are equal, and criteria 5 is met by using the Object.hash() method as much as possible which helps to spread the hashcodes evenly.

So, to implement good hash codes, we should consider the following simplified criteria:

  • hashCode must be deterministic (not random).
  • All objects that are .equals, must have the same hashCode.
  • It is not required that if two objects are unequal, the hashCode is unequal.
  • hashCode should be relatively quick to compute.

All of the hash function implementations within metal meet these criteria, except for the last. Not all hashCodes are quick to compute.
The DataExpressionSource calculates the hash of the whole parseState it is dependent on and this source is created when using Ties in your tokens.
When Values from a Tie of a large parsegraph are placed within a hashmap, this may cause a significant performance drop.

There are multiple solutions possible:

  • Simplify the hash implementation for ParseState (and possible others), e.g. not taking the whole objects into account, but only the sizes, for the lists and the graph. You might say that if all of the graphs and lists are of the same size and the offset is the same, it is very rare to be in a different state. This may cause hash collisions for two objects that are actually not equal, but since it is not required for two unequal objects to have unequal hashCodes and we are talking about very rare possibilities, this drawback does not weigh the criteria of "quick to compute".
  • Cache the hash for all immutable objects. Since they will never change, it is sufficient to calculate it once.

There is not neccessarily a choice between these solutions. We can actually implement them both.

mvanaken added a commit that referenced this issue Nov 8, 2023
Yes, it is a mutable field for immutable objects, which is why it is
private and I created a separate class for it.
I have considered to calculate the hash within the constructor, but this
dropped performance as well, because then the hashes are always
calculated, even if you do not need it.
mvanaken added a commit that referenced this issue Nov 8, 2023
This tests functionality as well as performance.
@mvanaken mvanaken self-assigned this Nov 8, 2023
mvanaken added a commit that referenced this issue Nov 8, 2023
@mvanaken mvanaken linked a pull request Nov 8, 2023 that will close this issue
mvanaken added a commit that referenced this issue Nov 8, 2023
Since I already suggested to add the equality method to the interface, I
happily oblige Sonar here.
mvanaken added a commit that referenced this issue Nov 8, 2023
mvanaken added a commit that referenced this issue Nov 10, 2023
…est.

This does not stop the test after 500ms, which I expected, but checks if
the runtime does not exceed this number.
mvanaken added a commit that referenced this issue Nov 24, 2023
…java


Minor typo.

Co-authored-by: Carly-1 <75033199+Carly-1@users.noreply.github.com>
mvanaken added a commit that referenced this issue Nov 24, 2023
Caching is an implementation detail. What is important is that it is a
fixed or immutable hash code, hence the rename.
mvanaken added a commit that referenced this issue Nov 24, 2023
…Object.

Fiddled the test numbers to find a scenario where the non-hash-caching
implementation would still finish within a measurable time. Added a
graph showing the different run numbers (on my computer) for comparison.
The TimeOut is set quite large in comparison to the hash-caching
implementation (2 seconds vs 55 ms), but that is to cover possible
GitHub hickups.

Processed review comments of @rdvdijk.
mvanaken added a commit that referenced this issue Nov 24, 2023
mvanaken added a commit that referenced this issue Nov 24, 2023
mvanaken added a commit that referenced this issue Nov 24, 2023
@mvanaken mvanaken added this to the 10.0.0 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

1 participant