Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashCode for NumericRange is computed by evaluating all values #12922

Closed
AvaPL opened this issue Dec 12, 2023 · 6 comments
Closed

hashCode for NumericRange is computed by evaluating all values #12922

AvaPL opened this issue Dec 12, 2023 · 6 comments

Comments

@AvaPL
Copy link

AvaPL commented Dec 12, 2023

Reproduction steps

Scala version: 2.13.10, but the problem is also present in other versions

val largeRangeHashCode = (Long.MinValue to Long.MaxValue).hashCode

Problem

java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:330)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:361)
  at scala.collection.immutable.NumericRange.length$lzycompute(NumericRange.scala:75)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:75)
  at scala.util.hashing.MurmurHash3.indexedSeqHash(MurmurHash3.scala:239)
  at scala.util.hashing.MurmurHash3$.seqHash(MurmurHash3.scala:354)
  at scala.collection.immutable.NumericRange.hashCode$lzycompute(NumericRange.scala:246)
  at scala.collection.immutable.NumericRange.hashCode(NumericRange.scala:246)
  ... 32 elided

The hash code of a numeric range should be calculated properly. It seems that NumericRange uses MurmurHash3.indexedSeqHash which evaluates all values in the range to compute the hash code.

Additionally, even when the range is small enough, the hash code computation is ineffective and may affect the code performance e.g. when a Set[NumericRange[Long]] is used.

@SethTisue
Copy link
Member

@scala/collections

@jxnu-liguobin

This comment was marked as duplicate.

@som-snytt
Copy link

Personally, I see nothing terrible about a library ticket on the dotty side. scala/scala3#19217

Since this repo is not scala/scala, the ticket has no natural home on github.

Also, is it not possible to transfer tickets?

And what would be the consequence of opening scala/scala issues for future maintenance tickets?

The template for scala/scala tickets would have a title of the form ... and nobody noticed until now.

@Ichoran
Copy link

Ichoran commented Dec 13, 2023

I'm not sure this is feasible. Collections who are computing their hashcodes actually check whether they're traversing a range, and if they are, they throw away the sequence-computed hashcode and pick the one that matches Range instead.

But to do that for every type supported by NumericRange is not possible; it might not even be the same depending on what Numerics are in scope.

So there isn't any shortcut that will ensure that NumericRange has a hashcode match when equals does, and because it agrees with sequences with equality (it is part of Scala collections), it has to agree with their hashcodes, too.

You could do a shortcut to Range if and only if the start, end, and step match. That would help some, if it's not already there. And I guess if you can't match a sequence because you're too big, you could compute the hash some other way. But for big, in-range stuff like 1000000000000L to 2000000000000L by 1000L, you're obligated to take all billion and one steps.

And since you are so obligated, it's not totally clear to me that it's worth fixing the other stuff.

@GerretS
Copy link

GerretS commented Dec 13, 2023

Please note this PR on NumericRange, where I attempt to prevent the need of iterating over the whole range for the indexOf call.

scala/scala#10627

What is especially interesting in regard to this issue is my finding that NumericRange allows you to make a range of more than Int.MaxValue elements but almost every method in the class throws that same IllegalArgument exception if you try to call it on a range that large.

A better longterm solution might be to simply accept that this class really doesn't support more than Int.MaxValue elements and evaluate this on instantiation by making length non-lazy.

@SethTisue
Copy link
Member

I'm going to provisionally close this as "not planned", since @Ichoran's argument that we can't shortcut this seems persuasive to me. Happy to reopen if someone can convincingly argue otherwise.

@SethTisue SethTisue closed this as not planned Won't fix, can't repro, duplicate, stale Jan 26, 2024
@SethTisue SethTisue removed this from the Backlog milestone Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants