Use a better hashing algorithm for classpath hashing #32

jvican · 2017-11-16T09:35:23Z

We're currently using SHA-1. I believe a cryptographic hash is not necessary for detecting changes in jars, so we should use something faster and more lightweight, like xxHash. See my previous attempt to merge this into Zinc: sbt/zinc#371. In theory, we could implement it in bloop, since the latest Zinc 1.x has the classpath hooks for this.

However, this is only useful if we find out that the following hypotheses are true:

1. Hashing more than one classpath entry happens often in multi-module builds; and
2. Hashing takes up a non-negligible share of the compilation time.

I believe that these assumptions are most likely false, so this ticket only documents the possibility of improving the hash. In practice, the cost of hashing the classpath should be paid only once (the first time you run the compiler), because after that the classpath should stay the same, but this claim requires investigation.

I have no idea how bearable this process is for projects with gigantic classpaths that are likely to change (in essence, huge multi-module builds). So maybe it's worth it for them after all.
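To make the idea concrete, here is a minimal sketch of change detection with a cryptographic and a non-cryptographic hash side by side. It is not Zinc or bloop code. CRC32 from the Python standard library is only a stand-in for xxHash (which is not in the standard library), and `example.jar`, `sha1_of`, `fast_checksum_of` and `demo` are hypothetical names used for illustration.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple

CHUNK = 1 << 20  # read in 1 MiB chunks so large jars are never loaded whole


def sha1_of(path: Path) -> str:
    """Cryptographic digest, standing in for the SHA-1 hashing used today."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fast_checksum_of(path: Path) -> int:
    """Non-cryptographic checksum. CRC32 is an assumption: a stand-in for xxHash."""
    value = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            value = zlib.crc32(chunk, value)
    return value


def demo() -> Tuple[bool, bool]:
    """Report whether each hash changes after a one-byte edit to a fake jar."""
    with tempfile.TemporaryDirectory() as tmp:
        jar = Path(tmp) / "example.jar"  # hypothetical classpath entry
        jar.write_bytes(b"class A v1")
        before = (sha1_of(jar), fast_checksum_of(jar))
        jar.write_bytes(b"class A v2")
        after = (sha1_of(jar), fast_checksum_of(jar))
    return before[0] != after[0], before[1] != after[1]
```

Both hashes notice the edit. The only property change detection needs is that different contents almost always produce different values, which is why a non-cryptographic hash should be enough here.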


jvican · 2018-05-06T21:10:59Z

We need benchmarks to back up the need for this. Some numbers are there, but what we need to answer more concretely is: how is this change going to affect big projects? Is the price high only in batch (and clean) compilation, or does it affect incremental compilation too, and to what extent?

(Note: classpath hashing happens on every incremental compile, so it's no joke. Some of the questions above need more clarification, but the motivation for this change is clear.)
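As a starting point for such numbers, a rough micro-benchmark could look like the sketch below. It only times hashing of an in-memory buffer, so it says nothing about disk I/O or about real incremental compiles. CRC32 again stands in for xxHash, and `time_hash` and `compare` are hypothetical helpers.

```python
import hashlib
import os
import time
import zlib


def time_hash(hash_fn, payload: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (in seconds) over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(payload)
        best = min(best, time.perf_counter() - start)
    return best


def compare(size_bytes: int = 8 * 1024 * 1024) -> dict:
    """Time SHA-1 against CRC32 (stand-in for xxHash) on random bytes."""
    payload = os.urandom(size_bytes)  # synthetic data, not a real jar
    return {
        "sha1": time_hash(lambda data: hashlib.sha1(data).digest(), payload),
        "crc32": time_hash(zlib.crc32, payload),
    }
```

Calling `compare()` returns the two timings for one buffer size. A real answer to the questions above would need measurements on actual classpaths, in both batch and incremental runs.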

One thing to consider is to associate hashes with classpath entries and allow any compilation process to reuse those hashes. But if we do that, we need to think about proper invalidation. The options are:

1. No invalidation: run a fast hashing algorithm on every classpath entry.
2. Invalidation based on filesystem timestamps. We can reuse the hashes of classpath entries across compilation processes, but we cannot rely 100% on timestamps.
3. Have background file watchers detect changes on classpath entries. If there is a change, Bloop will consider that entry invalidated. This is a novel approach to the problem that we can carry out because Bloop is a compilation server. It would completely sidestep the need for classpath hashing (the same strategy could be applied to source file hashing, and could even make it incremental).
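The timestamp-based option can be sketched roughly as follows, under stated assumptions: the cache key is the path, the stamp is the pair (modification time, size), and CRC32 stands in for the fast hash. `StampedHashCache` and `demo` are hypothetical names, not Zinc or bloop APIs.

```python
import tempfile
import zlib
from pathlib import Path
from typing import Dict, Tuple


class StampedHashCache:
    """Reuse a cached hash while the entry's (mtime, size) stamp is unchanged."""

    def __init__(self) -> None:
        # path -> (mtime in ns, size in bytes, hash)
        self._entries: Dict[Path, Tuple[int, int, int]] = {}
        self.recomputed = 0  # how many times the contents were actually hashed

    def hash_of(self, path: Path) -> int:
        stat = path.stat()
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[:2] == stamp:
            return cached[2]  # stamp unchanged: trust the cached hash
        self.recomputed += 1
        value = zlib.crc32(path.read_bytes())  # stand-in for a fast hash
        self._entries[path] = (stamp[0], stamp[1], value)
        return value


def demo() -> Tuple[int, int, bool]:
    """Hash twice, modify the file, hash again; report recomputation counts."""
    with tempfile.TemporaryDirectory() as tmp:
        jar = Path(tmp) / "a.jar"  # hypothetical classpath entry
        jar.write_bytes(b"v1")
        cache = StampedHashCache()
        cache.hash_of(jar)
        cache.hash_of(jar)
        after_two_reads = cache.recomputed
        jar.write_bytes(b"version 2")  # the size changes, so the stamp changes
        refreshed = cache.hash_of(jar)
        return after_two_reads, cache.recomputed, refreshed == zlib.crc32(b"version 2")
```

The weakness mentioned above shows up directly: an edit that keeps the same size and lands within the filesystem's timestamp granularity would leave the stamp unchanged, and the stale hash would be reused.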

jvican · 2018-12-02T07:01:20Z

This is fixed in master: we now cache the hashes, compute them with xxHash, and hash in parallel.
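For readers who want a picture of what "cached and in parallel" can mean, here is a minimal sketch. It is not the code that landed in master. It assumes a plain dictionary cache keyed by path with no invalidation, a thread pool for parallelism, and CRC32 as a stand-in for xxHash. `hash_classpath` and `demo` are hypothetical names.

```python
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def _hash_entry(path: Path) -> int:
    """Hash one classpath entry (CRC32 as a stand-in for xxHash)."""
    return zlib.crc32(path.read_bytes())


def hash_classpath(entries: List[Path], cache: Dict[Path, int]) -> List[int]:
    """Hash only the entries missing from the cache, in parallel, keeping order."""
    missing = [p for p in entries if p not in cache]
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as its input
        for path, value in zip(missing, pool.map(_hash_entry, missing)):
            cache[path] = value
    return [cache[p] for p in entries]


def demo() -> bool:
    """Check that parallel, cached hashing matches plain sequential hashing."""
    with tempfile.TemporaryDirectory() as tmp:
        entries = []
        for i in range(4):
            jar = Path(tmp) / f"dep-{i}.jar"  # hypothetical classpath entries
            jar.write_bytes(f"contents {i}".encode())
            entries.append(jar)
        cache: Dict[Path, int] = {}
        first = hash_classpath(entries, cache)
        second = hash_classpath(entries, cache)  # served entirely from the cache
        expected = [zlib.crc32(p.read_bytes()) for p in entries]
        return first == expected and second == expected
```

A real implementation would still need one of the invalidation strategies discussed earlier in the thread, since this cache never notices a changed jar.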


Duhemm added the performance label Nov 21, 2017

jvican added the research label Nov 22, 2017

jvican added the priority / low label Mar 13, 2018

jvican mentioned this issue Aug 10, 2018

Track added products for cache invalidation sbt/zinc#569

Closed

jvican mentioned this issue Aug 27, 2018

Enable classloader caching for macros and plugins in 2.13 ? scala/scala-dev#548

Open

jvican closed this as completed Dec 2, 2018

tpasternak pushed a commit to tpasternak/bloop that referenced this issue Dec 23, 2021

Merge pull request scalacenter#32 from scala-cli/nailgun-1.0.2

df08f38

Update nailgun to 1.0.2
