Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a better hashing algorithm for classpath hashing #32

Closed
jvican opened this issue Nov 16, 2017 · 2 comments
Closed

Use a better hashing algorithm for classpath hashing #32

jvican opened this issue Nov 16, 2017 · 2 comments
Labels
performance priority / low Any change that has a low priority to be fixed. research

Comments

@jvican
Copy link
Contributor

jvican commented Nov 16, 2017

We're currently using SHA-1. I believe a cryptographic hash is not necessary for hashing and detecting changes in jars, so we should use something faster and more lightweight like xxHash, see my previous attempt to merge this into Zinc here: sbt/zinc#371. We could in theory implement it in bloop since we have the classpath hooks in latest Zinc 1.x for this.

However, this is only useful if we happen to find out that either of the following hypothesis are true:

  1. Hashing more than one classpath entry for multi-module builds happens often; and,
  2. The hash takes up a non-negligible time of the compilation.

I believe that these assumptions are most likely false, so this ticket is only for documenting the possibility of improving the hash. In practice, the cost of hashing the classpath happens only once (the first time you run the compiler) and after that the classpath should be the same, but this claim requires investigation.

I have no idea how bearable this process is for projects with gigantic classpaths that are likely to change (in essence, huge multi-module builds). So maybe it's worth it for them after all.

@jvican jvican added the priority / low Any change that has a low priority to be fixed. label Mar 13, 2018
@jvican
Copy link
Contributor Author

jvican commented May 6, 2018

We need benchmarks to back up the need for this. Some numbers are there, but what we need to answer more concretely is: how is this change going to affect big projects? Is the price high only in batch (and clean) compilation, or does it affect incremental compilation too, and to which extent?

(Note: classpath hashing happens on every incremental compile, so it's no joke. Some of the questions above need more clarification, but the motivation for this change is clear.)

One thing to consider is to associate hashes with classpath entries, and allow any compilation process to reuse those hashes. But if we do that, we need to think about proper invalidation. All options are:

  1. No invalidation: run fast algorithm on every classpath entry.
  2. Invalidation based on filesystem timestamps. We can reuse classpath entries across compilation processes, but we cannot rely 100% on them.
  3. Have background file watchers detecting changes on classpath entries. If there is a change, Bloop will consider that entry as invalidated. This is a novel approach to this problem that we can carry out because Bloop is a compilation server. This would sidestep completely the need for classpath hashing (and the same strategy could be applied to source file hashing -- and make it even incremental).

@jvican
Copy link
Contributor Author

jvican commented Dec 2, 2018

This is fixed in master, we cache with xxHash and in parallel now

@jvican jvican closed this as completed Dec 2, 2018
tpasternak pushed a commit to tpasternak/bloop that referenced this issue Dec 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance priority / low Any change that has a low priority to be fixed. research
Projects
None yet
Development

No branches or pull requests

2 participants