Track added products for cache invalidation #569
Conversation
The validator has checked the following projects against Scala 2.12:
❌ The result is: FAILED
@eatkins Hey Ethan, resident compilation has not been working for more than three years (it started as an experiment and was then dropped). Since then, Zinc 1.0 hasn't been tweaked to work with resident compilation, and I would prefer not to tweak it until there's a working and reliable resident compilation in the
@jvican that's interesting. I have actually been using it for the last 6 months without much in the way of noticeable issues. What I have noticed is that it often takes a few hundred milliseconds to create a new compiler instance, so using a cached compiler is a noticeable improvement when it works. Do you remember offhand what doesn't work when using the cached compiler? Also, if cached compilers are unsupported, why are they not deprecated both in zinc and in sbt?
I discovered that in sbt, the `scripted tests/internal-tracking` test would fail if run with `-Dsbt.resident.limit=$NUM` where `$NUM > 0`. This was because the test introduced a project `b` that depended on `a` but was configured not to track `a`. If the `b` project was compiled before the `a` project, the compilation would fail, but a cache entry would be added to the compiler cache with a key that included `a`'s target directory in its classpath. If `a` was then compiled independently, the expectation was that `b` would then compile as well, but it still failed. The reason was that when zinc looked up the cached compiler, it found a hit and did not create a new compiler instance. This was a problem because the previous instance was unaware of the newly generated classes from the `a` project. To get `b` to compile, it is necessary to invalidate the compiler cache entry.

To invalidate the entry, we must detect that there are changes to the project's dependencies. Previously, zinc did not even look at new classes in a dependency's target directory. It did look at `removedProducts`, but did not actually consider the removed products when constructing the `DependencyChanges` instance in `Incremental.scala`. To resolve this, I simply append the removed and new products to the binary dependency changes. After this change, `scripted project/internal-tracking` passes even when a global compiler cache is used. The test I added to `MultiProjectIncrementalSpec` also fails without this change and passes with it.
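The core of the change described above can be sketched as follows. Note that `DependencyChanges` and `InitialChanges` here are hypothetical, heavily simplified stand-ins for zinc's internal types, which carry more fields than shown; only the shape of the fix is illustrated.

```scala
import java.io.File

// Simplified stand-ins for zinc's internal types (not the real API).
final case class DependencyChanges(
    modifiedBinaries: Array[File],
    modifiedClasses: Array[String]
) {
  def isEmpty: Boolean = modifiedBinaries.isEmpty && modifiedClasses.isEmpty
}

final case class InitialChanges(
    removedProducts: Set[File],
    addedProducts: Set[File], // the new field this change introduces
    externalChanges: Set[String]
)

object CacheInvalidation {
  // The fix, in sketch form: fold both the removed *and* the added products
  // into the binary dependency changes, so a cached compiler instance learns
  // that a dependency's target directory gained or lost class files.
  def toDependencyChanges(init: InitialChanges): DependencyChanges =
    DependencyChanges(
      (init.removedProducts ++ init.addedProducts).toArray,
      init.externalChanges.toArray
    )
}
```

With only the removed products considered (the old behavior), classes newly generated by project `a` would never show up in `modifiedBinaries`, so `b`'s cached compiler was reused unchanged.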
For context, I have been trying to reduce the time between triggering a build and the start of compilation in sbt. When you optimize the other initialization steps (gathering source files and comparing stamps), creating a new compiler instance ends up taking most of the time. I have been able to get continuous builds using
The initialization time of the compiler is negligible when the JVM is hot. I was not the one doing the experiments with resident compilation, but from what I know, it was deemed not worth it for the speedups it gave. Down the road, I'd love to experiment with that, but until then I would prefer to keep things as they are in Zinc.
I agree with you that we can do much better in this regard. However, I'd like to give you some data points that I've gotten from using Bloop, which I encourage you to try if you care about compilation performance (we've seen from 30% to 2x faster compilation times for big builds). Below is a graph that measures the compilation of a small module defining a few tests with both sbt and Bloop. Bloop compiles the files in 62 ms, 3x faster than sbt. It largely depends on your project, but this benchmark illustrates well that Bloop does much better at compiling incrementally than sbt (incremental compilation only compiles a few source files at once in the most common case).
Now, coming back to resident compilation: in order to do resident compilation correctly, the compiler needs to be able to invalidate the symbol table whenever one of the classpath entries is modified. From what I know, no Scala compiler does that yet (although I've been doing work in this area that is bringing us closer to it), and therefore using resident compilation is not safe and is discouraged. It would be great, though, if you could give me numbers from using resident compilation with and without Bloop so that we can see how much speedup it gives compared to the state of the art. This will give us a lot of insight, and who knows, maybe it encourages us to prioritize it in the near future.
Thanks for the clarification! I have been working on integrating an optional file system cache into sbt that automatically updates itself and recalculates hashes when it detects source file changes. This is how I've been able to reduce the compilation initialization time in sbt. It will be interesting to see whether my changes can improve performance relative to Bloop. If not, I think I understand the pipeline well enough to hopefully be able to integrate the Bloop performance improvements into sbt. As an experiment, I also integrated the file system cache into mill and did some rudimentary performance tests; I actually found sbt to be slightly faster than mill for large projects, but this was a few months ago and I'm not sure that result will hold up.
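The kind of cache described above might look roughly like this. This is a minimal sketch, not sbt's actual implementation: it keeps a hash per source file keyed by modification time and recomputes the hash only when the file's metadata indicates a change, so repeated stamp checks avoid re-reading an unchanged tree.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.collection.mutable

// Hedged sketch of a file system stamp cache (hypothetical, not sbt's code).
final class StampCache {
  private case class Entry(lastModified: Long, hash: String)
  private val entries = mutable.Map.empty[Path, Entry]

  private def sha256(p: Path): String = {
    val md = MessageDigest.getInstance("SHA-256")
    md.update(Files.readAllBytes(p))
    md.digest().map("%02x".format(_)).mkString
  }

  /** Returns the cached content hash, recomputing it only if the file's
    * last-modified time differs from the cached one. */
  def hash(p: Path): String = {
    val mtime = Files.getLastModifiedTime(p).toMillis
    entries.get(p) match {
      case Some(e) if e.lastModified == mtime => e.hash
      case _ =>
        val h = sha256(p)
        entries(p) = Entry(mtime, h)
        h
    }
  }
}
```

A real implementation would be fed by a file watcher rather than polling, and would use content hashes (as jvican notes below, real hashes, not timestamps) for the final change decision.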
We have something similar to this in Bloop, and there's still work to do to share all of these hashes across all of its users inside the Bloop codebase. I registered this idea in this thread:
So at some point I'll get back to it if nobody beats me to it.
Cool. It's good to see that my approach is in line with what you and others have considered for Bloop. One pie-in-the-sky idea is to have the file system cache actually include the contents of the source files. If the compiler exposed an interface that took a list of source files along with their contents, then the compiler would not need to do any disk I/O to read the sources. I am not really sure how disk-bound compilation is, so maybe it wouldn't matter. I was initially surprised to discover the extent to which sbt can be heavily disk-I/O bound, though.
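The interface imagined above might have a shape like the following. `VirtualSource` and `InMemoryCompiler` are hypothetical names for illustration only, not an existing zinc or scalac API; the toy implementation exists purely to show that no file system access is needed on the compile path.

```scala
// Hypothetical in-memory source abstraction: path is kept only for
// diagnostics, the contents come from the cache rather than from disk.
final case class VirtualSource(path: String, contents: Array[Char])

// Hypothetical compiler entry point that accepts sources with their
// contents attached, so compilation requires no disk I/O to read them.
trait InMemoryCompiler {
  def compile(sources: Seq[VirtualSource]): Boolean
}

object EchoCompiler extends InMemoryCompiler {
  // Toy stand-in that only demonstrates the shape of the API: it
  // "succeeds" when every source has non-empty contents.
  def compile(sources: Seq[VirtualSource]): Boolean =
    sources.forall(_.contents.nonEmpty)
}
```

Whether this would pay off depends on how disk-bound compilation actually is, which is exactly the open question raised above.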
In the grand scheme of things, compilation is not disk intensive. However, to gain some speedups from I/O, I've recommended that several Bloop users just use tmpfs directories as target directories and rebase all the classpath entries of all the projects onto a tmpfs directory. Tmpfs directories give you more throughput and lower latency than reading/writing from/to disk, so I imagine that this alone would give you performance similar to what you can get from swoval (but obviously I haven't tested this). Note that my approach in Bloop does not aim to store the contents of the source files, but rather just caches the "meta" information that I'm interested in and exposes it to the different parts of the compiler APIs that need it. This meta information includes things like hashes (real hashes, not timestamps).
@jvican I remain a bit confused by your statement that resident compilation doesn't work in scalac. I have been using it in all of my projects for a year and have not had any issues with it. The only issue I've ever encountered was when I set resident compilation to be the default in sbt and ran the sbt scripted tests. Even then, only one test failed, and it was doing some unusual things with the project tracking configuration; this PR fixes that test when run with resident compilation enabled.
The resident compiler was removed (#835), so I guess this can be closed too. |
I discovered that in sbt, the scripted tests/internal-tracking test
would fail if run with -Dsbt.resident.limit=$NUM where $NUM > 0. This
was because the test introduced a project b that depended on a but
was configured not to track a. If the b project was compiled before the
a project, the compilation would fail, but a cache entry would be added
to the compiler cache with a key that included a's target directory in
its classpath. If a was then independently compiled, the expectation
would be that b would also compile, but this actually failed. The reason
was that when zinc looked for the cached compiler, it found a hit and
did not create a new compiler instance. This was a problem because the
previous instance was unaware of the newly generated classes from the a
project. To get b to compile, it is necessary to invalidate the compiler
cache entry.
To invalidate the entry, we must detect that there are changes to the
project's dependencies. Previously, zinc did not even look at new
classes in a dependency's target directory. It did look at
removedProducts, but did not actually consider the removed products when
constructing the DependencyChanges instance in Incremental.scala. This
commit adds an addedProducts* field to InitialChanges with alternate
constructors and apply methods for binary compatibility. I then update
Incremental.scala to append both the removed and added products to the
DependencyChanges.modifiedBinaries parameter. After those two changes,
the scripted test passes with or without using cached compilers. Note
that this behavior was never detected in the past because all of the
scripted tests are run without setting sbt.resident.limit so every
single run of compile is guaranteed to have a fresh compiler.
* An alternative would be to append the added products to the
removedProducts set, but that seems liable to lead to confusion.