Use classpath PathRefs hashCode as cache key for Zinc worker #2185

Merged
merged 3 commits into from
Dec 12, 2022

Member

@lolgab lolgab commented Dec 9, 2022

The previous implementation received os.Paths, so it had to access the filesystem to find out whether the files had changed.
Now we propagate PathRefs, so we simply use their hashCode, which changes when the files change.
Here are some numbers, collected by compiling a Scala file 176 times and recording how long computing compilersSig took.

| | Before | After |
|---|---|---|
| Average | 1364 ms | 12 ms |
| Std deviation | 6533.41 ms | 91.35 ms |
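
To illustrate why a precomputed signature makes the cache key cheap, here is a minimal sketch. `FileRef` is a hypothetical stand-in, not Mill's actual PathRef; it only assumes that the reference carries a content-derived signature that is computed once and exposed through hashCode.

```scala
import java.nio.file.{Files, Path}
import java.util.Arrays

object PathRefSketch {
  // Hypothetical stand-in for a PathRef: the content signature is computed
  // once, at construction time, and reused as the hashCode.
  final case class FileRef(path: Path, sig: Int) {
    override def hashCode(): Int = sig
  }

  object FileRef {
    // Reads the file once to derive a signature from its path and its bytes.
    def apply(path: Path): FileRef =
      FileRef(path, 31 * path.toString.hashCode + Arrays.hashCode(Files.readAllBytes(path)))
  }

  // Cache key derived only from the stored signatures; no filesystem access here.
  def cacheKey(classpath: Seq[FileRef]): Int = classpath.hashCode
}
```

In this sketch the expensive work (reading the file) happens when the reference is created, so anything downstream that already holds the references can derive a key without touching the disk.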

Since the compiler jars have the Scala version in the name, it is enough.
The `compilersSig` now takes ~20ms vs ~200ms to build on my machine.
Since it runs on every recompilation, it makes the compiler noticeably faster.
@lolgab lolgab marked this pull request as ready for review December 10, 2022 12:17
@lolgab lolgab requested a review from lefou December 10, 2022 12:17
@lefou
Member

lefou commented Dec 10, 2022

> Since the compiler jars have the Scala version in the name, it is enough. The compilersSig now takes ~20ms vs ~200ms to build on my machine. Since it runs on every recompilation, it makes the compiler noticeably faster.

First, my feeling about this is not good. The whole correctness rests on assumptions we neither check nor enforce. You need a counterexample? It is probably rare, but you could use a locally built compiler jar, which might always be in the same place and lack any indicator of change in its name. Whenever you rebuild it, you probably should also update the bridge. Or you download or build a custom compiler in another Mill target and return it as T.dest / "out.jar". The path stays stable whereas the content may not.

As a consequence, risking the whole correctness on a weak assumption for "only" some milliseconds isn't worth it.

But, as we are already on this issue, I have another idea to offer. We already know much more about the compiler classpath than the file names: the PathRef. The PathRef is built for exactly this kind of situation: detecting changes. Calculating the PathRef of the compiler classpath is expensive (if not used with quick), so we don't want to produce it here. But if we look closer, it was already calculated further up in the call stack and later dropped. If we change the worker API to accept a PathRef of the compiler classpath, we have a perfect cache key for free. I'm pretty sure that we already work with PathRefs in all places where we use the compiler API, but we drop them later to extract only the path and pass that to the current API.
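
The API change proposed above could look roughly like the following sketch. All names here (`Ref`, `Worker`, `withCompilers`, `builds`) are hypothetical and chosen for illustration; this is not the actual ZincWorkerApi, only a model of a worker that receives references with precomputed signatures instead of raw paths.

```scala
import java.nio.file.Path
import scala.collection.mutable

object WorkerApiSketch {
  // Hypothetical stand-in for a PathRef: a path plus a precomputed content signature.
  final case class Ref(path: Path, sig: Int)

  // Hypothetical worker that caches an expensive "compilers" value per classpath signature.
  final class Worker {
    private val cache = mutable.Map.empty[Int, String]
    var builds = 0 // counts how often the expensive setup actually ran

    // Accepts refs instead of raw paths, so the key needs no filesystem access.
    def withCompilers(classpath: Seq[Ref]): String = {
      val key = classpath.map(_.sig).hashCode
      cache.getOrElseUpdate(key, {
        builds += 1
        // Paths are extracted only here, where the underlying tool needs them.
        classpath.map(_.path.toString).mkString(":")
      })
    }
  }
}
```

Under these assumptions, a second call with unchanged signatures hits the cache, while a changed signature at the same path triggers a rebuild.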

- val compilersSig =
-   compilerBridgeSig +
-     combinedCompilerClasspath.map(p => p.toString().hashCode + os.mtime(p)).sum
+ val compilersSig = combinedCompilerClasspath.map(p => p.hashCode).sum
Member

This will miss changes to the files.
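
The concern can be reproduced with a small sketch that uses only java.nio instead of os-lib. The function names `pathOnlySig` and `pathAndMtimeSig` are hypothetical; they model a key built from paths alone and a key that also includes modification times.

```scala
import java.nio.file.{Files, Path}

object PathOnlyKeySketch {
  // Key from path strings alone: cheap, but blind to content changes.
  def pathOnlySig(classpath: Seq[Path]): Int =
    classpath.map(_.toString.hashCode).sum

  // Key from path strings plus modification times: notices rewrites,
  // but has to query the filesystem on every call.
  def pathAndMtimeSig(classpath: Seq[Path]): Long =
    classpath.map(p => p.toString.hashCode + Files.getLastModifiedTime(p).toMillis).sum
}
```

If a jar at a stable path is rewritten, the path-only key stays the same, whereas the key with modification times changes as long as the file's timestamp changes.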

@lolgab
Member Author

lolgab commented Dec 10, 2022

I implemented the approach you suggested, and now we also perform far fewer conversions.

)(f: Compilers => T)(implicit ctx: ZincWorkerApi.Ctx) = {
val combinedCompilerClasspath = compilerClasspath ++ scalacPluginClasspath
val combinedCompilerJars = combinedCompilerClasspath.iterator.toArray.map(_.toIO)
val compilersSig = combinedCompilerClasspath.hashCode + scalaVersion.hashCode + scalaOrganization.hashCode
Member

Was there some issue you fixed by adding the scala version and organization, or is this just for completeness?

Member Author

The previous code also included compilerBridgeSig, which was calculated with:

val compilerBridgeSig = os.mtime(compiledCompilerBridge)

Since compiledCompilerBridge is calculated from scalaVersion, scalaOrganization and compilerClasspath, I took the hashCode of those inputs instead, so I added scalaVersion and scalaOrganization to the result.
compilerClasspath is already included in the sig through the hashCode of combinedCompilerClasspath.
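
As a rough sketch of that reasoning: instead of checking the modification time of the compiled bridge, the signature can be built from the inputs that determine the bridge. The object and parameter names below are hypothetical; `classpathHash` stands in for the hashCode of combinedCompilerClasspath.

```scala
object CompilersSigSketch {
  // Combines the classpath hash with the other inputs the bridge is derived from.
  // No filesystem access is needed, since all inputs are already in memory.
  def compilersSig(classpathHash: Int, scalaVersion: String, scalaOrganization: String): Int =
    classpathHash + scalaVersion.hashCode + scalaOrganization.hashCode
}
```

With this shape, a change to the classpath hash or to the Scala version yields a different signature for the inputs used in the check below the sketch's assumptions.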

@lolgab lolgab requested a review from lefou December 11, 2022 10:23
Member

@lefou lefou left a comment

Looks good to me. Thank you!

Member

@lefou lefou left a comment

Before merging, can you please update the PR title and description? Also, are the numbers for the speed improvements roughly correct, now that you changed the implementation a bit?

@lolgab lolgab changed the title from "Use classpath paths only to cache zinc workers" to "Use classpath PathRefs hashCode as cache key for Zinc worker" Dec 12, 2022
@lolgab
Member Author

lolgab commented Dec 12, 2022

Updated the description and collected numbers in a better way :)

@lefou lefou merged commit 69150ac into com-lihaoyi:main Dec 12, 2022
@lefou lefou added this to the 0.11.0-M1 milestone Dec 12, 2022
@lolgab lolgab deleted the faster-zinc-worker branch December 12, 2022 12:08