Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre compile zinc compiler interface for common Scala versions #2424

Merged
merged 38 commits into from
Apr 11, 2023

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Apr 6, 2023

While we still the to keep the dynamic codepath necessary to compile stuff at runtime, at least for the common case of existing Scala versions we can pre-compile it, improving performance and removing some of the confusing log messages about compiling stuff that the user didn't write. This only affects the first load, but first impressions matter. Doesn't benefit Scala 3, but there's half the community is still on Scala 2, and Mill's own build.sc is still on Scala 2

Before, we can see that a clean run ends up compiling the compiler interface twice (once to compile build.sc, once to compile the user's ScalaModule) taking ~14s on my laptop

Compiling compiler interface...
[info] compiling 1 Scala source to /Users/lihaoyi/Github/mill/example/basic/1-hello-world/out/mill-build/compile.dest/classes ...
[info] done compiling
[34/48] compile
Compiling compiler interface...
8 warnings
[info] compiling 1 Scala source to /Users/lihaoyi/Github/mill/example/basic/1-hello-world/out/compile.dest/classes ...
[warn] 1 deprecation (since 2.13.0); re-run with -deprecation for details
[warn] one warning found
[info] done compiling
[48/48] run
<h1>hello</h1>
/Users/lihaoyi/Github/mill/target/mill-release -i run --text hello  47.58s user 1.70s system 343% cpu 14.345 total

After, we compile the compiler interface zero times, with a clean run taking ~7s on my laptop

[build.sc] [40/47] compile
[info] compiling 1 Scala source to /Users/lihaoyi/Github/mill/example/basic/1-hello-world/out/mill-build/compile.dest/classes ...
[info] done compiling
[34/48] compile
[info] compiling 1 Scala source to /Users/lihaoyi/Github/mill/example/basic/1-hello-world/out/compile.dest/classes ...
[warn] 1 deprecation (since 2.13.0); re-run with -deprecation for details
[warn] one warning found
[info] done compiling
[48/48] run
<h1>hello</h1>
/Users/lihaoyi/Github/mill/target/mill-release -i run --text hello  19.27s user 0.85s system 297% cpu 6.768 total

Implementation

I added cross modules bridge[_] for all supported Scala versions. These more or less follow what ZincWorkerModule#scalaCompilerBridgeJar already does on-the-fly: download the respective source jar from Maven Central and compile it using the respective Scala version. Note that the META-INF metadata is necessary for Zinc to properly pick these up, and so I manually copy the folder from the unzipped source jar into the resources so it can get propagated to the final jar.

To avoid the slowness of compiling every bridge version during local development, we separate the bridge[_] publishing from the rest of the build, only enabled via the MILL_BUILD_COMPILER_BRIDGES=true environment variable, and published under a separate version. We expect to bump the bridge[_] version rarely, and want to avoid redundantly re-publishing them every Mill release.

Testing

Most of the existing unit tests use the three scala versions that we pre-compile in development mode, though some of them are left on other versions for legacy reasons e.g. they depend on a specific version of Scala to be compatible with a specific version of semanticDB or scala-native, and those continue to use the old download-compile-on-the-fly code path.

I also added a new test case mill.scalalib.HelloWorldTests.compile.nonPreCompiledBridge that together with .fromScratch specifically exercises the code paths for compiled/not-compiled compiler bridges, asserting that the compiler bridge is compiled in the versions it should be compiled for (i.e. those that are not pre-compiled) and not compiled for versions it should not be compiled for (i.e. those that are pre-compiled)

@lihaoyi lihaoyi marked this pull request as ready for review April 6, 2023 13:59
@lihaoyi lihaoyi requested review from lefou and lolgab and removed request for lefou April 6, 2023 13:59
@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 6, 2023

Probably would take a bit more work to wire up the tests for this, but could use a review of the approach if this is something we want to do

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of pre-compiled compiler interfaces. I think, we should create, publish and maintain them in a separate repo with its own release cycle, as these artifacts are more tied to the releases of new Scala versions than to Mill versions. That (new) project could also provides a small library which either knows exactly which compiler versions are available or knows how to test for available versions, so we can decouple that from Mill itself. We could than simply use that library. It might be even useful for other projects.

@lefou
Copy link
Member

lefou commented Apr 7, 2023

Tested by running

./mill -i -6 installLocal
 (rm -rf example/basic/1-hello-world/out && cd example/basic/1-hello-world && time /Users/lihaoyi/Github/mill/target/mill-release -i inspect run)

What is that -6 in the arguments?

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

I like the idea of pre-compiled compiler interfaces. I think, we should create, publish and maintain them in a separate repo with its own release cycle, as these artifacts are more tied to the releases of new Scala versions than to Mill versions. That (new) project could also provides a small library which either knows exactly which compiler versions are available or knows how to test for available versions, so we can decouple that from Mill itself. We could than simply use that library. It might be even useful for other projects.

@lefou the reasons I would like them to be part of the Mill repo:

  1. We need to re-publish Mill every time we want it to be able to make use of a new Scala compiler bridge version anyway, otherwise Mill won't know of the latest version of the external compiler-bridges repo and won't know when compiler-bridges for new versions of Scala become available

  2. Even when published together with the Mill repo, these artifacts do not have any dependencies on other parts of the Mill codebase, so other projects can continue to use the published artifacts anyway without worrying about version conflicts or other such issues

  3. The compiler interfaces are kind of annoying to test on their own. We'd need to set up Zinc to compile some code. This is something we get for free inside Mill, which exercises them as part of the Mill test suite, but we'd need to put in more effort to do so if we wanted them to live standalone repo

Overall, moving them into a separate repo just means we need to update two projects and publish two projects whenever a new Scala version comes out, and does not actually improve re-usability of the artifacts across the broader ecosystem. The cost-benefit would be different if the compiler-bridge artifacts were published and maintained by someone else, but as long as they're published by us, moving them into a separate repo seems to be mostly additional maintenance costs with no benefit.

Also, if someone does decide they want to maintain the compiler bridges separately, moving them into a separate repo is trivial, so we don't need to worry about getting stuck going down this path

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

What is that -6 in the arguments?

Oh that should be -j 6, sorry typo

@@ -20,8 +20,6 @@ trait MillTestKit {

def targetDir: os.Path = defaultTargetDir

def externalOutPath: os.Path = targetDir / "external"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an bug fix in the test harness; previously, external modules like mill.scalalib.ZincWorkerModule were not getting their external module folders deleted between runs, resulting in weird interference between test runs that are meant to be independent. I tweaked the TestEvaluator use the same out/ folder as it uses for normal modules, which is also the setup we use in production. That makes the bug go away and also brings the test and prod configurations closer together

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

CI is green, should be more or less merge ready I think

@lefou
Copy link
Member

lefou commented Apr 11, 2023

I had some setup like mill-moduledefs in mind. I expect, that we don't need to change anything just to add support for a new Scala release. Hence, we don't need to bump the version. We only need to publish the artifacts which belong to the new Scala version but to the same latest released project version.

You can also read all details in this section of the mill-moduledefs Readme: https://github.com/com-lihaoyi/mill-moduledefs#re-publish-for-a-new-scala-release

For tests, we could "just" check out a Mill version/snapshot and run some tests for that other project but with the fresh build artifacts. This setup is probably not for free, but I see the most benefit in it.

We only need to bump the version of the mill-compiler-bridge dependency in Mill, if we change something technical, but for a regular new Scala release, no bumps are needed.

@lefou
Copy link
Member

lefou commented Apr 11, 2023

Why I'm arguing? I think releasing lots of binary artifacts with every Mill release, which don't change their content compared to the previous release, is a disservice. We need to publish more artifacts, we waste storage space and network bandwidth, we need to re-download all pre-compiled jars after each Mill version bump, etc...

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

Even in the setup you described, we would need to bump something in Mill to tell it that a new version is available as a pre-compiled bridge jar. Unless we're willing to do a "try resolving bridge jars regardless of version and catch the failure", which I don't think is a robust approach. So we would need to

  1. bump the bridge jar repo scala version,
  2. publish it,
  3. then bump the "last known precompiled bridge jar" variable somewhere in Mill,
  4. then publish that

This is similar to the instructions you have in the ModuleDefs repo, except those don't include the Mill side changes to bump the Scala version. I'm reluctant to add more manual maintenance burden to Mill, and a bunch of manual steps that need to run every time a new Scala version is released definitely counts as manual maintenance burden. Especially if it ends up multiplied over N separate repos

Let's leave the bridge jars in place for now. If other people want to share ownership, maybe the SBT folks or ScalaCLI folks, we can break them out into a separate repo then and share the maintenance burden. For now, given the release cadence of a handful of times a year, I don't expect the additional bandwidth or storage requirements will pose a problem whether for us publishing or downstream users downloading the artifacts

@lefou
Copy link
Member

lefou commented Apr 11, 2023

This branch published 27 additional jars. Each new published release and published snapshot release will also contain these additional jars, although they don't contain any changes. TBH, I don't find that acceptable. Neither is this responsible usage of Maven Central infrastructure, nor can it be justified to our users. All these binary identical artifacts that end up on each User machine or Nexus or Artifactory, which are going to be virus scanned on Windows and what not. This is not an improvement over compiling them locally once.

If we leave them in this repo, we should maintain a mapping of Scala-Version to Mill-Compiler-Bridge-Version, and avoid re-publishing those that did not change, compared to their previous release. This mapping could be part of some generate scala file in the Zinc worker.

Unless we're willing to do a "try resolving bridge jars regardless of version and catch the failure", which I don't think is a robust approach.

I think this could be a reasonable approach. Alternatively, we could publish some file on our Website, or on the Github release page, which contains actual mappings. If we use pre-compiled compiler bridges on a best-effort basis and fall back to self-compile if that don't work out, I could live with it.

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

I don't think depending on a global mutable file hosted somewhere on the internet is acceptable. That opens up a whole zoo of interesting failure modes, vs our current dependency on a (mostly) readonly Maven Central.

How about we keep the bridges in repo, but with their own version, and only published manually? that would allow us to aboid duplicate publishes, while still keeping things relatively self-contained within the Mill repo, making it easier to e.g. publishLocal for testing than it would be if we had to juggle multiple repos to perform an update

@lefou
Copy link
Member

lefou commented Apr 11, 2023

How about we keep the bridges in repo, but with their own version, and only published manually? that would allow us to aboid duplicate publishes, while still keeping things relatively self-contained within the Mill repo, making it easier to e.g. publishLocal for testing than it would be if we had to juggle multiple repos to perform an update

That looks like a good compromise to me.

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

OK, will update the PR accordingly

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

@lefou i have updated the PR. Hopefully CI will be green once it gets a chance to run

I've put some temporary shims in the github actions to publishLocal the bridge modules before the main tests run, to let CI exercise the code without actually publishing anything to maven central. These shims will be removed before merging, and I'll manually trigger the new github action I added to perform the publishSigned command

Quite happy that it's all in one repo, which is what lets us test this stuff at all in CI, albeit with shims. Otherwise we'd be publishing back and forth on maven central trying to get things working...

If there's anything else I should change let me know, otherwise I'll merge it once I manage to turn CI green agajn

build.sc Outdated
class BridgeModule(val crossScalaVersion: String) extends PublishModule with CrossScalaModule {
def scalaVersion = crossScalaVersion
def publishVersion = bridgeVersion
def artifactName = T{ "mill-compiler-bridge" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about mill-scala-compiler-bridge or mill-scalac-bridge? mill-compiler-bridge doesn't communicate that is tailored to Scala, since Mill can compile many different languages (java, kotlin, etc.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 11, 2023

CI is green, I will proceed with publishing com.lihaoyi:::mill-scala-compiler-bridge:0.0.1, update this PR to remove the local publishing, and then merge this

@lihaoyi lihaoyi merged commit 0855b62 into com-lihaoyi:main Apr 11, 2023
@lefou lefou added this to the 0.11.0-M8 milestone Apr 12, 2023
@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 12, 2023

@lefou @lolgab FYI this has merged, the compiler bridges have been published at version 0.0.1, and the main branch is now building on top of them from maven central https://search.maven.org/search?q=mill-scala-compiler-bridge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants