directly output LLVM bitcode rather than using LLVM's IRBuilder API #13265
Two relevant links:
I'm so excited about this proposal. One of the future directions that could benefit from this is the new GlobalISel pipeline, which I think is production ready for arm64. By decoupling the LLVM IR generation from the Zig pipeline, we could target GMIR directly and take advantage of the years of effort by Apple.
Yep, I think we should be fine. Basically what we do for our optimized build is:
So as long as Zig still supports emitting .bc files, we should be fine! 👍
I just want to highlight that the LLVM bitcode format is not stable, so this will add friction for the user. Is the plan to still have integration with Clang, but via shelling out rather than linking? Adding that either in the compiler or in build.zig would provide a smoother experience, as it would ensure that the user does not need to deal with this versioning.
Per https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility, you can emit outdated LLVM bitcode and get away with it; it's relatively stable in that direction: "The current LLVM version supports loading any bitcode since version 3.0."
@Snektron you seem to be getting this issue confused with #16270. This issue, when implemented, will mean that Zig outputs .bc files compatible with the same version of LLVM that Zig links against, in memory, and then uses …
Does this mean we still rely on LLVM or an LLVM-compatible backend for machine code generation? |
Yes - this issue isn't related to moving away from LLVM, but simply an implementation detail in terms of how we emit LLVM IR (/bitcode). From the user perspective, this change should have no effect on the compiler's functionality. |
@andrewrk I have a question regarding:
We are also considering using this bitcode approach for LFortran (lfortran/lfortran#2587). The benefits are clear, but I do not understand how it can be faster. On one hand, we have the C++ LLVM Builder API that constructs the internal LLVM IR representation in memory. On the other hand, we are first creating a binary, then asking LLVM to parse it and then construct the internal LLVM IR representation in memory. If the first approach is implemented in the most efficient way possible, I think it must always be faster, mustn't it? Unless the C++ LLVM Builder API is currently slow enough that it is faster to just create the binary.
Here are some reasons I expect it to be faster this way:
I don't think it's possible for someone to write a faster C++ LLVM Builder API. I think they are limited by C++ and the object-oriented programming paradigm the entire LLVM codebase is built upon. That said, this is all speculation. I could very well be wrong.
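One concrete reason a direct serializer can be cheap is that the bitcode bitstream is built from simple, branch-light encodings that can be written straight into a buffer, with no object graph in between. As a rough illustration (a Python sketch for exposition, not Zig's actual implementation), LLVM's VBR integer encoding splits a value into fixed-width chunks, where the high bit of each chunk signals that another chunk follows:

```python
def encode_vbr(value: int, width: int) -> list[int]:
    """Encode a non-negative integer as LLVM-bitstream-style VBR chunks.

    Each chunk carries (width - 1) payload bits, emitted low bits first;
    the chunk's high bit is set when more chunks follow.
    """
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    chunks = []
    while True:
        chunk = value & mask
        value >>= payload_bits
        if value:
            chunks.append(chunk | (1 << payload_bits))  # continuation bit
        else:
            chunks.append(chunk)
            break
    return chunks
```

For example, with VBR6 a small value like 27 fits in a single chunk, while 100 spills into two (`4` with the continuation bit set, then `3`, since 4 + 3·32 = 100). Emitting such chunks is a handful of shifts and masks per value, which is the kind of work a tight serializer loop does well.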
Excellent, thanks for the answer. OK, I can see that there might be a way for it to be faster. It would be great if it is; that would simplify a lot of things. It would probably not be difficult to create a simple benchmark: construct some simple (but long) function or expression using the C++ LLVM Builder API, versus first creating a bitcode file and loading it into LLVM.
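The shape of such a benchmark can be sketched as follows. This is only a harness skeleton: the two workload functions below are hypothetical stand-ins (they do not call LLVM at all), and in a real measurement they would be replaced by the IRBuilder construction and the serialize-then-parse path respectively. Taking the minimum over several repeats is the usual way to reduce timer noise:

```python
import timeit

def build_via_builder_api() -> None:
    # Hypothetical stand-in for constructing a long function one
    # instruction at a time through the C++ IRBuilder API.
    for i in range(1, 1_000):
        _ = f"%t{i} = add i64 %t{i - 1}, 1"

def build_via_bitcode() -> None:
    # Hypothetical stand-in for serializing the same function straight
    # into a byte buffer and handing it to LLVM's bitcode reader.
    buf = bytearray()
    for i in range(1_000):
        buf += i.to_bytes(8, "little")

def bench(fn, repeats: int = 5) -> float:
    """Best-of-N wall time for one invocation of fn, in seconds."""
    return min(timeit.repeat(fn, number=1, repeat=repeats))
```

Comparing `bench(build_via_builder_api)` against `bench(build_via_bitcode)` on real LLVM workloads of matching size would give the numbers the discussion above asks for.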
That's a great idea!
@certik looks like we have some performance data to look at in #19031. @antlilja reports a 1.16x wall-time speedup for this strategy as opposed to using LLVM's C++ IRBuilder API. Note that this is not the main purpose of the change, but it is a nice little side benefit. Edit: it looks like this is not a fair comparison, since the master branch is doing some redundant work. I'll follow up if we have any more accurate measurements.
Excellent, thanks for the update. That's indeed very encouraging. The bitcode approach is nice and clean with almost no downsides, as long as the performance is comparable. Another idea I had: in Debug mode compilation we do not turn on any optimizations in LLVM, and we want compilation to be as fast as possible. Unfortunately, LLVM compiles very slowly (often 20x slower compared to our direct binary backend). However, we (or you!) could write an alternative code generator that takes the bitcode and generates a binary quickly. We currently use WASM bitcode for that (we have a fast WASM-to-binary generator), but the advantage of using LLVM bitcode is that we could reuse the same infrastructure as the Release builds (which use LLVM with optimizations on), thus simplifying maintenance.
I don't really see the point of taking a detour through LLVM IR when the goal is to compile faster. In Zig we skip straight to x86 or other machine code. Introducing a pit stop through LLVM IR would certainly be slower than not doing that.
Zig can be built with or without -Denable-llvm. Currently, Zig is not very useful without enabling LLVM extensions. However, as we move into the future, Zig intends to compete directly with LLVM, making builds of Zig without LLVM a compelling option for the backends directly supported by Zig. There are a few reasons why one might want an LLVM-less binary:
This proposal is to treat LLVM bitcode files (.bc) as the target output format, rather than going through the C++ IRBuilder API. This would make it possible for even non-LLVM-enabled builds of Zig to still output LLVM IR that could be consumed by Clang, other LLVM tools, or integrated with other software.
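For a sense of what treating .bc as the output format means at the byte level: a raw bitcode file begins with the magic bytes 'B', 'C', 0xC0, 0xDE, and the bitcode-wrapper variant begins with the little-endian value 0x0B17C0DE. A minimal Python sketch of the kind of sniffing check a consumer could perform (illustrative only, not part of any proposed implementation):

```python
BITCODE_MAGIC = b"BC\xc0\xde"  # raw bitcode: 'B', 'C', 0xC0, 0xDE
WRAPPER_MAGIC = (0x0B17C0DE).to_bytes(4, "little")  # bitcode wrapper header

def looks_like_bitcode(data: bytes) -> bool:
    """Return True if data starts like an LLVM bitcode file."""
    return data.startswith(BITCODE_MAGIC) or data.startswith(WRAPPER_MAGIC)
```

Any .bc file Zig emitted directly would have to start with one of these magics for Clang and other LLVM tools to accept it.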
One example user story comes from Roc. I'd like to get @rtfeldman's take on this - I know that you're using Zig to output .bc files, but then what happens? Does a different tool compile that together with other code, or do you use Zig for the final link step too? I'm guessing that Roc would be able to use the non-LLVM-enabled Zig binaries for their use case.
There is a second major reason for this proposal, which is perhaps the stronger argument in its favor: making incremental compilation work more robustly. As the Zig project moves forward, we want to make CacheMode.incremental the default for all backends, including LLVM (caddbbc). This means we would want to save the LLVM IR module (.bc) with every compilation and restore it for subsequent compilations, using the IRBuilder API to add and remove declarations as necessary from the LLVM IR module, keeping the .bc file on disk in sync for future incremental compilations. However, the API lacks functionality. For example, aliases cannot be deleted:
(See zig/src/codegen/llvm.zig, lines 1330 to 1348 at commit e67c756.)
If Zig were in control of outputting the .bc file instead, then Zig could simply not emit aliases that are not supposed to exist. We would no longer be limited by what the IRBuilder API can do. This would make the LLVM backend very similar to the WebAssembly backend, in the sense that it gains a linking component and directly outputs the module.
Finally, for incremental compilation, Zig would already be trying to keep a .bc file on disk up to date via the IRBuilder API. Doing it directly, instead of via a limited API, is a more direct way to solve the problem, and the performance would be in our hands rather than in the hands of the LLVM project.
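The ownership argument can be sketched abstractly: once the compiler holds its own map of live declarations and re-serializes the module itself, "deleting" a declaration is simply never emitting it, with no dependency on what a foreign API permits. A hypothetical Python sketch (the `ModuleWriter` type and its methods are invented for illustration and correspond to no real Zig or LLVM API):

```python
class ModuleWriter:
    """Toy model of a compiler-owned serialized module."""

    def __init__(self) -> None:
        self.decls: dict[str, str] = {}  # decl name -> serialized definition

    def update(self, name: str, body: str) -> None:
        # An incremental recompile just overwrites the entry.
        self.decls[name] = body

    def delete(self, name: str) -> None:
        # No API limitation here: a deleted decl (e.g. an alias) is
        # simply absent from the next emitted module.
        self.decls.pop(name, None)

    def emit(self) -> str:
        # Re-serialize only what is still live.
        return "\n".join(self.decls.values())
```

Under this model, the alias-deletion problem quoted above disappears by construction: the writer only ever emits what the compiler says exists.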
I think these two reasons combined make this proposal worth seriously considering, despite the downsides of taking on additional maintenance with LLVM upgrades, and introducing an entirely new class of bugs from generating malformed .bc files.