The Grand Bootstrapping Plan #853

Closed
andrewrk opened this issue Mar 21, 2018 · 7 comments

@andrewrk
Member

andrewrk commented Mar 21, 2018

Depends on:

The idea is to have a single source tarball that, given any C++ compiler which can build for the native machine, can produce a fully operational Zig compiler for any target. The bootstrapping process is O(1) and never gets more complicated than this, because we continue to maintain the C++ Zig implementation to the point that it can build the latest self-hosted compiler.

zig-1.0.0-bootstrap.tar.xz

This tarball contains:

  • Zig source code
  • LLVM source code
  • Clang source code
  • LLD source code
  • Whatever libraries LLVM, Clang, and LLD depend on. This appears to be:
    • zlib source code
  • libc++ source code from the LLVM project

The build process:

  1. Use the supplied C++ compiler to build LLVM, LLD, Clang, and their respective required dependencies for the native machine, and then the Zig Stage 1 compiler from C++ source code, for the native machine.
  2. Use Zig Stage 1 to build Zig Stage 2 for the native machine, and then Zig Stage 2 to build Zig Self-Hosted Compiler, for the native machine. We are not done, because Zig Self-Hosted Compiler, through the LLVM, Clang, and LLD dependencies, depends on native system libraries, for example libc.
  3. Use Zig Self-Hosted Compiler to build zig's libc for the target.
  4. Use Zig's libc and Zig Self-Hosted Compiler (using zig as a C++ compiler) to build libc++ from source for the target. Using the same strategy, plus libc++, build LLVM, LLD, Clang, and the libraries they depend on from source, for the target.
  5. Use Zig Self-Hosted Compiler and all these libraries we just cross compiled, to build Zig Self-Hosted Compiler, for the target.
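The five steps above can be read as a dependency-ordered plan. The following is a toy model in Python, not the actual bootstrap script: the `Step`, `bootstrap_plan`, and `check_plan` helpers and the artifact labels (`system-c++`, `zig-stage1`, `zig-selfhosted`, `zig-release`) are hypothetical. It encodes the key invariant of the plan as I read it: every builder runs on the native machine, while libc, libc++, and LLVM/LLD/Clang must already exist for the target before the final compiler is built for that target.

```python
from dataclasses import dataclass

# Toy model of the five-step bootstrap plan. Artifact labels such as
# "zig-stage1" and "zig-release" are hypothetical, not real script names.


@dataclass(frozen=True)
class Step:
    builder: str       # compiler performing the step; always runs natively
    products: tuple    # artifacts this step produces
    host: str          # machine the products are built for
    needs: tuple = ()  # libraries that must already exist for `host`


def bootstrap_plan(native: str, target: str) -> list:
    llvm = ("llvm", "lld", "clang")
    return [
        # 1. System C++ compiler builds LLVM/LLD/Clang, then Zig Stage 1.
        Step("system-c++", llvm, native),
        Step("system-c++", ("zig-stage1",), native, needs=llvm),
        # 2. Stage 1 builds Stage 2, which builds the self-hosted compiler.
        Step("zig-stage1", ("zig-stage2",), native),
        Step("zig-stage2", ("zig-selfhosted",), native, needs=llvm),
        # 3. Self-hosted compiler builds Zig's libc for the target.
        Step("zig-selfhosted", ("libc",), target),
        # 4. Zig as a C++ compiler: libc++ first, then LLVM/LLD/Clang.
        Step("zig-selfhosted", ("libc++",), target, needs=("libc",)),
        Step("zig-selfhosted", llvm, target, needs=("libc", "libc++")),
        # 5. Final compiler, linked against the cross-compiled libraries.
        Step("zig-selfhosted", ("zig-release",), target,
             needs=("libc", "libc++") + llvm),
    ]


def check_plan(plan: list, native: str) -> set:
    """Return every (artifact, host) pair built; raise on an ordering error."""
    built = {("system-c++", native)}
    for step in plan:
        if (step.builder, native) not in built:
            raise ValueError(f"{step.builder} used before it exists")
        missing = [n for n in step.needs if (n, step.host) not in built]
        if missing:
            raise ValueError(f"{step.products} needs {missing} on {step.host}")
        built.update((p, step.host) for p in step.products)
    return built
```

For example, `check_plan(bootstrap_plan("x86_64-linux", "aarch64-linux"), "x86_64-linux")` succeeds and contains `("zig-release", "aarch64-linux")`, while swapping the libc and libc++ steps raises, because libc++ for the target needs libc first.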

What we're left with after all this is a fully statically linked Zig binary, cross compiled for the target machine, plus all the standard library files and documentation that come with a release. Bundle this all up into a .tar.xz and we have ourselves a binary ready to distribute to the specified target.

andrewrk added this to the 1.0.0 milestone Mar 21, 2018
@bnoordhuis
Contributor

Ambitious, I like it!

What about libxml2 source code, and iconv or icuuc source code?

LLVM uses libxml2 to merge Windows manifest files for side-by-side applications, and that functionality can be disabled at build time (cmake -DLLVM_ENABLE_LIBXML2=OFF).

I don't expect manifest files are relevant to zig but even if they are, libxml2 can be built without icuuc and iconv support.

The icuuc source is ~90 MB. BSD's libiconv is much smaller, but it's still a few MB (big lookup tables).

we continue to maintain the C++ zig implementation enough to the point that it can bootstrap the latest self-hosted compiler

You don't want to get rid of it over time? Maintaining two compilers seems a bit of a drag: you have to either be conservative with what you use in the stage 2 compiler or implement new language features twice.

Is compiling the stage 2 compiler to C (or, if that's too restrictive, compiling to WebAssembly and using a wasm interpreter) and using that as the stage 1 compiler an option?

@andrewrk
Member Author

One of the big reasons for maintaining the C++ compiler is the benefit to package maintainers, such as Debian's. They want to be able to bootstrap the compiler from a trusted source version to avoid the backdoor problem. Maintaining a quick bootstrapping process from C++ code to final binary makes Zig easier to package, and therefore more likely to be picked up by various package managers and more likely to be kept up to date.

Dependencies in C/C++ are always the enemy of people getting the software built and running, so I really want to keep them to a minimum.

Compiling the stage2 compiler to C or WebAssembly does not solve the problem, because the C or WebAssembly code would be generated output rather than source code. What we want is a tarball containing only source code, which, with minimal dependencies, can be turned into the final output.

@hcnelson99
Contributor

What's the plan for ergonomic features? I'm interested in contributing to improve Zig's error messages (think https://elm-lang.org/blog/compiler-errors-for-humans). Would these kinds of improvements be implemented only in the self-hosted compiler, or in both?

@thejoshwolfe
Contributor

@hcnelson99 The stage1 compile errors are already somewhat human friendly, with colors and source printing. We are on par with GCC and Clang for error message formatting, including automatically switching modes depending on whether stderr is a TTY.
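To illustrate the kind of formatting described here (this is not how the stage1 compiler implements it), a minimal Python sketch that prints the error header, the offending source line, and a caret under the column, emitting ANSI color codes only when stderr is a terminal. The `use_color` and `render_error` names are hypothetical.

```python
import sys


def use_color(stream) -> bool:
    # Emit ANSI escapes only for an interactive terminal, not a pipe or file.
    return stream.isatty()


def render_error(path: str, line_no: int, col: int, msg: str,
                 source_line: str, color: bool) -> str:
    """Format an error header, the offending source line, and a caret (col is 1-based)."""
    bold, red, reset = ("\x1b[1m", "\x1b[31m", "\x1b[0m") if color else ("", "", "")
    header = f"{bold}{path}:{line_no}:{col}: {red}error:{reset}{bold} {msg}{reset}"
    caret = " " * (col - 1) + "^"
    return "\n".join([header, source_line, caret])


if __name__ == "__main__":
    print(render_error("main.zig", 3, 5, "use of undeclared identifier 'x'",
                       "    x += 1;", color=use_color(sys.stderr)), file=sys.stderr)
```

Redirecting stderr to a file makes `isatty()` return False, so the same message comes out as plain text with no escape sequences.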

There's already been a rejected proposal to add fancy error message features to the stage1 compiler: #1448. I would not recommend adding anything fancy to the stage1 compiler in this domain, because its destiny is to build only a single Zig project, so comfy features like the ones you're proposing would probably not be worth the maintenance burden.

I don't think the self-hosted compiler is ready yet for the kinds of fancy features described in the document you linked. Maybe there is some work that can be done there, but I don't know.

Was there a specific feature you noticed was missing?

@MarcusJohnson91

libstdc++

LLVM does not ship libstdc++; that's GCC's C++ standard library.

The LLVM project's own C++ standard library is libc++ (libcxx).

@andrewrk
Member Author

This is actually almost complete: https://github.com/ziglang/bootstrap
Quoting the status:

It gets all the way to successfully building zig0, and the next step is to improve build.zig to support cross compiling instead of assuming native.

@andrewrk
Member Author

This is done. It works.

https://github.com/ziglang/bootstrap

The upcoming Zig 0.6.0 release will come with a zig-0.6.0.bootstrap.tar.xz which is just a tarball of the above repository source files. But it's a self-contained tarball (save for those documented system dependencies) that is capable of cross compiling for any target.

So bootstrapping is complete; however, the self-hosting effort is still ongoing. At this point it's a matter of what percentage of the code is written in Zig versus C++. The C++ percentage will never reach 0, because we have to wrap the LLVM, Clang, and LLD C++ APIs with a C API wrapper. There is also some Windows COM API code for detecting MSVC that might as well stay in C++.

This should give everyone, especially package maintainers, an idea of the promise of simplicity of bootstrapping Zig from source.
