Allow the feature of O(1) bootstrapping to temporarily regress #6378

Closed
andrewrk opened this issue Sep 18, 2020 · 7 comments · Fixed by #13560
Labels
  • accepted: This proposal is planned.
  • frontend: Tokenization, parsing, AstGen, Sema, and Liveness.
  • proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.
  • stage1: The process of building from source via WebAssembly and the C backend.
Comments

@andrewrk
Member

andrewrk commented Sep 18, 2020

Related: #853

Right now the bootstrap repository can start with CMake, a C++ compiler, and bash, and cross-compile Zig for all supported targets. It does this in a fixed number of steps because the Zig compiler is currently written in C++ and depends on an LLVM backend.

This proposal is to allow this feature to regress for a period of time between now and 1.0, restoring the feature before 1.0.

What that regression would look like is:

  • Finish implementing self-hosted (stage2) so that it passes all the behavior tests and std lib tests, and is able to build itself.
  • Finish the C backend.
  • Use the C backend to generate a .c implementation of the self-hosted compiler
  • Delete the stage1 C++ code and check the generated .c code into the zig source repository to make it easy to build zig. This does not count as bootstrapping because the generated .c code is not source code. It's more like committing a multi-target binary into source control.
  • Exploit the fact that we only have 1 codebase for the zig compiler to iterate faster, solve bugs, finish the language, and approach 1.0. This is the main motivation for this proposal.
  • Further reduce bootstrapping dependencies by making stage1 output C rather than LLVM IR (see #5246 (comment)).
  • Release 1.0, with the O(1) bootstrapping feature restored.
@andrewrk added the proposal, frontend, and stage1 labels Sep 18, 2020
@andrewrk added this to the 0.8.0 milestone Sep 18, 2020
@joachimschmidt557
Member

@andrewrk Is it planned to have a split between stage0 and stage1 again? That is, would stage1 at 1.0 be composed of:

  • stage0 code which was originally translated from zig to C, now maintained on its own (parsing, semantic analysis, codegen, etc.)
  • stage2 code which is written in zig (zig fmt, etc.)

This would reduce the amount of C code that needs to be maintained.

@andrewrk
Member Author

Yes, and I have that working over in #6250 (with a fat checklist that needs to be completed before merging).

@Rocknest
Contributor

So we would have stage1 autotranslated from stage2 after all, for some time at least?

@marler8997
Contributor

If instead you took the C output and just copied/pushed that to the zig-bootstrap repo every so often, then wouldn't that mean you wouldn't have to maintain the stage1 C compiler by hand ever?

@matu3ba
Contributor

matu3ba commented Apr 17, 2022

Assume for this comment that C codegen "just works" and "cleanup can be done":
rg -g '!arch' -g '!codegen' -g '!link' -g '!stage1' -g '!translate_c' -g '*.zig' 'std.ArrayListUnmanaged' | wc -l shows 27 instances of ArrayListUnmanaged inside the src folder (the compiler implementation).
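
A minimal Python sketch of the same kind of count, for anyone without rg at hand. It is an approximation under stated assumptions: the globs are modeled as pruning directories with those exact names, and the pattern is matched as a literal substring (rg would treat the `.` as a regex wildcard). The function name `count_matching_lines` is hypothetical.

```python
import os

# Directory names excluded by the rg globs in the command above (assumed to be directories).
EXCLUDED = {"arch", "codegen", "link", "stage1", "translate_c"}


def count_matching_lines(root, needle="std.ArrayListUnmanaged"):
    """Count lines in .zig files under root that contain needle."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED]
        for name in filenames:
            if not name.endswith(".zig"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                total += sum(1 for line in f if needle in line)
    return total
```

Calling `count_matching_lines("src")` from a checkout of the compiler would give a number comparable to the rg pipeline, though not necessarily identical because of the simplifications above.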

Once we touch things inside ArrayListUnmanaged or, worse, refactor things to a different data structure, we need to redo the work of cleaning up the C files, and it's unclear how much can be simplified in a maintainable way with macros or by combining functions in other ways.

  1. What is the necessary list of .zig files that require translation to C? They should be listed for planning.
  2. Is follow-up tooling necessary to see "what stuff was changed" between 2 outputs from translate-c, to track bigger refactorings (changes of basic data structures etc.)? Synchronizing bigger changes manually sounds like a recipe for disaster to me (multiple changes in combination may introduce bad behavior).
  3. If 2 is not necessary, or it's too much hassle to propagate changes of the basic data structures, i.e. we "freeze" the translated libc+compiler code because it's better than the current stage1 etc.: how are contributors motivated to fix the stage1 shenanigans/hacks that get introduced, so that we do not end up in the same situation as the current C++ stage1 in the long run, where people don't want to touch some code?
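
As a rough illustration of the tooling asked about in point 2, here is a minimal Python sketch that summarizes how much two generated C outputs differ. It is not an existing tool and makes no claim about how the project tracks such changes; the function name `summarize_changes` is hypothetical, and it only counts added and removed lines using the standard library's difflib.

```python
import difflib


def summarize_changes(old_c, new_c):
    """Return (added, removed) line counts between two generated C sources."""
    old_lines = old_c.splitlines()
    new_lines = new_c.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both counts; "equal" contributes to neither.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed
```

A large count after a small Zig-side change (for example, touching a basic data structure) would be a hint that hand-maintained cleanup of the C output needs to be redone.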

@ghost

ghost commented Apr 18, 2022

I've also been thinking, on and off, about the feasibility of this proposal. Besides the problem of translating comptime and generics into C, as noted by @matu3ba, there's also the fact that stage2 is a pretty big piece of code. It is a production quality compiler with many features that are extraneous to the task of bootstrapping:

  • Compiler optimizations
  • Native backends, LLVM, WebAssembly
  • Inline assembler
  • Expressive error messages and diagnostics
  • Efficient implementation, caching system
  • Documentation generator and other tooling
  • Native linker
  • C compiler frontend
  • . . .

The bootstrap compiler, on the other hand, only needs a minimal parser and comptime system to translate Zig directly to C in some not terribly inefficient fashion. All actual development and debugging can be done with stage2, so the bootstrap compiler could cut a lot of corners: e.g., doc comments can be ignored entirely; some comptime checks and computations can be deferred to runtime; inlining directives can be mostly ignored, along with many error conditions like unused items. And that's just a couple of things off the top of my head.

I'm guessing that a dedicated bootstrap compiler/translator focusing on simplicity and maintainability might be on the order of 10 KLOC, while stage2 (including the standard library) will be an MLOC project before long. It seems to me that maintaining a separate copy of stage2 in C will likely create more work in the long run than writing a bootstrap compiler from scratch.

@matu3ba
Contributor

matu3ba commented Apr 18, 2022

> All actual development and debugging can be done with stage2

While the language is still changing (before 1.0 fixes the spec and semantics), and for basic quality assurance (plus easier development), one wants tests run in CI for the bootstrap compiler, covering the edge cases on different architectures etc.
The test system could be inspired by tau (MIT licensed) https://github.com/jasmcaus/tau, although it unfortunately only supports Mac, Linux, and Windows (with the most common compilers).
Though I think dogfooding might be simpler.

> some comptime checks and computations can be deferred to runtime

The Zig compiler has no Zig backend (i.e. for resolving comptime), and several comptime things intentionally don't exist at runtime.

> order of 10 KLOC

My reference is arocc, a compiler for C, which is arguably a simpler but comparable language (due to no comptime): at commit 89c6a7a7a6722d3965c1b81dfbab3800c2655cbd, running tokei . inside src/ shows me 18120 lines of code.
