Allow the feature of O(1) bootstrapping to temporarily regress #6378

andrewrk · 2020-09-18T22:12:29Z

Related: #853

Right now the bootstrap repository can start with CMake, a C++ compiler, and bash, and cross-compile zig for all supported targets. It does this in a fixed number of steps, because the Zig compiler is currently written in C++ and depends on an LLVM backend.

This proposal is to allow this feature to regress for a period of time between now and 1.0, restoring the feature before 1.0.

What that regression would look like is:

1. Finish implementing self-hosted (stage2) so that it passes all the behavior tests and std lib tests, and is able to build itself.
2. Finish the C backend.
3. Use the C backend to generate a .c implementation of the self-hosted compiler.
4. Delete the stage1 C++ code and check the generated .c code into the zig source repository to make it easy to build zig. This does not count as bootstrapping because the generated .c code is not source code. It's more like committing a multi-target binary into source control.
5. Exploit the fact that we only have 1 codebase for the zig compiler to iterate faster, solve bugs, finish the language, and approach 1.0. This is the main motivation for this proposal.
6. Further reduce bootstrapping dependencies by making stage1 output C rather than LLVM IR (see #5246 (comment)).
7. Release 1.0, with the O(1) bootstrapping feature restored.
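
For readers unfamiliar with the term, here is a minimal sketch of what "a fixed number of steps" means, written in Python as a toy model. The version names and the `built_with` maps are hypothetical and are not taken from the zig repositories; the real bootstrap process has several fixed steps, which this model collapses into one for brevity.

```python
# Toy model (hypothetical names): each compiler version maps to the tool
# that is needed to build it. "cxx" stands for a system C++ toolchain that
# is assumed to be available already.

def build_chain(target, built_with, available=("cxx",)):
    """Return everything that must be built, in order, to obtain `target`."""
    chain = []
    current = target
    while current not in available:
        chain.append(current)
        current = built_with[current]
    chain.reverse()
    return chain

# Chained scheme: every release needs the previous release to compile it,
# so the amount of work grows with the number of releases.
chained = {"v1": "cxx", "v2": "v1", "v3": "v2", "v4": "v3"}

# Fixed-step scheme: every release can be built directly from the system
# toolchain, so the amount of work stays constant.
fixed = {"v1": "cxx", "v2": "cxx", "v3": "cxx", "v4": "cxx"}

print(len(build_chain("v4", chained)))  # 4
print(len(build_chain("v4", fixed)))    # 1
```

In this model, the regression described above would correspond to temporarily leaving the fixed-step scheme, on the assumption that it is restored before 1.0.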

joachimschmidt557 · 2020-09-19T07:35:52Z

@andrewrk Is it planned to have a split between stage0 and stage1 again? That is, would stage1 of 1.0 be composed of:

stage0 code which was originally translated from zig to C, now maintained on its own (parsing, semantic analysis, codegen, etc.)
stage2 code which is written in zig (zig fmt, etc.)

This would reduce the amount of C code that needs to be maintained.

andrewrk · 2020-09-19T07:39:06Z

Yes, and I have that working over in #6250 (with a fat checklist that needs to be completed before merging).

Rocknest · 2020-09-19T15:21:39Z

So we would have stage1 autotranslated from stage2 after all, for some time at least?

marler8997 · 2020-09-20T08:12:15Z

If instead you took the C output and just copied/pushed that to the zig-bootstrap repo every so often, then wouldn't that mean you wouldn't have to maintain the stage1 C compiler by hand ever?

matu3ba · 2022-04-17T22:26:05Z

Assume for this comment that C codegen "just works" and "cleanup can be done":
rg -g '!arch' -g '!codegen' -g '!link' -g '!stage1' -g '!translate_c' -g '*.zig' 'std.ArrayListUnmanaged' | wc -l shows 27 instances of ArrayListUnmanaged inside the src folder (the compiler implementation).
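
For anyone without ripgrep at hand, a rough Python approximation of that count might look like the sketch below. It is an assumption-laden stand-in, not an exact equivalent: it matches a literal substring rather than a regex, and it treats the excluded globs simply as file or directory names to skip. The function name `count_matching_lines` is made up for this example.

```python
import os

def count_matching_lines(root, needle, excluded_names, suffix=".zig"):
    """Count lines containing `needle` in files ending with `suffix` under `root`.

    Directories and files whose name is in `excluded_names` are skipped,
    which only approximates ripgrep's '!name' globs.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in excluded_names]
        for name in filenames:
            if not name.endswith(suffix) or name in excluded_names:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                # Like `wc -l` on rg output, count matching lines, not occurrences.
                count += sum(1 for line in f if needle in line)
    return count

EXCLUDED = {"arch", "codegen", "link", "stage1", "translate_c"}

# Example call, assuming the current directory is a zig checkout:
# count_matching_lines("src", "std.ArrayListUnmanaged", EXCLUDED)
```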

Once we touch things inside ArrayListUnmanaged or, worse, refactor things to use a different data structure, we need to redo the work of cleaning up the C files, and it's unclear how much can be simplified in a maintainable way with macros or by combining functions in other ways.

1. What is the necessary list of .zig files that require translation to C? They should be listed for planning.
2. Is follow-up tooling necessary to show what changed between 2 outputs from translate-c, in order to track bigger refactorings (changes of basic data structures, etc.)? Synchronizing bigger changes manually sounds like a recipe for disaster to me (multiple changes in combination may introduce bad behavior).
3. If 2 is not necessary, or it's too much hassle to keep up with changes to the basic data structures, i.e. we "freeze" the translated libc+compiler code because it's better than the current stage1, etc.: how are contributors motivated to fix the stage1 shenanigans/hacks that get introduced, so that we do not end up in the same situation as the current C++ stage1 in the long run, where people don't want to touch some code?
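
As a starting point for the second question (seeing what changed between two generated outputs), a minimal sketch using Python's standard difflib module could look like this. It only counts added and removed lines between two texts; the function name `summarize_changes` and the sample strings are hypothetical, and real tooling would likely need to group changes per function or per data structure.

```python
import difflib

def summarize_changes(old_text, new_text):
    """Return how many lines were added and removed between two texts."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # lines present only in the old output
        if tag in ("replace", "insert"):
            added += j2 - j1    # lines present only in the new output
    return {"added": added, "removed": removed}

# Stand-ins for two generated .c outputs (made-up content).
old_output = "int a;\nint b;\nint c;\n"
new_output = "int a;\nlong b;\nint c;\nint d;\n"

print(summarize_changes(old_output, new_output))
# {'added': 2, 'removed': 1}
```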

ghost · 2022-04-18T12:59:24Z

I've also been thinking, on and off, about the feasibility of this proposal. Besides the problem of translating comptime and generics into C, as noted by @matu3ba, there's also the fact that stage2 is a pretty big piece of code. It is a production quality compiler with many features that are extraneous to the task of bootstrapping:

Compiler optimizations
Native backends, LLVM, WebAssembly
Inline assembler
Expressive error messages and diagnostics
Efficient implementation, caching system
Documentation generator and other tooling
Native linker
C compiler frontend
. . .

The bootstrap compiler, on the other hand, only needs a minimal parser and comptime system to translate Zig directly to C in some not terribly inefficient fashion. All actual development and debugging can be done with stage2, so the bootstrap compiler could cut a lot of corners: e.g., doc comments can be ignored entirely; some comptime checks and computations can be deferred to runtime; inlining directives can be mostly ignored, along with many error conditions like unused items. And that's just a couple of things off the top of my head.

I'm guessing that a dedicated bootstrap compiler/translator focusing on simplicity and maintainability might be on the order of 10 KLOC, while stage2 (including the standard library) will be an MLOC project before long. It seems to me that maintaining a separate copy of stage2 in C will likely create more work in the long run than writing a bootstrap compiler from scratch.

matu3ba · 2022-04-18T22:44:15Z

> All actual development and debugging can be done with stage2

While things are still changing (before the 1.0 release fixes the spec and semantics), and for basic quality assurance (and to ease development), one wants tests for the bootstrap compiler run in CI, covering edge cases on different architectures, etc.
The test system could be based on or inspired by tau (MIT licensed) https://github.com/jasmcaus/tau, although it unfortunately only supports Mac, Linux, and Windows (with the most common compilers).
Though I think dogfooding might be simpler.

> some comptime checks and computations can be deferred to runtime

The Zig compiler has no Zig backend (i.e., one that emits Zig with comptime already resolved), and several comptime things intentionally don't exist at runtime.

> order of 10 KLOC

My reference is the arocc compiler for C, which is arguably a simpler yet comparable language (it has no comptime): at commit 89c6a7a7a6722d3965c1b81dfbab3800c2655cbd, running tokei . inside src/ shows me 18120 lines of code.

andrewrk added the proposal, frontend, and stage1 labels Sep 18, 2020

andrewrk added this to the 0.8.0 milestone Sep 18, 2020

andrewrk added the accepted label Oct 4, 2020

andrewrk mentioned this issue Oct 4, 2020

zig0 takes too much RAM to build zig1.o #6485

Closed

andrewrk mentioned this issue Oct 17, 2020

compiler allocation failed #4593

Closed

xackus mentioned this issue Jan 5, 2021

Question: high memory usage of stage1 #7690

Closed

andrewrk mentioned this issue Mar 3, 2021

Stage2 cbe: optionals and errors #7934

Merged

xackus mentioned this issue Mar 28, 2021

Emit null or uninitialized pointers as 0 values. #8372

Closed

andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021

sskras mentioned this issue Jan 22, 2022

mingw-w64-x86_64-zig: zig fails to start, libLLVM.dll is missing msys2/MINGW-packages#10596

Open

andrewrk mentioned this issue Mar 9, 2022

stage2: error set type equality, error and error union value equality #11098

Merged

andrewrk modified the milestones: 0.10.0, 0.11.0 Aug 20, 2022

hryx mentioned this issue Nov 15, 2022

Make primitive values not keywords #2897

Closed

squeek502 mentioned this issue Nov 16, 2022

Nuke the C++ implementation of Zig from orbit using WASI #13560

Merged

andrewrk closed this as completed in #13560 Dec 6, 2022
