Bonus Points: The "I'll F**king Show Them" Bootstrapping Plan #6723
Comments
Guess this is related to https://github.com/oriansj/stage0 and http://guix.gnu.org/en/blog/2020/guix-further-reduces-bootstrap-seed-to-25/ which aim to do the same thing. It would be interesting to go closer to stage0 or the like for a completely clean bootstrapping build pipeline, but that might be less practical.
A nice vanity project for the post-1.0 future. An intermediate dialect of Zig is a bad idea, though, because it could lead to fragmentation. There is already ZIR, and it could be a lot simpler to implement in assembly.
Easier to parse, sure, but ZIR still has the full power of Zig -- so, easier to implement, no. And I'm not necessarily sure a tZIR would be better than a tZig -- ZIR makes comptime more explicit than Zig, so that functionality would be harder to special-case. It's worth considering though.
(What I'm saying assumes we use the stage2 compiler, which is likely since your idea would be implemented post-1.0.) I don't know if I already said this, but times have changed since the BCPL days. Given the goal of this project (your "But Why?" section), can't it just be achieved by cross-compiling from a compatible architecture (assuming stage2 is done)? It sure sounds less cool, but it's easier than having the creator of a custom platform write a whole compiler in assembly language, and it's also easier for the compiler writers not to have to write in a dialect that strict. We don't have to depend on C for bootstrapping (assuming stage2); we don't even have to bootstrap if we do cross-compiling correctly. The only case where bootstrapping is actually useful is if all binaries of the Zig compiler magically disappear from every PC on the planet and nobody made a backup. But as Rocknest said, your idea will create fragmentation, because architecture creators will have to write a stage0 compiler (in assembly) to compile the stage1 compiler (written in your proposed tZig) to compile the stage2 compiler (in Zig). That isn't DRY: even if they are subsets, you end up implementing a Zig compiler three times. The stage1 and stage2 compilers would have a lot of code to share, but sharing it won't be possible without rewriting 95% of the stage2 compiler in tZig.
I have to agree with @zenith391 here. The scope of the project is inspiring, but also a bit ... unnecessary? Kind of like the Space Shuttle -- technically brilliant, but very resource intensive and not really doing anything useful that can't be achieved in other ways. Excising C++ and LLVM as bootstrapping dependencies (eventually) seems like a worthwhile goal, but going beyond C the cost-benefit ratio seems to drop off quite a bit:
I'm pretty sure just about everyone active in Zig knows more about this kind of thing than I do; low-level compiler details are outside my skill set. But my understanding is that it would be difficult or impossible to get a byte-for-byte identical binary out of different C compilers running on different platforms, i.e. there is always "drift", which is what makes reproducible, provably safe builds such a hard problem to solve. On the general proposal, I love it, but again, I wouldn't be able to contribute. (I'd love to learn, but my day job working with languages a hundred times uglier than Zig takes up my time.) I'll leave the heavier feedback to the people who would be doing the work.
Sure, it could introduce "drift", but if we compile the stage1 compiler, use it to compile the stage2 compiler, and then use the stage2 compiler to compile itself, there will be no drift and the build will be byte-for-byte identical. There's also, as I said in my comment above, the fact that you will be able to cross-compile directly using the stage2 compiler, which will make us independent from C once stage2 is done.
I think I understand that, thanks.
And another counterpoint: Writing stage0 in assembly could easily make the code less understandable/auditable/portable for distro maintainers, who, in my understanding, will be the primary users of the bootstrap chain. Other devs will use an existing compiler binary 99.9% of the time anyway. I think that Andrew's original plan (#853) to keep a one-stage bootstrap process by continuing to maintain the existing stage1 compiler is sound. Optionally, stage1 can be converted to C output or even pure C (#5246), so that the rather large and brittle dependency of LLVM can be dropped from the bootstrap chain. But I really fail to see the value of going beyond that, other than just for the sake of it.
From an ultra-paranoid perspective, you need to be able to bootstrap from inspectable hardware like the Precursor, or better, microscopically inspectable hardware. Otherwise, you are trusting that the binary code of compilers was never compromised at any point in the past. The much easier and currently favoured route is to use platform backdoors like IME and hardware backdoors (special CPU instructions, chip configurations, etc.). The problem with those is that they may enable the aforementioned problem of subverting everything, and you can't detect it.
This is a cool idea but it is out of scope for this project. All Zig needs to do is provide a reasonable way to bootstrap from another well supported language, and C is ideal for this. I hope somebody tries to bootstrap Zig from scratch in a third party project some day. That would be sweet.
most based proposal I've ever read
This may be the single most ambitious, most awesome thing we could possibly do.
Absolute Zero
Bootstrapping Zig from below C level.
Introduction
In the early days, when BCPL was considered high-level, compiler languages were bootstrapped from assembly. The process was to define a subset of the target language which was easier to process, write a compiler for that subset in assembly, write a compiler for the full language in the subset, then write a full compiler in the full language. At the time, this was done out of necessity, but it also meant you had a fully auditable system right down to bare metal, as well as a concrete path to recreate everything from scratch.
This isn't done anymore -- computers are so ubiquitous, and C-based software stacks so widespread, that quite literally all languages and systems that "bootstrap" start from C or C++, and abandon their roots as soon as they can build themselves. Zig stands out by maintaining its bootstrap chain, but this only extends down to LLVM/Clang/LLD, and there are currently no plans to go further down than a system C compiler. "Good enough", maybe, but we can do better.
The Plan
To maintain a bootstrap chain that goes right down to assembly, such that we don't even need a system C compiler to get off the ground. We do this by defining a strict subset of Zig, herein referred to as tZig (for "thin Zig" or "tiny Zig", take your pick), which can be compiled in one pass and more closely matches the operations performed by (most) physical architectures. The stage 1 compiler is then written in this subset, and it's Zig all the way down. We can provide tZig compilers in various assembly languages for those who want a fully auditable system, and one in C as a catch-all -- and if we miss one, and that target doesn't have a C compiler either, the user can write their own.
Our job is more complicated than C's, since we need to support multiple architectures, and Zig the language is designed assuming a fairly sophisticated compiler to start with. This means we need to be more restrictive than is strictly necessary for any one architecture, and we do need to make some assumptions. It also means that we need to keep implementational simplicity as a key consideration, so that other users can cover ground that we haven't.
Restrictions
Any language feature not explicitly prohibited by these is allowed. The goal is to have a single-pass compilable language with only one operation per statement that is also fully compliant with Zig -- doubtless I've missed some things which make that impossible, additions welcome.
- `@import` shall be allowed, but the `@import("std")` special case shall not -- see below
- No `union(enum)` -- enum type must be explicitly declared
- No `enum` literals
- Each statement is limited to an optional `defer`/`errdefer` and ((one `return` OR one assignment/declaration) and (one `try`/`catch`/`orelse` and one ALU op/function call/dereference/index/member chain/global variable on either side OR one complex literal)) OR one `if` statement with optional `else` branch OR one `switch` statement OR one `while` loop with optional `continue` statement
- A `while`/`if` clause must consist of a single ALU op on simple values/variables OR a single boolean/optional value/variable with optional capture
- A `switch` payload must be a simple value/variable
- No `for` loops
- No `async`
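As an illustration only, here is one possible reading of the rules above applied to a small function. The function names are made up for this sketch, and `@import("std")` appears only to drive the test harness; under the proposal it would not be available to tZig code itself.

```zig
const std = @import("std"); // test harness only; not permitted in tZig itself

// Ordinary Zig: the `for` loop hides the index and bounds check,
// and the loop body performs two operations in one statement.
fn sumOfSquares(items: []const u32) u32 {
    var total: u32 = 0;
    for (items) |item| {
        total += item * item;
    }
    return total;
}

// Restricted style (an assumed interpretation of the rules, not a spec):
// explicit index, a single comparison in the loop condition,
// and one operation per statement.
fn sumOfSquaresThin(items: []const u32) u32 {
    var total: u32 = 0;
    var i: usize = 0;
    const len = items.len; // one member access
    while (i < len) : (i += 1) { // one ALU op on simple variables
        const item = items[i]; // one index
        const square = item * item; // one ALU op
        total = total + square; // one ALU op
    }
    return total;
}

test "both forms compute the same result" {
    const data = [_]u32{ 1, 2, 3, 4 };
    try std.testing.expectEqual(sumOfSquares(&data), sumOfSquaresThin(&data));
}
```

Both functions are valid Zig today, which is the point of keeping the subset fully compliant: the second form trades brevity for statements that a single-pass compiler can translate one at a time.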
To provide necessary platform functionality, there shall be one special-case import, `@import("bootstrap")`, that contains hooks into the implementation for heap allocation, file I/O, and other things necessary for a compiler. This interface needs to be carefully designed to be as small and portable as possible. When compiling as Zig, this will alias selected standard library functionality.
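A minimal sketch of what the "compiling as Zig" side of such an interface might look like. The proposal does not define the interface, so the file name `bootstrap.zig` and the hook names `alloc` and `free` are assumptions made purely for illustration; file I/O hooks are left out.

```zig
// bootstrap.zig -- hypothetical shim for when the compiler is built as ordinary Zig.
// Hook names and signatures are assumptions; the proposal leaves them undefined.
const std = @import("std");

// Heap allocation hook, aliased onto the standard library's page allocator.
// Returns null on failure so callers need no error set.
pub fn alloc(len: usize) ?[]u8 {
    return std.heap.page_allocator.alloc(u8, len) catch null;
}

// Matching release hook for buffers obtained from `alloc`.
pub fn free(buf: []u8) void {
    std.heap.page_allocator.free(buf);
}

test "allocate, use, and release a buffer" {
    const buf = alloc(16) orelse return error.OutOfMemory;
    defer free(buf);
    buf[0] = 42;
    try std.testing.expectEqual(@as(u8, 42), buf[0]);
}
```

An assembly-level tZig implementation would provide the same two entry points directly on top of the platform, so compiler code written against the hooks would not need to change.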
Notes
- `@import("std")` requires an implementation to process the whole standard library, or to understand lazy evaluation (and by extension the full complexity of Zig syntax) to avoid doing so. There is no reasonable way to mitigate this, and no significant reason to use arbitrary stdlib functionality anyway, so it is disallowed. Interfacing with platform binary functionality can be done with `extern`.
- `&`. Note that this has no restrictions on its use at all -- its presence actually means less work for the compiler.
- `return`. The presence of this is immediately known, and attaching it to any computation is not much effort. Also, in case an implementation ever wants to include tail recursion, compound `return` is necessary.
- `try`/`catch`/`orelse`. Similar to `return`. This represents the standard use case for these keywords.
- `defer`/`errdefer` require some implementational complexity and potential duplicate machine code to fit into a single-pass framework, but they are too useful to leave out.
- A `for` loop can be rewritten as a `while` loop with a continue statement, and `for` loops comparatively have hidden state and control flow (the current index and bounds check). These are small details, but everything must be explicit.
- `async` is a deliberate departure from platform ABIs, hence it might not be obvious how to implement it on every platform. If it is useful, however, it can be accommodated.
- `@import("bootstrap")` is necessary in the absence of inline assembly and the impossibility of the standard library, but I acknowledge it's ugly. However, the only other solution I can see given the restrictions is directly interfacing with platform binaries, which would be possible but much more fiddly and less portable. The best we can do is make sure we provide the simplest and most portable interface we can.
But Why?
In no particular order:
Parting Words
According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway, because bees don't care what humans think is impossible.