Zak's New(-ish) C(-ish) Compiler.
(Alternatively, ZNCC is a recursive acronym for ZNCC is Not a C Compiler.)
This is the latest version of my C-like compiler, which is distantly based on LICE: https://github.com/dorktype/LICE
This is NOT meant to be a full replacement for standards-compliant C/C++ compilers. It's rather intended for bootstrapping new architectures and for use in minimalist high-level computing systems (e.g. where code needs to be strictly audited for security reasons or needs to be kept simple for educational purposes, but still needs some complex/modern functionality).
The compiler mostly targets modern, 64-bit platforms and focuses on converting simple pre-processed C-like code into simple assembler code for the target architecture. It can be easily adapted for other use-cases or bundled with additional tools.
- Supports multiple architectures and is easy to retarget
  - Mostly tested on x86-64/AMD64 Linux
  - Partial support for Windows ABI
  - RV64 is about half-supported (basic tests work but there are many broken bits), with minimal support for RV32 (enough for "Hello world")
  - Linux targets should also work for FreeBSD/OpenBSD/Solaris/etc. with minimal re-tooling (macOS may require a little more work, but not much)
  - Also includes some minimal/experimental support for other/new architectures
- Single compiler tool supports multiple targets (no need for per-architecture compiler builds)
- Compiler is mostly self-hosting (at least on fully-supported targets)
- Supports most (not all) essential C features with some extensions
- Includes some Objective C-like OOP extensions
- Basic floating-point support is included (assuming the target has such support)
NOTE: More detailed language & design features are covered in their own sections.
Using your default host C compiler (Unix-like best practices):
cc -ozncc zncc.c
Or specifically using GCC or similar:
gcc -ozncc zncc.c
Or, using Visual Studio:
- Create a new C++ Command Line project
- Copy/paste the zncc.c code into your main .c++ file
- Rename your main .c++ file to zncc.c (or something else with a .c ending)
- Create new header files, zncp.h and zncg.h, copying/pasting the associated code in
- Build/run and enjoy!
NOTE: This trick generally works for getting simple C programs working in Visual Studio, which doesn't seem to be configured well for C programming by default.
The compiler takes as input C-like program code (without any preprocessor directives) and generates as output assembler code.
For easy testing with files, the --input and --output arguments can be given:
./zncc --input mycode.c --output mycode.s
The assembler code then needs to be assembled/linked (see below).
First build the compiler as above. Then download the ZNLC headers; you can place these anywhere convenient (e.g. in the compiler directory).
Then, create a preprocessed version using GCC's frontend or another preprocessor (NOTE: This differs a little between platforms):
gcc -E -Ipath/to/ZNLC/include -D_ZCC -D_ZCC_X64 zncc.c > zncc.X64.c
If this succeeds, zncc.X64.c should be the raw, preprocessed C code for the appropriate target. The next step is to run the compiler, producing assembly code:
./zncc --input zncc.X64.c --output zncc.X64.s
Then you can assemble & link; again, GCC's frontend comes in handy on Linux:
gcc -static -ozncc.X64 zncc.X64.s
This will produce the self-hosted version zncc.X64, which you can test by having it compile itself as above:
./zncc.X64 --input zncc.X64.c --output zncc.X64.again.s
gcc -static -ozncc.X64.again zncc.X64.again.s
The default settings for now reflect the testing environment (future/integrated versions may detect settings a little better).
More-specific options can be relayed through environment variables:
- The value of CCB_FAMILY controls the target architecture:
  - x86 or X86 for commonplace Intel/AMD processors used in most PCs/laptops (currently only supported in 64-bit mode)
  - risc-v/RISC-V/riscv/RISCV for RISC-V (RV32/RV64-based) or compatible targets
  - arm or ARM for ARM-based targets (currently mostly unimplemented)
  - Potentially other/experimental settings
- The value of CCB_WORDSIZE specifies the basic word-size of the target processor:
  - Only the value of 64 fully works at the moment (to target 64-bit PCs and RV64)
  - The value of 32 can be used to test the RV32 target (which is less complete than the 64-bit modes)
  - The value of 16 is also recognised but there are no 16-bit targets at this stage
- The value of CCB_CALLCONV controls the default calling conventions:
  - standard or STANDARD generally implies the "System V" or similar conventions used by Linux/BSD/Solaris systems
  - windows or WINDOWS specifies Microsoft Windows (or ReactOS/WINE) conventions
  - Note that there is some support for specifying calling conventions on a function-by-function basis, but this isn't fully fleshed-out
- The value of CCB_ASMFMT controls the assembler format:
  - gas is often most useful on Linux/similar systems, and conforms to GNU/GCC's default assembler syntax
  - fasm generates code for Flat Assembler which works on x86 systems: https://flatassembler.net/ (this may also be useful for porting to NASM & other targets)
  - raw uses a simplified syntax, i.e. for testing new targets without good/standard assemblers (this is mostly useless for PC & RISC-V targets for now)
- The value of CCB_BINFMT controls the binary format or linker semantics assumed in the assembler code:
  - elf is generally the default on modern Linux/BSD/Solaris systems, and is ideal for linking with GCC/clang code on those platforms
  - flat can be used for producing small "flat binary" code snippets, particularly with Flat Assembler
NOTE: These names reflect internal naming (CCB being short for "C-like Compiler Backend") and will be updated before the final release.
The compiler generally accepts C-like code with some (experimental/incomplete) Objective C-style extensions. It is roughly on par with a pre-standard C compiler, except that it targets modern platforms.
This means that features like integers, functions, structs, arrays, pointers, etc. generally work as per usual, but there are some C features which are not implemented in ZNCC:
- Unpacking "vararg" parameters will not work
  - Varargs can be declared/called but not unpacked
  - This basically means you need to use a different/standard compiler to build any "printf"-like functions (but you can still access them)
- Passing structures as arguments or return values of functions will not work
  - In other words, arguments are expected to be either integer/pointer-sized or floating-point values
  - This may need to be revised in order to target 32-bit platforms properly (which may need to pass around 64-bit integers)
- String constants with the same text are not guaranteed to be == at runtime
  - This kind of stuff can generally be done and may sometimes be automatic, but depends on linker features
  - It's generally considered bad practice to rely on this anyway, but may be worth noting
- Literals are limited to common forms (e.g. don't expect it to support wide strings or pointers to struct literals)
- There will probably never be any support for bitfields
- There is currently no support for C++ style classes/namespaces/templates/etc.
  - Some minimal support may be added in the future, but likely not the whole lot
  - The Objective C-like features will be the main focus for OOP-like extensions in the short term
- Floating-point support exists and should generally "work" on supported targets, but is minimal
  - This can probably be expanded quite easily in future versions, but will eventually require some platform-specific options
- Large numbers of function arguments will partly work, but not reliably
  - Large numbers of integer/pointer arguments will "work", but may misbehave when combined with floating-point arguments (and may throw off the stack alignment for floating point inside the function)
  - This is just an incomplete feature (i.e. missing float support), and should be easy to fix incrementally
- Number conversions and precise signed/unsigned/etc. semantics have not been thoroughly fuzz-tested
  - There are likely to be some issues with specific types or combinations, but these are usually easy to fix once identified
- Pointer arithmetic exists, but is limited
  - For now, this means: use simple addition/subtraction for pointers, and don't rely on increment operators or inner pointer syntax working correctly to obtain offsets
  - These issues can probably be solved incrementally (if not, warnings/errors can be added to catch broken cases)
- Error reporting is present, but is not ideal for catching bugs/suggesting solutions
- There is no/minimal optimisation (this is intentional: the focus is on making the thing work first)
- The object-oriented extensions have potential, but deciding on an exact ABI is difficult
  - Efficient implementations would easily become platform-specific, while unoptimised implementations may be impractical
  - This means that there are some OOP features, but they need to be tweaked and integrated into some kind of platform to be useful
There is no built-in preprocessor or associated tools. I recently began integrating such tooling, but ran into a couple of issues.
Most notably, any generally-useful preprocessor ends up more complex than the compiler, so self-hosting can become troublesome.
Secondly, the compiler itself (and LICE, which was used as a foundation) are licensed under the terms of the Unlicense, while other/third-party components usually imply other (or more-vague) licensing conditions.
And thirdly, the compiler part is more-or-less essential for all use cases. The preprocessor, linker, assembler, and so on are more naturally specialised for particular platforms or combinations of technologies. For example, a regular desktop target might require a full preprocessor and other tools, whereas an embedded target may only need to recompile straightforward programs and may use the compiler directly with a special frontend & assembler limited to that target.
- https://github.com/ZYSF/ZNLC "fake libc" headers for self-hosting builds (to ensure the compiler doesn't choke on any obscure system headers)