Here you can find documentation about Virgil's implementation, including bootstrapping considerations, runtime and compiler details, testing, and performance.
Virgil is a statically-typed, multi-paradigm language. Though it is primarily designed to be compiled to machine code and run directly on a hardware CPU, it can also be compiled for virtual machines such as the JVM and WebAssembly. Virgil code can also be executed by an interpreter built into the compiler. Thus Virgil is neither solely a "compiled" nor solely an "interpreted" language; it has implementations that fall into both categories.
The Virgil compiler, contained in this repo in aeneas/src,
is a static, whole-program optimizing compiler.
A typical invocation of the compiler supplies all source files of a program and specifies a
target architecture and operating system.
There is no separate linking step.
The compiler is responsible for parsing, typechecking, optimizing, and generating code for an
input program.
The compiler generates an optimized binary (ELF, Mach-O, JAR, or .wasm
file) directly.
Each of the phases of compilation is detailed in these separate pages:
- Parsing - the lexical and syntactic phase which processes the text of a program and produces a Virgil syntax tree.
- Semantic Analysis - the typechecking and semantic analysis phase which resolves names of types, variables, and methods, and checks for type and other semantic errors.
- Optimizing - the internal phase which analyzes, transforms, and optimizes the program.
- Code generation - the final phase which generates executable code for the specified target platform.
As noted above, the Virgil compiler includes a feature-complete interpreter that can execute programs without generating code for a target. This interpreter is used by the compiler to run the initialization phase of Virgil programs prior to generating code and can also be used to run whole programs for testing and debugging. It works on an internal representation of the code, rather than on source or machine code.
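
As a rough illustration of the initialization phase described above, the sketch below is a hypothetical program (the names `buildSquares` and `squares` are invented for this example) in which a top-level initializer calls a function. Assuming the behavior described above, the compiler's interpreter would evaluate that initializer before any code is generated, so `main` starts with the table already built.

```virgil
// Hypothetical example; the names here are illustrative only.

// Builds an array holding the squares of 0 .. n-1.
def buildSquares(n: int) -> Array<int> {
	var a = Array<int>.new(n);
	for (i < n) a[i] = i * i;
	return a;
}

// Top-level initializer: part of the program's initialization phase.
def squares = buildSquares(8);

def main() {
	// Reads an element of the table built during initialization.
	System.puti(squares[5]);
	System.ln();
}
```

The same program could also be run entirely in the interpreter for testing or debugging, without generating a binary for any target.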
The Virgil programming language relies on little to no code written in other languages. Virgil is "self-hosted": its compiler, runtime system, garbage collector, libraries, and system interface are all written in Virgil. In fact, other than scripts for testing, this repo is almost entirely Virgil code.
Yet to run any code in any language, we must first have a compiler or interpreter for that language in a format runnable by some machine. How does Virgil do this? See here for how the Virgil language bootstrapped originally and how it bootstraps each successive version of the compiler.
The executable programs generated by the Virgil compiler need a bit of supporting code to get running.
In most other languages, there is some underlying startup or runtime code written in a different language
that gets a program running, before entering into the code of the program itself.
But Virgil is extreme in its avoidance of dependence on other languages.
In fact, for the native targets (`x86-darwin`, `x86-linux`, `x86-64-darwin`, `x86-64-linux`) as well as the
WebAssembly target (`wasm`), there is no startup code written in any other language besides
Virgil itself.
Program startup on each platform is a little different and is explained in detail here.
On some targets, such as Linux and Darwin, Virgil allows direct access to kernel system calls. While not intended for applications in the long run, this is the mechanism that allows the runtime system and startup code to be primarily written in Virgil. It also allows building IO and system libraries that unlock the full capabilities of these targets. See here.
Virgil is a lightweight language, without much need for a runtime system with a lot of services. However, a few key services, like printing a stack trace when an exception occurs, require some additional runtime code, which is also implemented in Virgil.
Virgil is a memory-safe language with automatic memory management via garbage collection. The entire garbage collector is written in Virgil. It has a key interplay with the compiler which is documented here.
Virgil has a strict separation between language concepts and library code. For example, there are no built-in classes or methods in Virgil; everything more complex than primitives and arrays is considered to be part of the "user program". However, particularly useful utility code, such as code for dealing with strings, encoding and decoding data, and rendering numbers, along with some IO helpers, is included. That utility code is entirely optional; nothing in the language forces your program to use it. See here.
The Virgil language is designed to be low-overhead in that every source-level operation has a straightforward and small cost at the machine level. For example, there are no implicitly-allocating operations in Virgil (such as auto-boxing), so innocent-looking code cannot cause memory pressure issues or significant garbage collection overhead. Nearly every operation breaks down to short sequences of machine operations, making it easier to reason about program performance at the source level, even though Virgil is a safe language with proper abstractions. See the performance guide to get an idea of what to expect from the Virgil compiler.
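
To make the point about implicit allocation concrete, here is a small sketch with hypothetical names (`Point`, `sum`). In this program the only heap allocations are the ones written out at the two `.new` expressions; the loop over the array is plain `int` arithmetic, with no boxing of the elements.

```virgil
// Hypothetical example; the names here are illustrative only.

class Point(x: int, y: int) { }

// Sums the elements of an int array using ordinary integer arithmetic.
def sum(a: Array<int>) -> int {
	var total = 0;
	for (v in a) total += v;
	return total;
}

def main() {
	var p = Point.new(3, 4);    // explicit object allocation
	var a = Array<int>.new(2);  // explicit array allocation
	a[0] = p.x;
	a[1] = p.y;
	System.puti(sum(a));
	System.ln();
}
```

Because allocation sites are visible in the source like this, a reader can estimate the memory behavior of a function by looking for `.new` and similar explicit constructs rather than guessing at hidden costs.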
The one major exception to Virgil's self-reliance for bootstrapping and startup is testing. This is important for getting Virgil to run on a new platform where nothing is working yet. For example, we must be able to trust that the testing framework itself is not broken and reporting tests passing when they are actually failing! Virgil mostly relies on shell scripts for this. See here for an explanation of test scripts and how the testing framework works.