[RFC007] First step: AST representation #2072
For my understanding, do you actually intend the flat AST to be an output of the parser?
If I understood this correctly, there's no flat AST: just a tree AST and a bytecode format. This PR is the simplified tree AST produced by the parser. It's simplified compared to current Nickel because it doesn't have to support evaluation.
Ah. That's the point I'd missed. Thanks @jneem.
This commit starts to define an immutable AST, the first representation of the future bytecode virtual machine.
This commit continues the effort of defining a new AST. It introduces many helper methods to build nodes (which requires explicit allocation in arenas), and introduces methods to convert from the current mainline AST representation (unfinished).
Remaining TODO: shuffle arguments order of some primops.
20a6885 to 24b837c
I like the new simple `Term` -- it almost fits within one editor window!
Co-authored-by: jneem <joeneeman@gmail.com>
New AST representation for the (future) bytecode compiler
This is an implementation of the first step of RFC007: define the first AST representation, supposed to be output by the parser.
The whole RFC007 is a big chunk of work; we don't want to implement it all at once and make it the default. This PR creates a new `bytecode` module, which is only enabled under the `bytecode-experimental` feature, so that mainline Nickel is left unchanged, and we can experiment step by step on the side.

Content
This PR is concerned with the AST design part. It defines a new immutable, arena-allocated AST that has been stripped of any runtime concern: this should be the AST produced by the parser in the future.
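To make "immutable, arena-allocated" concrete, here is a minimal sketch of what allocating nodes through a builder helper could look like. Everything in it is an assumption made for illustration: `LeakyArena`, its methods and this reduced `Node` type are hypothetical and are not the definitions from this PR. In particular, the stand-in "arena" simply leaks boxes so that the example needs nothing beyond the standard library; a real arena would release all of its memory at once when dropped.

```rust
// Hypothetical stand-in for an arena allocator (NOT the allocator used in the PR).
// It hands out plain shared references by leaking boxes, so memory is never
// reclaimed; a real arena frees everything at once when it is dropped.
struct LeakyArena;

// Hypothetical, heavily reduced node type: children are plain `&'ast`
// references and slices instead of `Box`/`Rc`/`Vec`.
#[derive(Debug, PartialEq)]
enum Node<'ast> {
    Number(i64),
    Var(&'ast str),
    App {
        head: &'ast Node<'ast>,
        args: &'ast [Node<'ast>],
    },
}

impl LeakyArena {
    // Move a value into the "arena" and get back an immutable reference to it.
    fn alloc<'ast, T>(&'ast self, value: T) -> &'ast T {
        Box::leak(Box::new(value))
    }

    // Same for a sequence of values, returned as an immutable slice.
    fn alloc_slice<'ast, T>(&'ast self, values: Vec<T>) -> &'ast [T] {
        Box::leak(values.into_boxed_slice())
    }

    // Builder helper: building an application node requires explicit allocation
    // of its children.
    fn app<'ast>(&'ast self, head: Node<'ast>, args: Vec<Node<'ast>>) -> Node<'ast> {
        Node::App {
            head: self.alloc(head),
            args: self.alloc_slice(args),
        }
    }
}

// Builds `f 1 2` and inspects it: is the head `f`, and how many arguments are there?
fn demo() -> (bool, usize) {
    let arena = LeakyArena;
    let call = arena.app(Node::Var("f"), vec![Node::Number(1), Node::Number(2)]);
    match call {
        Node::App { head, args } => (*head == Node::Var("f"), args.len()),
        _ => (false, 0),
    }
}

fn main() {
    println!("{:?}", demo()); // prints "(true, 2)"
}
```

Since the references handed out are shared and immutable, the same subtree can be referenced from several parents without any reference counting.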
Any subpart of `term::Term` that relies explicitly on the `RichTerm` representation has been copied, cleaned and adapted: mostly the patterns, record and array satellite datatypes.

Finally, this PR introduces a `bytecode::compat` module that converts from the current mainline AST to the new AST, to make sure we haven't overlooked any part and that we have enough methods to build any AST node. This part isn't tested yet, but this PR is already huge, so we left it for future work.

AST design guideline
Important: this PR doesn't set anything in stone. The size of the AST hasn't been checked or hardcore optimized yet, and we'll probably have to update the representation as we re-implement typechecking, etc. It's rather a first draft, trying to follow a systematic approach.
The AST is designed to be compact and suited to processing by various analysis phases (mostly typechecking, code analysis by the LSP and compilation by the future bytecode compiler). Thus we've replaced any `Box`/`Rc` by plain immutable references, where the content has been allocated in a centralized arena (actually several of them).

For `struct`s, we have no reason to use references: for example, if `struct Foo` has a field `bar: Bar`, there is no good reason to add an indirection `bar: &'ast Bar`. Thus struct fields use owned data as much as possible.

This is the converse for `enum`s: to avoid size bloat, we add a reference indirection for any variant whose arguments take up more than a few words.

Because everything is immutable and shareable, and because we want to avoid heap allocation as much as possible for performance reasons (arena allocation should be faster), we don't use `Vec<T>` but `&'ast [T]` instead, which is the immutable equivalent.

We've tried to reduce the variation of the same constructs: while the original AST has both `Let` and `LetPattern`, `Fun` and `FunPattern`, `Record` and `RecRecord`, and so on, this AST merges all those cases. While we take a small size hit for the simplest cases (a `let x = y` now has one indirection, and stores the size of a 1-element slice in a fat pointer), this makes the definition and the code consuming it arguably much simpler, and we won't pay this price at runtime since the AST will be compiled away.

Similarly, there's no difference anymore between `UnaryOp`, `BinaryOp`, and `NAryOp` and the corresponding `Op1`, `Op2`, `OpN`: there is only one `PrimOp` type, and one `PrimOpApp` node taking a slice of arguments. Function application is also made multi-ary, which is a more efficient representation for application to multiple arguments and can also help give better error messages during typechecking for over- or under-application.

`Type` is the only satellite data that hasn't been replicated. It's a problem because it includes a `Contract` constructor that still refers to the old representation (a `RichTerm`). We can also wonder if we'd like to arena-allocate the `Type` AST as well, but this is non-trivial work and we left it for a follow-up PR.

Reviewing
The diff is really big, but keep in mind that a lot of code needed to be copy pasted and almost mechanically adapted. I've also added even more documentation (such as the arguments on primops).
The most interesting part is probably the type definitions: `Node`, `Ast`, `Pattern`, `PrimOp`, etc. The rest is mostly type-guided implementation.
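For readers who want a feel for the shape of these types before opening the diff, the design guidelines above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the type names follow the description, but every field, variant and the `show` function are hypothetical and heavily reduced. To stay free of dependencies, the example borrows from stack-allocated values instead of an arena; the `'ast` lifetime plays the same role either way.

```rust
// Hypothetical reduction of the types named in the description; the real
// definitions in the PR have many more variants and fields.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimOp {
    Plus,
    Negate,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Pattern<'ast> {
    Any(&'ast str),
    Wildcard,
}

// Struct fields are owned: no `&'ast` indirection for `pattern` or `value`.
#[derive(Debug)]
struct LetBinding<'ast> {
    pattern: Pattern<'ast>,
    value: Ast<'ast>,
}

#[derive(Debug)]
struct Ast<'ast> {
    node: Node<'ast>,
}

// Enum variants with large payloads go through references and slices.
#[derive(Debug)]
enum Node<'ast> {
    Number(i64),
    Var(&'ast str),
    // One `Let` for both plain and pattern bindings, with a slice of bindings.
    Let { bindings: &'ast [LetBinding<'ast>], body: &'ast Ast<'ast> },
    // Multi-ary application: one head and a slice of arguments.
    App { head: &'ast Ast<'ast>, args: &'ast [Ast<'ast>] },
    // A single primop application node, whatever the arity.
    PrimOpApp { op: PrimOp, args: &'ast [Ast<'ast>] },
}

// Ad-hoc rendering used only to inspect the tree; this is not Nickel syntax.
fn show(ast: &Ast<'_>) -> String {
    let show_all = |asts: &[Ast<'_>]| -> String {
        asts.iter().map(|a| show(a)).collect::<Vec<_>>().join(" ")
    };
    match &ast.node {
        Node::Number(n) => n.to_string(),
        Node::Var(id) => id.to_string(),
        Node::Let { bindings, body } => {
            let rendered: Vec<String> = bindings
                .iter()
                .map(|b| {
                    let pat = match &b.pattern {
                        Pattern::Any(id) => *id,
                        Pattern::Wildcard => "_",
                    };
                    format!("{} = {}", pat, show(&b.value))
                })
                .collect();
            format!("let {} in {}", rendered.join(", "), show(body))
        }
        Node::App { head, args } => format!("{} {}", show(head), show_all(args)),
        Node::PrimOpApp { op, args } => format!("(%{:?}% {})", op, show_all(args)),
    }
}

// Builds a small tree by hand; stack locals stand in for arena allocations.
fn demo() -> String {
    let bindings = [LetBinding {
        pattern: Pattern::Any("x"),
        value: Ast { node: Node::Number(1) },
    }];
    let plus_args = [Ast { node: Node::Var("x") }, Ast { node: Node::Number(2) }];
    let sum = Ast { node: Node::PrimOpApp { op: PrimOp::Plus, args: &plus_args } };
    let f = Ast { node: Node::Var("f") };
    let app_args = [sum, Ast { node: Node::Var("x") }];
    let body = Ast { node: Node::App { head: &f, args: &app_args } };
    let root = Ast { node: Node::Let { bindings: &bindings, body: &body } };
    show(&root)
}

fn main() {
    println!("{}", demo()); // prints "let x = 1 in f (%Plus% x 2) x"
}
```

Note how even the single binding `x = 1` goes through a one-element slice: that is the small size hit mentioned above, traded for having only one `Let` case to handle in every consumer.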