[MOREL-53] Optimize core language by inlining expressions
The paper "Secrets of the Glasgow Haskell Compiler inliner" (Peyton
Jones and Marlow, 1999, revised 2002) describes the approach.

Given the query

  let
    val emp = scott.emps
  in
    from e in emp yield e.deptno
  end

we need the compiler (in particular the Calcite compiler) to know that
emp is always equivalent to scott.emps and can therefore be translated
to a Calcite TableScan. Without inlining, to be safe, we would have to
generate a Calcite plan involving a TableFunctionScan (i.e. indirecting
at runtime rather than at compile time), and that seriously limits
query optimization opportunities on the Calcite side.

Inlining also has some other nice effects, such as
* "let val f = fn x => x + 1 in f 3 end" becomes "3 + 1"
* "let val x = 3 in isOdd 3 end" becomes "isOdd 3"
* "(fn x => x + 1) 5" becomes "let x = 5 in x + 1 end"
  (beta reduction)
* "let x = 1 and y = 2 in y + 3" becomes "let y = 2 in y + 3"
  (remove dead declarations)

Inlining is implemented in class Inliner, which is a shuttle that makes
multiple passes over the expression tree. It also converts references
to built-in functions (say "String.length" or "#length String") into
function literals, which can be more easily matched by subsequent
optimization rules.

The paper has guidelines for when inlining can be done safely (without
causing code size or runtime to increase). We implement those
guidelines in class Analyzer.

Refactorings

1. Add class EnvVisitor, and improve Visitor, Shuttle, EnvShuttle.

2. Simplify Core.Fn. Whereas Ast.Fn has a list of matches, each with a
   potentially complex pattern, Core.Fn now has just one IdPat and Exp
   rather than a match-list. If the function has alternate branches or
   a complex pattern, those become a Core.Case.

3. Split out Core.Local from Core.Let. Let has a ValDecl; Local has a
   Datatype. Local has much less effect on inlining than Let, so it is
   cleaner to separate it. (We still don't support 'local' in the Morel
   parser.)

   Why does Core.Local have a DataType, whereas Ast.Let contains a
   DatatypeDecl (and therefore several DataType declarations)? Because
   ML datatypes are not recursive; therefore, unlike val, we never need
   simultaneous local. DatatypeDecl now only occurs at top-level; only
   top-level programs need to declare multiple DataTypes
   simultaneously.

4. Change the type of Core.Let.pat from Pat to IdPat. After this
   change, Let patterns are always simple, which makes transformations
   such as inlining easier. Complex patterns, such as

     let
       val (x, y) = (1, 2)
     in
       f (x + y)
     end

   are now represented using a single-branch 'case':

     let
       val v = (1, 2)
     in
       case v of (x, y) => f (x + y)
     end

   Lists and datatype constructors are handled similarly. Note that we
   have introduced an intermediate variable, 'v'.

   Core.Case is now the only element of the Core language that can
   deconstruct (pattern-match), via its sub-element Core.Match.
   Contrast with the Ast, where there are several places, such as
   'fun', 'fn', 'let' (Ast.FunDecl, Ast.Fn, Ast.Let), in addition to
   'case' (Ast.Case).

5. Simplify Resolver.toCore(Ast.Let) and .toCore(Ast.ValDecl). These
   are complex because there may be several let and local that are
   intermingled, and we don't know whether we want an Exp or
   DatatypeDecl at the end of it all. To help, introduce class
   ResolvedDecl as an intermediate data structure.

6. Each variable reference (Core.Id) used to reference a variable
   declaration by name (String) but now contains the variable
   declaration (Core.IdPat) explicitly. Variable declarations are
   uniquely identified by a name plus an ordinal, so that all variables
   in the program are unique, and shadowing of variables doesn't occur
   when an expression is inlined in a different scope.

7. Resolver now maintains an environment. This is necessary to generate
   those unique variable names.

8. Use Util.skip(List) in a few places.

9. In class Binding, replace the name and type fields with an IdPat;
   add exp field (which we use for inlining).

10. When translating 'fn () => E' to core, we don't need 'case'.
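
To illustrate how unique declarations (item 6) make inlining
straightforward, here is a minimal sketch in Java. It is not Morel's
code: the class InlineSketch and everything inside it are hypothetical
stand-ins for Core.Id, Core.IdPat and Core.Let, the language has only
literals, '+' and 'let', and the rule used (inline when the variable is
used at most once, or when the value is atomic) is a simplification of
the paper's guidelines that ignores functions entirely.

```java
/** Hypothetical miniature core language; none of these classes are Morel's. */
final class InlineSketch {
  interface Exp {}

  /** Variable declaration, identified by name plus ordinal; compared by reference. */
  static final class IdPat {
    final String name;
    final int ordinal;
    IdPat(String name, int ordinal) { this.name = name; this.ordinal = ordinal; }
    @Override public String toString() { return name + "_" + ordinal; }
  }

  /** Variable reference; holds its declaration rather than a bare name. */
  static final class Id implements Exp {
    final IdPat pat;
    Id(IdPat pat) { this.pat = pat; }
    @Override public String toString() { return pat.toString(); }
  }

  static final class Lit implements Exp {
    final int value;
    Lit(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
  }

  static final class Plus implements Exp {
    final Exp left;
    final Exp right;
    Plus(Exp left, Exp right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
  }

  /** 'let' with a simple pattern, as in item 4. */
  static final class Let implements Exp {
    final IdPat pat;
    final Exp value;
    final Exp body;
    Let(IdPat pat, Exp value, Exp body) { this.pat = pat; this.value = value; this.body = body; }
    @Override public String toString() { return "let " + pat + " = " + value + " in " + body; }
  }

  /** Counts references to declaration p inside e. */
  static int countUses(Exp e, IdPat p) {
    if (e instanceof Id) return ((Id) e).pat == p ? 1 : 0;
    if (e instanceof Plus) return countUses(((Plus) e).left, p) + countUses(((Plus) e).right, p);
    if (e instanceof Let) return countUses(((Let) e).value, p) + countUses(((Let) e).body, p);
    return 0;
  }

  /** Replaces references to p with r. No capture check is needed because declarations are unique. */
  static Exp substitute(Exp e, IdPat p, Exp r) {
    if (e instanceof Id) return ((Id) e).pat == p ? r : e;
    if (e instanceof Plus) {
      Plus x = (Plus) e;
      return new Plus(substitute(x.left, p, r), substitute(x.right, p, r));
    }
    if (e instanceof Let) {
      Let x = (Let) e;
      return new Let(x.pat, substitute(x.value, p, r), substitute(x.body, p, r));
    }
    return e;
  }

  /** Bottom-up pass: drops dead declarations and inlines single-use or atomic values. */
  static Exp inline(Exp e) {
    if (e instanceof Plus) {
      Plus x = (Plus) e;
      return new Plus(inline(x.left), inline(x.right));
    }
    if (e instanceof Let) {
      Let x = (Let) e;
      Exp value = inline(x.value);
      Exp body = inline(x.body);
      int uses = countUses(body, x.pat);
      if (uses == 0) return body; // dead declaration; safe here because nothing has side effects
      boolean atomic = value instanceof Lit || value instanceof Id;
      if (uses == 1 || atomic) return substitute(body, x.pat, value);
      return new Let(x.pat, value, body); // used several times and not atomic: keep the let
    }
    return e;
  }
}
```

With this sketch, inlining "let x_0 = 3 in (x_0 + 1)" yields "(3 + 1)".
Two declarations that share the name x but have different ordinals are
distinct IdPat objects, so substituting for one never touches
references to the other.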