Some additional performance optimization #119
Conversation
```scala
val l = visitExpr(lhs)
val r = visitExpr(rhs)
def fail() = Error.fail(s"Unknown binary operation: ${l.prettyName} ${Expr.BinaryOp.name(op)} ${r.prettyName}", pos)
op match {
```
Should this be annotated with `: @switch` to ensure the `tableswitch`-based compilation isn't accidentally broken? Ditto for the `visitUnaryOp` pattern match above.
AFAIR `@switch` is broken. But even if it works, it also accepts a `lookupswitch`, so it's not much help.
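For context, a minimal sketch (hypothetical example, not sjsonnet code) of what `: @switch` does: scalac complains if the annotated match cannot be compiled to a JVM switch instruction at all, but it accepts either a `tableswitch` or a `lookupswitch`, so it indeed cannot guarantee the faster dense form:

```scala
import scala.annotation.switch

// Hypothetical example: dense Int cases typically compile to a tableswitch.
// @switch only verifies that *some* switch instruction is emitted,
// not which one.
def opSymbol(op: Int): String = (op: @switch) match {
  case 0 => "*"
  case 1 => "/"
  case 2 => "%"
  case 3 => "+"
  case _ => "unknown"
}
```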
```scala
def cast[T: ClassTag: PrettyNamed] =
  if (implicitly[ClassTag[T]].runtimeClass.isInstance(this)) this.asInstanceOf[T]
  else throw new Error.Delegate(
    "Expected " + implicitly[PrettyNamed[T]].s + ", found " + prettyName
  )

def pos: Position
```

```scala
private[this] def failAs(err: String): Nothing =
```
Should we make this take an implicit `T: PrettyNamed` for the error message, rather than passing in the `err` string manually? That would help ensure that we're consistent with our names for the various `Val.*` classes.
```diff
   parse("1 + 2 * 3") ==>
-    BinaryOp(pos(2), Num(pos(0), 1), BinaryOp.`+`, BinaryOp(pos(6), Num(pos(4), 2), BinaryOp.`*`, Num(pos(8), 3)))
+    BinaryOp(pos(2), Num(pos(0), 1), BinaryOp.OP_+, BinaryOp(pos(6), Num(pos(4), 2), BinaryOp.OP_*, Num(pos(8), 3)))
 }
 test("duplicateFields") {
   parseErr("{ a: 1, a: 2 }") ==> """Expected no duplicate field: a:1:14, found "}""""
```
Can we add a few tests to `ParserTests` to validate the construction of static `Val.Arr`s and `Val.Obj`s? A lot of the new changes are semantically indistinguishable from the status quo, and thus wouldn't get validated through the normal course of compiling jsonnet. We're also starting to have edge cases that are worth validating in tests, e.g. nested static arrays, nested static objects, alternating nested static arrays and static objects, and static arrays containing a mix of static primitives and other static arrays.
```diff
@@ -33,20 +33,24 @@ class Evaluator(parseCache: collection.mutable.HashMap[(Path, String), fastparse
   def visitExpr(expr: Expr)
                (implicit scope: ValScope): Val = try {
     expr match {
       case Id(pos, value) => visitId(pos, value)
```
I assume you re-ordered these to try and speed up lookup for the most common cases. How did you find the order? e.g. was it just a guesstimate, did you instrument it to see which ones are most common, or something else?
I gathered statistics from our universe benchmark. It may not be representative of all use cases, but the current version doesn't seem to be ordered for performance anyway.
`sjsonnet/src/sjsonnet/Val.scala` (outdated):
```scala
    Error.fail("Too many args, function has " + params.names.length + " parameter(s)", outerPos)
  }
  arrI
} else if(params.indices.length < argsSize) {
```
Would moving this case to the top of the if-else chain allow us to avoid the `try`-`catch` above? That would also allow us to avoid duplicating the `Error.fail` call. AFAICT the most common code path is the last `else`, so re-ordering the first two cases shouldn't slow things down too much.
Looks good, left some comments.

Some of those might already be obsolete with the further changes I made. Current benchmark time is 483ms. I'm still exploring a few options for improving it.

Updated with the latest changes. I'm running out of ideas and haven't made progress in a while. We should get this version merged. I tested it against universe. It will require 2 changes there because we relied on incorrect behavior of the old Sjsonnet release.
0.4.0 stands at 751 ms in our benchmark. Here's the progress over the course of this PR:
```diff
@@ -6,7 +6,7 @@ cancelable in Global := true

 lazy val main = (project in file("sjsonnet"))
   .settings(
-    scalacOptions in Compile ++= Seq("-opt:l:inline", "-opt-inline-from:sjsonnet.**"),
+    scalacOptions in Compile ++= Seq("-opt:l:inline", "-opt-inline-from:sjsonnet.*,sjsonnet.**"),
```
The `-opt-inline-from` syntax is confusing. `sjsonnet.**` only inlines from subpackages (which we don't have). We need to add `sjsonnet.*` to inline from the `sjsonnet` package, too. Ultimately it doesn't make a big difference. HotSpot has become so good that it doesn't matter in most cases, but the optimizations are not entirely deterministic. Sometimes you end up with a benchmark run that is 10% slower than it should be. Letting scalac do the trivial inlining (and thus creating less work for HotSpot) makes this more reliable.
Looks good, a few things before we merge it:

Other than that, I have reasonable confidence in the test suite and our ZZZ golden files to catch any bugs before they slip through. The main concern now is making sure that the knowledge you earned from the hours you spent on this stuff (why you made each of these changes) is preserved in the codebase/git, for future maintainers to pick up and continue. Especially non-obvious things like avoiding megamorphic code in hot paths, removing boxing, and fast paths for common usage patterns would be much harder for someone to reverse-engineer later vs. simply reading why you did something.
This hides the internal array and provides all the necessary methods to access it efficiently. There are experiments in https://github.com/szeiger/sjsonnet/tree/wip/perf-arropt that didn't improve performance.
Applyer required an extra object and indirection for every call of a higher-order function. We can do without the convenience and use Val.Func directly, passing the EvalScope and FileScope directly.
This introduces the Val.Builtin types for built-in functions of various arities, plus arity-specific apply methods for all functions. The arity-specific methods and types allow us to avoid array allocation for parameters. The Builtin types further avoid allocating a new ValScope when calling a built-in function. This commit starts the refactoring of standard library functions to the new Builtin types, which is an ongoing effort. Normal Val.Func functions are still supported, but they do not benefit from the same optimizations.
Refactor more standard library functions to use `Val.Builtin` and introduce convenience methods in `Val` for type coercions. These take the place of the `ReadWriter` implicits (which are still used in some places). The old way of implementing built-in functions as `Val.Func` via the `builtin` methods handled type coercions automatically (with `ReadWriter`), but the new style of manually implementing `Val.Builtin` is much easier with the convenience methods.
Some performance optimizations for `builtinWithDefaults` to avoid complex collection operations when calling such a function.
This starts the refactoring of built-in functions into objects (instead of anonymous classes) and removes the unnecessary FileScope.
Avoid creating many individual subclasses of `Lazy` for representing strict values (which have already been evaluated or are safe to evaluate immediately because we know they will be evaluated anyway). The new shared `Strict` class assigns the cached value in the constructor (in addition to returning it in `compute`). This looks unnecessary from a functional point of view, but it is important for performance as it avoids the megamorphic `compute` call the first time it is forced. `force` itself is a short monomorphic method that can easily be inlined by HotSpot.
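A minimal sketch of this pattern (simplified hypothetical classes; in sjsonnet itself `Val` extends `Lazy` and the details differ):

```scala
sealed trait Val
final case class Num(value: Double) extends Val

abstract class Lazy {
  protected var cached: Val = null
  def compute(): Val
  // Short monomorphic method that HotSpot can easily inline; the
  // potentially megamorphic compute() is only reached for values that
  // have not been evaluated yet.
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// One shared class for all strict values: assigning the cache in the
// constructor means force never hits compute() for a Strict, avoiding
// the megamorphic call the first time the value is forced.
final class Strict(v: Val) extends Lazy {
  cached = v
  def compute(): Val = v
}
```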
Arrays can now be literals. Any array expression encountered by the parser which contains only literals is itself turned into a literal, i.e. an instance of `Val.Arr` rather than `Expr.Arr`.
We already introduced the concept of static objects (created from object literals, containing only members with static names and literal values) in the previous round of optimizations. This can be used for faster key and value lookup.
`Val.Obj.Member` is now an abstract class with an abstract `invoke` method instead of taking a function argument for the member implementation. This avoids the extra object allocation per definition and the extra indirection per call site. When a member returns a statically known value (which is always the case in a static object) we use the special `ConstMember` class which serves the same purpose as `Val.Strict` for `Val.Lazy`.
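The difference can be sketched as follows (hypothetical simplified types, not the actual `Val.Obj.Member` signature):

```scala
// Old style: the implementation is passed as a function value, costing an
// extra closure allocation per member definition and an extra indirection
// per call site.
final class MemberOld(f: String => Int) {
  def invoke(self: String): Int = f(self)
}

// New style: an abstract method, implemented directly by subclasses.
abstract class Member {
  def invoke(self: String): Int
}

// Analogue of ConstMember: returns a statically known value, playing the
// same role for members that Val.Strict plays for Val.Lazy.
final class ConstMember(v: Int) extends Member {
  def invoke(self: String): Int = v
}
```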
The main benchmark is still over 600ms at this point. The new parser and materializer benchmarks tell us how much of this time is spent outside the evaluator (a bit over 30ms for the parser and 60ms for the materializer).
This is based on statistics gathered from our benchmark. The class-based lookups in `visitExpr` get compiled to a `lookupswitch` which is generally pretty fast, but with linear time based on the position in the method.
Unary and binary operators already used the same expression types, with the `Op` objects only being used as markers. They can be easily replaced by `Int` literals, thus allowing `visitUnaryOp` and `visitBinaryOp` to be compiled to a `tableswitch` at the outer layer, which makes all operator lookups equally fast.
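A sketch of the idea with hypothetical op codes (the PR's real constants follow the same `OP_*` naming pattern): dense, gap-free `Int` constants let scalac compile the outer match to a `tableswitch`, which dispatches in constant time:

```scala
object BinaryOp {
  // Dense Int codes with no gaps, so a match over them can become a
  // tableswitch rather than a lookupswitch.
  final val OP_* = 0
  final val OP_/ = 1
  final val OP_% = 2
  final val OP_+ = 3
}

// Constant patterns over the dense codes; op objects are no longer needed.
def name(op: Int): String = op match {
  case BinaryOp.OP_* => "*"
  case BinaryOp.OP_/ => "/"
  case BinaryOp.OP_% => "%"
  case BinaryOp.OP_+ => "+"
  case _ => "?"
}
```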
This commit introduces the static optimizer. The base class `ExprTransform` implements an AST transformer which can recursively transform and rebase an `Expr`. `StaticOptimizer` adds a scoped transform (which keeps track of the names that are in scope for each `Expr`) and implements the first optimization: Calls of the form `std.x(...)`, where `std` is the standard library (i.e. the name `std` has not been shadowed by a local definition) and `x` is a valid method name in the standard library, are replaced by one of the new `Expr.ApplyBuiltin` expression types (for various arities). This allows us to skip looking up `std` and `x` again during evaluation.
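The core rewrite can be sketched with a toy AST (hypothetical names and a single arity; the real `StaticOptimizer` handles several arities and tracks scoping via `ScopedExprTransform` so that a shadowed `std` is left alone):

```scala
sealed trait Expr
final case class Id(name: String) extends Expr
final case class Select(qual: Expr, name: String) extends Expr
final case class Apply1(fn: Expr, arg: Expr) extends Expr
final case class ApplyBuiltin1(name: String, arg: Expr) extends Expr

// Rewrite `std.x(arg)` into a direct builtin call when `std` is not
// shadowed by a local definition. Simplified sketch: only function
// applications are recursed into.
def optimize(e: Expr, locals: Set[String]): Expr = e match {
  case Apply1(Select(Id("std"), x), arg) if !locals("std") =>
    ApplyBuiltin1(x, optimize(arg, locals))
  case Apply1(fn, arg) =>
    Apply1(optimize(fn, locals), optimize(arg, locals))
  case other => other
}
```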
- Add a new benchmark for the optimizer (~2-3ms in our normal benchmark run) - Introduce the `Resolver` abstraction for resolving imports - Refactor the scoped transformations from `StaticOptimizer` into a new superclass `ScopedExprTransform`
Another statistics-based reordering since we added a number of new expression types since last time.
This replaces `ValidSuper` with `SelectSuper`, `InSuper` and `LookupSuper`. A `super` call can only appear in these 3 contexts and the code paths for evaluating them are very different from the non-`super` versions, so it makes sense to split them up.
We can't beat HotSpot when it comes to making the right decision about which small methods should be inlined into `visitExpr`.
We are adding another new step in the optimization of function application expressions. Any `Builtin` now gets the opportunity to rewrite the call site. This is particularly useful for partially applying literal arguments during optimization. For example, the `from` argument of `std.replaceAll` has to be parsed as a regular expression every time the function is called. When it is statically known we now generate a call to a specialized version that performs the parsing only once during optimization.
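The `std.replaceAll` case can be sketched like this (hypothetical simplified signatures; the real `Builtin.specialize` hook returns a rewritten call site on `Expr` level):

```scala
import java.util.regex.Pattern

// Generic version: compiles the regex on every call.
def replaceAll(str: String, from: String, to: String): String =
  Pattern.compile(from).matcher(str).replaceAll(to)

// Specialized version, created once at optimization time when `from` is a
// statically known literal: the pattern is compiled a single time and
// reused for every subsequent call.
final class SpecializedReplaceAll(from: String) {
  private val pattern = Pattern.compile(from)
  def apply(str: String, to: String): String =
    pattern.matcher(str).replaceAll(to)
}
```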
This is similar to what JFR does for the JVM, but based on the Sjsonnet AST.
The `%` operator can benefit greatly from partial application when the lhs is a string literal, so we add an optimizer rule for this. This is not useful for any other operators. We do not need a generic mechanism like we have for `Builtin` functions.
- As another step in function application optimization we now try to statically apply a `Builtin` function when all arguments are literals. - Some micro-optimizations for `std.length`.
- Don't parse JSON to an intermediate ujson AST first in `std.parseJson`. The new `ValVisitor` can parse directly to an Sjsonnet `Val`. - Similarly the new `MaterializeJsonRenderer` used by `std.manifestJson` renders the output without an intermediate ujson AST.
We can avoid allocating a new `ValScope` for each predicate call in `std.filter`. Normally every definition has to copy the existing scope instead of updating a shared array because any value could be read at an arbitrary later time due to lazy evaluation. But this is not the case when a function returns a primitive value. In particular, in `std.filter` we are repeatedly calling the same predicate function and we only check if it returned `Val.True`. The value is not stored anywhere for later use. This makes it safe to reuse a single `ValScope` with a single bindings array for all calls.
Built-in functions with defaults were still treated as `Val.Func`. With the old scope handling we would have needed a more complex implementation to also handle default values but this is no longer a problem. We simply have to pass them on to the superclass. Now that we can turn `std.setInter` into a `Builtin` we can optimize it further by partially applying a static argument.
Calls of the form `std.length(std.filter(...))` can be optimized to skip creation of the filtered array. We can simply count as we go along. It is not clear at this time if the Jsonnet specification allows us to go further and skip evaluation entirely in cases like `std.length(std.filter(...)) == 0`, so we still evaluate everything.
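The fused counting can be sketched as (hypothetical helper, not the actual sjsonnet code):

```scala
// Count matching elements directly instead of materializing the filtered
// collection first, as a naive std.length(std.filter(...)) would do.
def lengthFilter[A](xs: Seq[A])(p: A => Boolean): Int = {
  var n = 0
  xs.foreach(x => if (p(x)) n += 1)
  n
}
```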
Some micro-optimizations
When looking up definitions in the static scope of a `ScopedExprTransform` it is not always possible to see the value at the current phase (i.e. after `StaticOptimizer` vs after `Parser` at the moment; we do not have more phases yet). This is a general problem in any language that allows recursive references in definitions.

Previously we used the simplest possible implementation in `ScopedExprTransform`: all definitions that are made together (e.g. in a single `local` expression) are stored in the scope at the same time (using their value after the previous phase), and then they are evaluated to provide an improved scope in which the body of the `local` can see them with the updated values after the current phase. This prevents full optimization in cases like this:

```
local a = 1, b = a, c = b; c
```

When the rhs of `c` is optimized, the scope still contains `b = a` even though `a` has already been inlined (`b = 1`). With this PR we do the next better thing: update the scope incrementally to allow back-references to see the current phase.

Note that `a`, `b` and `c` are all allowed to refer to each other and they may be defined in an arbitrary order. In these cases we can still miss some optimizations, but supporting forward references would require lazy evaluation of scopes and recursion detection. In practice most references that benefit from these optimizations are expected to go backwards, and we want to keep it simple.
This avoids creating `Lazy` objects in some cases when we already know that evaluation will be strict.
Some refactoring to simplify and generalize these optimizations.
We have to do them before the optimizer because they are based on syntax.
PrettyYamlRenderer relies on subVisitor calls for individual elements (and no subVisitor call for an empty array)
The current behavior (after the scope handling overhaul) is correct (matching the specification and Google Jsonnet) but the error message was misleading. Sjsonnet 0.4 did not detect the illegal call at all.
`extVar` depends on external variables which are part of a specific evaluation. We have to ensure that they do not end up in the shared parse cache.
The result is puzzling but it matches the specification and Google Jsonnet. Returning a nullary `function() true` causes it to be evaluated during materialization to `true` but comparing it explicitly to `true` must yield `false` because there is no implicit evaluation in this case. This was previously broken in Sjsonnet.
They are only needed for tests and benchmarks. All production code uses the new implementations.
Oops, replaceAll accidentally did the work twice.
Updated with additional docs.
The latest result from 0.4.0 was:
This PR brings it down to:
The main optimizations are:
- `ValScope` is now symbol-based instead of name-based. New bindings are appended at the end with no shadowing; `self`, `$` and `super` are treated like regular variables in the scope.
- `StaticOptimizer`, which uses a single AST transformation after parsing to implement several of the following optimizations (per source file; stored in the parse cache for reuse)
- Specialized call sites of various arities for built-in functions (`ApplyBuiltin`) and user-defined functions (`Apply`)
- Less indirection (`Applyer`, lambdas in members & standard library functions)
- Avoiding `ValScope` allocation for standard library calls
- Call-site rewriting via `Builtin.specialize` (e.g. pre-compiled patterns in `format` and `strReplaceAll`; specialized implementation of `length(filter(...))`)
- Making `Val` a subclass of `Lazy` to avoid unnecessary wrappers when a `Lazy` is required for an already computed `Val`
- Faster dispatch in `Evaluator` methods by using a `tableswitch`-based dispatch for operators and a `lookupswitch` ordered by frequency (in our benchmarks) for node types
- New `Renderer` implementations based on the latest ujson
- Rendering a `Val` without an intermediate ujson AST

New features:

- `ExprTransform` and `ScopedExprTransform` for implementing tree transforms (used by the optimizer and some benchmarks)