Sjsonnet performance improvements #117
Conversation
We have to include the absolute path in addition to the source text in the cache key now that we're generating the FileScope up front and sharing it across multiple compilations. Otherwise we could get wrong positions (with wrong paths) for identical files.
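For illustration only, a minimal sketch of the kind of cache keying described here; all names are made up and the PR's actual cache code may look different:

```scala
import scala.collection.mutable

// Hypothetical stand-ins for sjsonnet's real types.
final case class Path(absolute: String) // stand-in for sjsonnet's Path
final class ParseResult                 // stand-in for the AST plus its FileScope

// Key on (absolute path, source text), not source text alone, so two
// identical files at different paths get distinct FileScopes and therefore
// correct positions.
val parseCache = mutable.HashMap.empty[(Path, String), ParseResult]

def cachedParse(path: Path, source: String)(parse: => ParseResult): ParseResult =
  parseCache.getOrElseUpdate((path, source), parse)
```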
I've run some tests with
Added yet another bug fix. Now
@szeiger I read through all the commits. I don't really have any code-level feedback, but I trust that the test suites (both in this repo, and in our work codebase using Sjsonnet) would be enough to catch any correctness issues (including on the Scala.js and Scala.Native backends), and I trust that the benchmarks are sufficient to ensure the performance improvement is as stated.
Some high-level feedback:
- Could you write up a few-paragraph summary of the major things that happened in this PR as part of its description? E.g. "unboxed options", "convert collection combinators to while-loops", "re-use literal AST nodes", "replace bitsets", "avoid position allocations", "optimize std", and so on, along with some explanation of why each change was helpful. That would definitely help future reviewers see at a glance what your high-level approach was, vs. having to reconstruct your intentions from the line diff like I did when reviewing this.
- I noticed you turned on `-opt` in the SBT build. Should we turn it on in the Mill build? Or should we only publish from the SBT build?
- If we're publishing artifacts compiled with `-opt` enabled, that will break Scala binary compatibility, right? Would that cause an issue given that we're still using 2.13.3 in our work code, but this repo is on 2.13.4?
- On a similar note, should we publish both `-opt` and non-`-opt` artifacts? There are folks using Sjsonnet as a library rather than an executable, and they may be on Scala point versions different from what we use at Databricks.
Oh, I thought Ahir already added the optimizer options to the Mill build. Maybe that's why it wasn't as fast in universe as I expected? I never benchmarked without optimizations. There's no need to publish two versions or to turn optimization off, but we should only inline from our own codebase. (At this point there aren't many standard library calls left that are worth inlining anyway.) This will not affect binary compatibility in any way.
I added the optimizer settings to the Mill build. It doesn't make a big difference to performance.
If it doesn't make much difference, let's leave it out. That would simplify things and avoid questions about binary compatibility.
It doesn't cause any binary compatibility issues. We're only inlining from sjsonnet, nothing else.
Oh got it
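For context, inliner settings of the kind discussed above look roughly like this in a Mill build. This is a sketch using the standard Scala 2.13 optimizer flags, not the repo's actual build file:

```scala
// build.sc (sketch): enable the Scala 2.13 inliner, but only allow inlining
// from sjsonnet's own packages, so no third-party code gets baked into the
// artifacts and binary compatibility with other library versions is kept.
import mill._, scalalib._

object sjsonnet extends ScalaModule {
  def scalaVersion = "2.13.4"
  def scalacOptions = Seq(
    "-opt:l:inline",                // turn on the optimizer's inliner
    "-opt-inline-from:sjsonnet.**"  // inline only from our own codebase
  )
}
```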
This is the collection of changes that @ahirreddy and I made to improve Sjsonnet performance.
My performance baseline was Ahir's branch (which should already be 1/3 faster than the current release), running on Zulu 11:
Switching to OpenJDK 16:
And my various improvements on top of that:
Hottest methods in the final state:
High-level overview of important changes:
- `Array[(Int, Option[T])]` has an extra `Tuple2`, an extra boxed `Integer`, and potentially an extra `Some` per element. Replacing it with two separate arrays `Array[Int]` and `Array[T]` (using `null` instead of `None`) is always going to be faster. There's not even an argument to be made for better locality in the tupled version, because the tuple contains references; its overall locality is worse (in the new version at least the `Int`s are colocated). Many commits in this PR are concerned with removing more and more unnecessary objects. (A minimal sketch of the two-array representation follows this list.)
- Picking the right collection types: in some places that means `java.util.BitSet` and `java.util.LinkedHashMap`, but elsewhere `scala.collection.mutable.HashMap`. The choice is easier for sequence types: just use plain arrays.
- Avoiding collection combinators like `map`. The scalac optimizer could inline all of these calls, but we can't allow inlining from the standard library because of binary compatibility requirements, so we have to do the rewriting manually as plain loops (sketched below).
- Creating `Position` values eagerly instead of allocating a fresh `Position` object during evaluation. An AST node can be evaluated many times, so it is better to create the `Position` objects directly in the parser. This is also more correct: an offset is always tied to a specific file.
- Cheaper `Lazy`. The old approach relied on the usual Scala mechanisms: two objects (a `Function0` for the thunk plus the actual `Lazy`), a `lazy val` requires a separate `Boolean` flag to keep track of the state, and `lazy val` uses thread-safe double-checked locking. We can cut the allocations in half by making `Lazy` a SAM type, use our own state handling to encode a missing value as `null` (because we know that `f` can never return `null`), and omit the locking (because we don't need thread-safety). (See the `Lazy` sketch after this list.)
- `match` expressions like the ones in `visitExpr` and `visitBinaryOp` are very efficient. The Scala compiler will remove the tuples and you end up with a nice table dispatch on final class types. We are avoiding separate handling of special cases (like the lazy `and` and `or` operators) because a single table-based dispatch is faster (sketched below).
- Literals as `Val`s. There is no need to have the parser emit, for example, an `Expr.Str(s)`, only to get it turned into a `Val.Str(s)` immediately by the evaluator. All literals are emitted directly as `Val`s and the evaluator can skip right over them.
- `Expr.Parened(e)` was used for an expression in parentheses. This information is not needed for evaluating the expression: the result is the same as evaluating `e` directly, so we don't even have to produce the node first.
- The standard library implementation used `builtin` methods to avoid some boilerplate. These had a common `builtin0` abstraction for arbitrary arities, which required wrapping values in arrays. We only need 3 different arities, and avoiding this abstraction actually makes the code simpler (in addition to removing the unnecessary arrays). For the most common functions we do not use `builtin` at all anymore; a direct implementation with a bit more boilerplate is even faster.
- `String.split(Char)` quotes the char into a regular expression, which then gets compiled just so the regular expression engine can be reused for splitting. This can be implemented much more efficiently (see the last sketch below).
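To make the unboxing point concrete, here is a minimal sketch of the two-parallel-arrays representation. This is illustrative only, not code from the PR; the class and method names are made up:

```scala
// Replace Array[(Int, Option[T])] with two parallel arrays. Array[AnyRef]
// plus a cast avoids needing a ClassTag for a generic Array[T]; null takes
// the place of None, so T must be a reference type.
final class IntValueTable[T <: AnyRef](size: Int) {
  private[this] val ints   = new Array[Int](size)    // unboxed and colocated
  private[this] val values = new Array[AnyRef](size) // null means "no value"

  def set(i: Int, n: Int, v: T): Unit = { ints(i) = n; values(i) = v }
  def int(i: Int): Int = ints(i)
  def value(i: Int): T = values(i).asInstanceOf[T]   // may be null (i.e. None)
}
```

Per element this saves the `Tuple2`, the boxed `Integer`, and the `Some` wrapper, exactly as described above.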
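The combinator-to-loop rewriting looks roughly like this: a hand-written stand-in for `xs.map(f)` on arrays (the helper name is made up):

```scala
// Instead of xs.map(f), which goes through generic collection machinery,
// write the loop out by hand. Nothing is allocated beyond the result array.
def mapArr(xs: Array[AnyRef], f: AnyRef => AnyRef): Array[AnyRef] = {
  val out = new Array[AnyRef](xs.length)
  var i = 0
  while (i < xs.length) {
    out(i) = f(xs(i))
    i += 1
  }
  out
}
```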
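The optimized `Lazy` can be sketched as follows. `Val` is just a stand-in for sjsonnet's value type here, and the member names are guesses:

```scala
trait Val // stand-in for the evaluator's value type

// One abstract method, so a lambda converts to a Lazy via SAM conversion:
// the Function0 thunk and the Lazy wrapper collapse into a single object.
abstract class Lazy {
  private[this] var cached: Val = null
  def compute(): Val // the single abstract method (the "f" mentioned above)
  final def force: Val = {
    // null encodes "not yet computed" (compute() never returns null), and
    // there is no double-checked locking because we don't need thread-safety.
    if (cached == null) cached = compute()
    cached
  }
}

// Usage: the lambda itself is the Lazy instance, one allocation in total.
def lazyVal(v: => Val): Lazy = () => v
```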
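The table-dispatch and literals-as-`Val`s points fit into one small sketch; all the types below are simplified stand-ins for sjsonnet's real `Expr`/`Val` hierarchy:

```scala
sealed trait Expr
sealed trait Val extends Expr // a literal value is itself a finished Expr
object Expr { final case class BinaryOp(op: Int, lhs: Expr, rhs: Expr) extends Expr }
object Val {
  final case class Num(v: Double) extends Val
  final case class Str(v: String) extends Val
}
object Op { final val Plus = 0 }

def visitExpr(e: Expr): Val = e match {
  case v: Val => v // literal: the parser already produced the final value
  case Expr.BinaryOp(op, l, r) => visitBinaryOp(op, visitExpr(l), visitExpr(r))
}

// A single match on an (op, lhs, rhs) tuple: as noted above, scalac removes
// the tuple and compiles this to a table-like dispatch on final classes.
def visitBinaryOp(op: Int, l: Val, r: Val): Val = (op, l, r) match {
  case (Op.Plus, Val.Num(a), Val.Num(b)) => Val.Num(a + b)
  case (Op.Plus, Val.Str(a), Val.Str(b)) => Val.Str(a + b)
  case _ => sys.error(s"unsupported op $op on $l, $r")
}
```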
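Finally, the `String.split(Char)` point: splitting on a known character doesn't need the regex engine at all. A simplified sketch (unlike `String.split`, this version also keeps trailing empty segments):

```scala
import scala.collection.mutable.ArrayBuffer

// Scan the string once and cut at every separator: no Pattern.quote, no
// regex compilation, no regex engine at match time.
def splitChar(s: String, sep: Char): Array[String] = {
  val out = new ArrayBuffer[String]
  var start = 0
  var i = 0
  while (i < s.length) {
    if (s.charAt(i) == sep) {
      out += s.substring(start, i)
      start = i + 1
    }
    i += 1
  }
  out += s.substring(start) // the final (possibly empty) segment
  out.toArray
}
```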