Towards a semi-performant recursive interpreter #37
Conversation
Awesome! What's that summation test look like? The compiled version could possibly be compiling down to a static result (i.e., removing the loop and just returning the result directly), since 5ns is wicked fast. That therefore wouldn't be a good measure of this PR's performance gains.
The summation test is here. 5ns/iteration is expected for a GHz CPU (it would be faster still if I had added …
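For context, a summation benchmark of the kind being discussed might look like the following. This is a hypothetical sketch, not the linked test itself; the function name `sumloop` and the exact loop are illustrative:

```julia
using BenchmarkTools, ASTInterpreter2

# A simple loop that the interpreter must execute statement-by-statement.
function sumloop(n)
    s = 0
    for i = 1:n
        s += i
    end
    return s
end

# Compiled: LLVM may fold the loop into a closed form, hence times near ~5ns.
@btime sumloop(10^4)

# Interpreted: every statement goes through the interpreter's evaluation loop.
@btime @interpret sumloop(10^4)
```

If the compiled timing is suspiciously constant as `n` grows, that is a sign the loop was optimized away, which is the concern raised above.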
src/generate_builtins.jl
Outdated
end
print(io,
    """
    $head name == :$fname
I think the best way to do this is to evaluate the function (`args[1]`) and then compare directly against the builtin objects: `f === Core.tuple`, `f === Core.ifelse`, etc. We should also pre-process the code to replace constant GlobalRefs with quoted values to avoid lookups in the common case.
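A minimal sketch of that dispatch strategy (the function name and structure here are illustrative, not the PR's actual code):

```julia
# Resolve the callee once, then compare by identity against builtin objects.
# Identity comparison (===) is cheap and avoids name-based lookup.
function maybe_run_builtin(f, args...)
    if f === Core.tuple
        return Some{Any}(Core.tuple(args...))
    elseif f === Core.ifelse
        return Some{Any}(Core.ifelse(args...))
    end
    return nothing  # not a handled builtin; fall back to the generic call path
end
```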
Agreed. I thought of that and then talked myself out of it for fears of introducing the interpreter-equivalent of JuliaLang/julia#265. But now I'm not so sure that's a risk, and in any event I currently clear the cache in between `@interpret` invocations.
Thanks a lot for this. I pushed a commit to update the Project + Manifest to get CI running (hope you don't mind). I'll look at the UI tests.
So the UI tests pass as long as there is a good way to set the …
Manifest.toml
Outdated
uuid = "67417a49-6d77-5db2-98c7-c13144130cd2"
version = "0.1.2+"

[[DebuggingUtilities]]
Sorry, I added this while I was debugging my own work on this PR; `@showln` is very useful. But we don't need it to be part of this package long-term. I'll trust it's OK with you if I overwrite this.
That all sounds very confusing. Any reason not to just execute them directly? They are just normal methods, albeit without source code accessible to reflection (and that block overloading).
OK, this fixes what I think are the remaining problems. This enhances the performance on my loop tests a teeny bit more (down to 13us per iteration), presumably because of resolving …

The most important new commit is 19c1492, which fixed a serious bug that caused it to sometimes look up the wrong lowered code for generated functions. Now this seems to behave quite robustly; for example, I can run the subarray tests. They're not fast, but not …

@KristofferC, I also added kwargs support back to the …

I think this can be merged. (I can't do that, however.) And I do think we should plan on splitting out the DebuggerFramework functionality very shortly afterwards. There's plenty of reason to be interested in an interpreter independent of its usage for an IDE or REPL debugger. So perhaps before hitting merge we should think a bit about what that will look like (and what the package names should be).
One thing that I know has been discussed is renaming this package (or whichever package a user will ultimately interact with). In 0.6 Gallium took that place (simply re-exporting ASTInterpreter2), but I wonder if we should just come up with something fresh.
That is what it does; see https://github.com/timholy/ASTInterpreter2.jl/blob/teh/localmt/src/builtins.jl. Rather than typing those all out, though, I just check the …

One thing I didn't do: we might want to consider adding …
I've thought about …
Awesome work. I'll let @KristofferC take this forward in detail. My only concern is that ASTInterpreter2 was supposed to be very small and maintainable, so I'm afraid to add too much here that would make it more complicated. On the other hand, perhaps that's offset by more people looking at it. Also seconded on @vtjnash's question of why generating the source file is necessary.
src/builtins.jl
Outdated
    return Some{Any}(Core._apply_pure(getargs(args, frame)...))
elseif f === Core._expr
    return Some{Any}(Core._expr(getargs(args, frame)...))
elseif f === Core._typevar
Looks like this isn't present on 1.0. One wonders if we should avoid committing `builtins.jl` and just generate it during `Pkg.build`?
Either that, or maybe `@static`-check their existence:

@static isdefined(Core, :_typevar) ? f === Core._typevar : false

Doing it at build time seems likely to be better, though, since otherwise we might miss things if some builtins are removed in the future.
We don't have to do it that way, although …
I think the biggest reason to do it is that if we can get reasonable performance with the interpreter, it becomes feasible to support breakpoints in the short term. (Supporting breakpoints is easy if all the code is running in the interpreter.) I don't think even a loving author could say that the performance of this PR makes it "reasonable," but it is a ~60x improvement over where I started. If I could get another 10x I'd be really happy. But I'm skeptical that we can get there without some additional higher-level analysis, the most important probably being a certain amount of lowered-code inlining to reduce the number of created stackframes. (That seems likely to require a limited form of type inference...) Julia's polymorphism makes it much more difficult to write a fast interpreter for than, say, languages where …
It'd probably be much faster to ccall that function directly, rather than having …
That's good to hear, since that list is large, but …
Doesn't actually work out that way:

julia> using BenchmarkTools, ASTInterpreter2
julia> function runifbuiltin(qf, qx)
f, x = qf.value, qx.value
if isa(f, Core.Builtin)
return Some{Any}(f(x))
end
return qx
end
runifbuiltin (generic function with 1 method)
julia> function runifintrinsic(qf, qx)
f, x = qf.value, qx.value
if isa(f, Core.IntrinsicFunction)
return Some{Any}(f(x))
end
return qx
end
runifintrinsic (generic function with 1 method)
julia> qf = QuoteNode(abs)
:($(QuoteNode(abs)))
julia> qx = QuoteNode(3)
:($(QuoteNode(3)))
julia> @btime runifbuiltin(qf, qx)
64.096 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))
julia> @btime runifintrinsic(qf, qx)
11.302 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))
julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(abs)))($(QuoteNode(3))))
julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
16.826 ns (0 allocations: 0 bytes)
:(($(QuoteNode(abs)))($(QuoteNode(3))))
julia> qf = QuoteNode(Core.sizeof)
:($(QuoteNode(Core.sizeof)))
julia> @btime runifbuiltin(qf, qx)
96.828 ns (1 allocation: 16 bytes)
Some(8)
julia> @btime runifintrinsic(qf, qx)
14.321 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))
julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(Core.sizeof)))($(QuoteNode(3))))
julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
30.768 ns (1 allocation: 16 bytes)
Some(8)
julia> qf = QuoteNode(Base.neg_int)
:($(QuoteNode(neg_int)))
julia> @btime runifbuiltin(qf, qx)
113.218 ns (1 allocation: 16 bytes)
Some(-3)
julia> @btime runifintrinsic(qf, qx)
56.686 ns (1 allocation: 16 bytes)
Some(-3)
julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(neg_int)))($(QuoteNode(3))))
julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
45.757 ns (1 allocation: 16 bytes)
Some(-3)

I've also tried a middle ground, checking …
Good call! 900s -> 590s.
I updated the UI tests (and their dependencies), so tests now pass on 1.1 and nightly. 1.0 fails due to the already mentioned …
Project.toml
Outdated
[targets]
test = ["Test", "TerminalRegressionTests", "VT100"]
build = ["InteractiveUtils"]
There is currently no `build` target.
Recommended approach? Just make it a dependency of the package?
Yes
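For reference, running the generator at build time might look roughly like this. This is a hypothetical sketch: the file paths and the `generate_builtins` entry point are assumptions, not this PR's actual code:

```julia
# deps/build.jl -- regenerate builtins.jl for the running Julia version.
# Hypothetical sketch; the generator script and its API are assumed names.
include(joinpath(@__DIR__, "..", "src", "generate_builtins.jl"))

open(joinpath(@__DIR__, "..", "src", "builtins.jl"), "w") do io
    # Emitting the file at build time means builtins that exist only on
    # some Julia versions (e.g. Core._typevar) are handled automatically.
    generate_builtins(io)
end
```

With InteractiveUtils as a regular dependency, `Pkg.build` runs this script on install, so the generated file always matches the host Julia.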
OK, I've modified this to generate the file at build time. I expect tests to pass.

Before merging, I think it would be best to wait a bit in case anyone wants to give this a detailed review. There are also a couple of issues to discuss. First, I recognize the excitement to get debugging working in Juno again, but I'd urge a bit of patience to tackle a couple of things that should be done first:

…

The point being that I don't think it would be great to reintroduce a stepping-debugger to the Julia world and then break it only a couple of days later.
Hm, it seems test-only dependencies that are using a checked-out branch in the Manifest aren't too happy when they get put in the …
Also, one technical issue to highlight: before merging, perhaps I should experiment with moving the local method tables from the …
Surprisingly, moving the local method tables led to a 15% slowdown, so I say let's leave these as they are. The only barriers I know of to merging are (1) any review comments, and (2) fixing the Pkg issues. Then after merging we can discuss splitting the package.
To ensure that this package can be hacked on by more than a small handful of people, I've added a bunch of docstrings and some Documenter narrative, including a somewhat gentle introduction to Julia's lowered AST representation. I haven't implemented deployment yet, since I think it would be better to do that once we're basically ready to release. But you can view the docs locally by navigating to the …
This also necessitates changing how Builtins are handled, but this is more robust than the previous module/symbol check.
This is needed for debugging
The generator can in principle return arbitrary expressions depending on the exact type.
This also makes minor code cleanups.
At least module-scope code can use Ints for :gotoifnot expressions.
Because we run the optimizer we don't want to contaminate the original.
Separates out the `JuliaFrameCode` constructor, generalizes `moduleof`, adds pc utilities
fixup some toml files and .travis
This is a pretty big overhaul of this package. I've tried to keep its original functionality, but perhaps a split should be the next step (see #32). The first 5 commits get the tests passing, with the exception of the UI tests, which I didn't even look at (CC @staticfloat re #34, #36; @pfitzseb re #33).
The rest is much more ambitious. I worked a lot on the performance, since an "easy" way to add robust breakpoints, etc, is via running your code in the interpreter. And of course there's interest in using the interpreter to circumvent compile-time cost. But then performance matters.
The most important changes here are focused on reducing the cost of dynamic dispatch (or its equivalent here, "dynamic lowered code lookup"). Some highlights:
- This uses the `tfunc` data in `Core.Inference` to auto-generate an evaluator that resolves all calls with a fixed number of arguments. A straightforward extension would be to resolve all calls with bounded numbers of arguments (e.g., those that have between 2 and 4 arguments).
- To reduce the cost of `which`, this adds "local method tables," one per `:call` Expr. These are cached by exact type, since `isa(x, Int)` is fast but `isa(x, Integer)` is slow. So it uses `MethodInstance` comparisons rather than `Method` signature comparisons, even though it might look up the same lowered code. I think a slightly more elegant way to do this would be to add a new type, …, to the list of valid types in a `CodeInfo`. (I did it this way at first, but it breaks things like basic-block computation and ssa-usage analysis, so I resorted to storing this info in separate fields.)

For a simple summation test, I'm getting about 15us per iteration. Compiled code is about 5ns, so this is still dirt-slow. But just getting it to this point was quite a major overhaul.
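The local-method-table idea can be sketched like this. This is a simplified, hypothetical illustration of caching by exact argument types; the type and function names are invented, and `resolve` stands in for the real `which`-style lowered-code lookup:

```julia
# Hypothetical sketch: one cache per :call site, keyed by the exact
# (concrete) types of the callee and arguments.
struct LocalMethodTable
    cache::Dict{Any,Any}   # exact type tuple => resolved lowered code
end
LocalMethodTable() = LocalMethodTable(Dict{Any,Any}())

function lookup!(resolve::Function, mt::LocalMethodTable, f, args...)
    # Exact-type keys keep the hit path to cheap type-equality checks,
    # at the cost of possibly caching the same lowered code under many keys.
    key = (typeof(f), map(typeof, args)...)
    get!(() -> resolve(f, args...), mt.cache, key)
end
```

On a cache hit this skips method lookup entirely, which is where the bulk of the dynamic-dispatch cost in an interpreter tends to go.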
CC @JeffBezanson, @vtjnash, @StefanKarpinski.