enable/improve constant propagation through varargs methods #26826
base/compiler/inferenceresult.jl (outdated)
```julia
if isvarargtype(va)
    # assumes that we should never see Vararg{T, x}, where x is a constant (should be guaranteed by construction)
    va = rewrap_unionall(va, linfo.specTypes)
    vararg_type_vec = Any[va]
```
This vec type is wrong - there’s basically nothing we can do in this case (just keep the empty Tuple)
actually, looks like the inference code will handle this, so this is OK
So to make sure I follow, you're saying that we can't do anything if we don't actually have the types of the "trailing" arguments in `atypes`, so this should just be:

```julia
if nargs > laty
    vararg_type_vec = Any[]
    vararg_type = Tuple{}
else
    # ⋮ the stuff that's already here, but fixed wrt your other comment
end
```

EDIT: oops, I made this comment before refreshing the page, so I didn't see your other comment.
base/compiler/inferenceresult.jl (outdated)
```julia
cache_match = true
# verify that the trailing args (va) aren't Const
```
Still need to verify that the vargs list matches
Verifying this correctly would essentially require fully recomputing the `vararg_type_vec` part of `get_argtypes`, right? Should I actually do that, or is there a more lightweight check I should be doing?

EDIT: This would also be necessary for checking the last element of `cache_args`, which is the `vararg_type` that we computed in `get_argtypes`.
base/compiler/inferenceresult.jl (outdated)
```diff
 end
 cache_match || continue
-for i in 1:nargs
+for i in 1:length(argtypes)
     a = argtypes[i]
     ca = cache_args[i]
```
Only `nargs` items long
base/compiler/inferenceresult.jl (outdated)
```diff
 for cache_code in cache
     # try to search cache first
     cache_args = cache_code.args
-    if cache_code.linfo === code && length(cache_args) >= nargs
+    if cache_code.linfo === code && length(cache_args) == length(argtypes)
```
Suggested: `length(cache_args) === nargs`. As written, this change would filter out any varargs method from being found in the cache (slow, but not otherwise broken); see the sketch below.
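To make the length mismatch concrete, here is a tiny illustrative sketch using made-up values (this is not the actual compiler data structure): for a varargs method, the cache stores only `nargs` entries, with the trailing arguments collapsed into one tuple type, while the caller's `argtypes` lists every argument individually, so requiring the lengths to be equal can never match a varargs method.

```julia
# Made-up values, purely to illustrate the length mismatch described above.
nargs      = 2                            # e.g. f(x, va...) has two declared slots
cache_args = Any[Int, Tuple{Int,Int}]     # cached form: trailing args collapsed, length == nargs
argtypes   = Any[Int, Int, Int]           # caller's view: every argument listed separately

length(cache_args) == length(argtypes)    # false: a varargs cache entry is never found
length(cache_args) == nargs               # true: the check suggested above
```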
base/compiler/inferenceresult.jl (outdated)
```julia
vararg_type_vec = Any[rewrap_unionall(p, linfo.specTypes) for p in atypes[nargs:laty]]
vararg_type = tuple_tfunc(Tuple{vararg_type_vec...})
for i in 1:length(vararg_type_vec)
    atyp = vararg_type_vec[i]
```
Actually, this seems wrong too: `atypes` can contain a `Vararg` here, and I don't recall seeing that being checked for in the downstream users.
(Just talked to Jameson in person, turns out this is actually okay)
```julia
end
end
end
result.vargs = vararg_type_vec
end
args[nargs] = vararg_type
nargs -= 1
```
If this got put back, aren’t we missing detection of the Varargs argument as Const then?
Ah, wait, no. I think tuple_tfunc took care of this
BSD CI got an unkillable process from this PR, and it ate 100% CPU on a core.
Force-pushed from 415b59f to 5572926
Force-pushed from 5572926 to d7a79bf
Looks like this passes all tests except for
Okay, I've tracked down the offending test here: test/compiler/compiler.jl, line 1248 (at 1c5ed70).
Interrupting that test while it hangs yields:
Force-pushed from c4e106e to efc42a4
Woohoo, tests here now pass locally.
The only things left here now are adding tests and investigating the potential cache problems pointed out at JuliaLabs/Cassette.jl#41 (comment).
Force-pushed from d863bba to 3424b31
Okay, tests added and the caching problem fixed (thanks to @vtjnash). I rebased and squashed so that this doesn't leave a bunch of intermediate broken commits. The only thing left now is to make sure CI passes, and then this is good to go.
From running tests locally, it looks like everything passes except a number of tests in

EDIT: Here's a gist of the failures logged by the test suite: https://gist.github.com/jrevels/197ed8b8bf71d72acd8bebe246116655

EDIT 2: ...and here's a gist that reproduces the failures in a more digestible format (ripped from the failing test code): https://gist.github.com/jrevels/20f5e1d7ac64c0b51099960ded3e4687
Good luck. I've lost many hours to failures in there. Typically it's been due to a failure to inline. Check
So, on master (see https://gist.github.com/jrevels/20f5e1d7ac64c0b51099960ded3e4687 for test setup):

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Failed at REPL[10]:1
  Expression: #= REPL[10]:1 =# @allocated(broadcast!(f, Q, X, Y, Z)) == 0
   Evaluated: 16 == 0
ERROR: There was an error during testing
```

On this PR currently:

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Failed at REPL[8]:1
  Expression: #= REPL[8]:1 =# @allocated(broadcast!(+, Q, X, Y, Z)) == 0
   Evaluated: 6688 == 0
ERROR: There was an error during testing

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Failed at REPL[9]:1
  Expression: #= REPL[9]:1 =# @allocated(broadcast!(*, Q, X, Y, Z)) == 0
   Evaluated: 6688 == 0
ERROR: There was an error during testing

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Failed at REPL[10]:1
  Expression: #= REPL[10]:1 =# @allocated(broadcast!(f, Q, X, Y, Z)) == 0
   Evaluated: 1744 == 0
ERROR: There was an error during testing
```

EDIT: I was dumb, see comment below
D'oh, I was dumb and didn't rebuild the sysimg. It turns out the culprit is entirely the change to `inlineable` that reduced `InferenceResult` cache misses, which makes sense. Here's this PR on the MWE tests with the offending change reverted:

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Passed
```

So I just need to figure out how to fix those cache misses without affecting inlining...
Force-pushed from 3424b31 to 26911c1
Okay, I just pushed a rewrite of the cache lookup fix which passes both the new compiler tests and the MWE above. @vtjnash points out that the previous cache lookup fix also fixed another problem that currently exists on master. I'm not going to consider fixing that issue necessary for merging this PR, since it already exists on master, but it might be worth looking into later.
Force-pushed from 26911c1 to 1a9f8e8
Okay, tests all pass locally now. This PR is basically done, so I'm going to merge once CI turns up green, unless there are any objections.
Worth running Nanosoldier?
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
base/compiler/inferenceresult.jl (outdated)
```diff
@@ -5,6 +5,7 @@ const EMPTY_VECTOR = Vector{Any}()
 mutable struct InferenceResult
     linfo::MethodInstance
     args::Vector{Any}
+    vargs::Vector{Any}
```
This field could use an explanatory comment.
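For illustration, here is a hedged sketch of the kind of explanatory comment being requested. It uses a standalone dummy struct (`InferenceResultSketch` is a made-up name), not the real compiler type, which lives in base/compiler/inferenceresult.jl and uses compiler-internal field types:

```julia
# Standalone sketch only; the real InferenceResult uses MethodInstance etc.
mutable struct InferenceResultSketch
    args::Vector{Any}    # inferred types of the declared arguments
    # vargs: types of the individual trailing varargs values, kept separately so
    # that constant (Const) elements of the varargs tuple stay visible to
    # inference and to the inlining pass
    vargs::Vector{Any}
end
```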
- Store varargs type information in the InferenceResult object, such that the info can be used during inference/optimization.
- Hack in a more precise return type for getfield of a vararg tuple. Ideally, we would handle this by teaching inference to track the types of the individual fields of a Tuple, which would make this unnecessary, but until then, this hack is helpful.
- Spoof parents as well as children during recursion limiting, so that higher-degree cycles are appropriately spoofed.
- A broadcast test marked as broken is now no longer broken, presumably due to the optimizations in this commit.
- Fix the relationship between depth/mindepth in limit_type_size/is_derived_type. The relationship should have been inverse over the domain in which they overlap, but was not maintained consistently. An example of a problematic case was:
  t = Tuple{X,X} where X<:Tuple{Tuple{Int64,Vararg{Int64,N} where N},Tuple{Int64,Vararg{Int64,N} where N}}
  c = Tuple{X,X} where X<:Tuple{Int64,Vararg{Int64,N} where N}
  because is_derived_type was computing the depth of usage rather than the depth of definition. This change thus makes the depth/mindepth calculations more consistent, and causes the limiting heuristic to return strictly wider types than it did before.
- Move the optimizer's "varargs types to tuple type" rewrite to after cache lookup. Inference is populating the InferenceResult cache using the varargs form, so the optimizer needs to do the lookup before writing the atypes in order to avoid cache misses.

Co-authored-by: Jameson Nash <vtjnash@users.noreply.github.com>
Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>
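As a hedged illustration of the "getfield of a vararg tuple" point above (made-up function names, and the exact inferred result depends on the Julia version):

```julia
# Indexing into the varargs tuple lowers to getfield, so a more precise return
# type there lets inference recover the element type (or even the constant)
# instead of widening to Any.
first_vararg(args...) = args[1]          # args[1] lowers to getfield(args, 1)
call_with_constants() = first_vararg(1.0, "two", 3)
# code_typed(call_with_constants, Tuple{}) would ideally report Float64
# (or the constant 1.0) as the return type.
```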
Force-pushed from 1a9f8e8 to b4b4d21
Added the requested comment. Investigated the benchmark regressions; none seemed significant. Even the 3.65x one showed no regression locally. CI was totally green (besides an unrelated Windows timeout) before I added the requested comment, so I'll go ahead and merge (I did rebuild and run tests locally just to be absolutely sure, but of course adding a comment didn't break anything).
* fix InferenceResult cache lookup for new optimizer
* utilize cached vararg type info in new inlining pass
* fix test to work with new optimizer
* fix predicate for exploiting cached varargs type info
* atypes no longer needs to be modified to match cached types
* don't require varargs to be in last position to exploit cached type info
* restore missing isva boolean
- inferenceresult.jl: just dead allocations
- inlining.jl: `UnionSplitSignature` (and its `SimpleCartesian` helper) is no longer used
- abstractinterpretation.jl: the use of `precise_container_type` seems to have been introduced in JuliaLang#26826, but `getfield_tfunc` at this moment is precise enough to propagate elements of constant tuples.
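As a hedged illustration of that last claim (exact inference output depends on the Julia version):

```julia
# getfield on a known constant tuple should already fold to a constant, which is
# why the extra precise_container_type call is said to be unnecessary here.
const_elem() = (1, 2, 3)[2]              # tuple indexing lowers to getfield
# code_typed(const_elem, Tuple{}) is expected to show `return 2`.
```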
This should allow constant propagation through varargs functions. I was just a keyboard monkey following @Keno's/@vtjnash's direction for this PR, so they can probably answer any questions better than I can.
EDIT: We've now added some other stuff that expands the scope here a bit (mainly so we don't forget some of these hacks), but this PR can be broken up into several others later if need be.
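A minimal sketch (made-up names, not taken from the PR's test suite) of the kind of call this change aims to improve: a varargs function invoked with compile-time constants, where the constants should propagate through the varargs tuple during inference.

```julia
# Hypothetical example of the targeted pattern.
sumargs(args...) = +(args...)
entry() = sumargs(1, 2, 3)
# With constant propagation through varargs, code_typed(entry, Tuple{}) should
# be able to fold the call down to the constant 6.
```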