Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate iteration info to optimizer #36684

Merged
merged 1 commit into from
Jul 18, 2020
Merged

Propagate iteration info to optimizer #36684

merged 1 commit into from
Jul 18, 2020

Conversation

Keno
Copy link
Member

@Keno Keno commented Jul 15, 2020

This supersedes #36169. Rather than re-implementing the iteration
analysis as done there, this uses the new stmtinfo infrastrcture
to propagate all the analysis done during inference all the way
to inlining. As a result, it applies not only to splats of
singletons, but also to splats of any other short iterable
that inference can analyze. E.g.:

f(x) = (x...,)
@code_typed f(1=>2)
@benchmark f(1=>2)

Before:

julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Core._apply_iterate(Base.iterate, Core.tuple, x)::Tuple{Int64,Int64}
└──      return %1
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  3
  --------------
  minimum time:     242.659 ns (0.00% GC)
  median time:      246.904 ns (0.00% GC)
  mean time:        255.390 ns (1.08% GC)
  maximum time:     4.415 μs (93.94% GC)
  --------------
  samples:          10000
  evals/sample:     405

After:

julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Base.getfield(x, 1)::Int64
│   %2 = Base.getfield(x, 2)::Int64
│   %3 = Core.tuple(%1, %2)::Tuple{Int64,Int64}
└──      return %3
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.701 ns (0.00% GC)
  median time:      1.925 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     6.941 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I also implemented the TODO, I had left in #36169 to inline
the iterate calls themselves, which gives another 3x
improvement over the solution in that PR:

julia> @code_typed f(1)
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Int64}
└──      return %1
) => Tuple{Int64}

julia> @benchmark f(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.696 ns (0.00% GC)
  median time:      1.699 ns (0.00% GC)
  mean time:        1.702 ns (0.00% GC)
  maximum time:     5.389 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Fixes #36087
Fixes #29114

@@ -596,49 +596,74 @@ function spec_lambda(@nospecialize(atype), sv::OptimizationState, @nospecialize(
end

# This assumes the caller has verified that all arguments to the _apply call are Tuples.
function rewrite_apply_exprargs!(ir::IRCode, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arg_start::Int)
function rewrite_apply_exprargs!(ir::IRCode, todo, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arginfos::Vector{Any}, arg_start::Int, sv)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type decls

else
state = Core.svec()
for i = 1:length(thisarginfo.each)
mthd = thisarginfo.each[i]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mthd = thisarginfo.each[i]
meth = thisarginfo.each[i]

Don't think we've ever used this mangling before

@@ -876,9 +901,22 @@ function call_sig(ir::IRCode, stmt::Expr)
Signature(f, ft, atypes)
end

function inline_apply!(ir::IRCode, idx::Int, sig::Signature, params::OptimizationParams)
function inline_apply!(ir::IRCode, todo, idx::Int, sig::Signature, params::OptimizationParams, sv)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type decls

if i != length(thisarginfo.each)
valT = getfield_tfunc(T, Const(1))
val_extracted = insert_node!(ir, idx, valT,
Expr(:call, Core.getfield, state1, 1))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Expr(:call, Core.getfield, state1, 1))
Expr(:call, GlobalRef(Core, :getfield), state1, 1))

@@ -945,7 +990,7 @@ end
# Handles all analysis and inlining of intrinsics and builtins. In particular,
# this method does not access the method table or otherwise process generic
# functions.
function process_simple!(ir::IRCode, idx::Int, params::OptimizationParams, world::UInt)
function process_simple!(ir::IRCode, todo, idx::Int, params::OptimizationParams, world::UInt, sv)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type decls

@@ -1010,13 +1055,95 @@ function recompute_method_matches(atype, sv)
MethodMatchInfo(meth, ambig)
end

function analyze_single_call!(ir, todo, idx, stmt, sig, calltype, infos, sv)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type decls

meth = info.applicable
if meth === false || info.ambig
# Too many applicable methods
# Or there is a (partial?) ambiguity
Copy link
Sponsor Member

@vtjnash vtjnash Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to this? (the comment specifically, the rest I know just moved)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it got lost in rebase. Will fix.

end
for match in meth::Vector{Any}
(metharg, methsp, method) = (match[1]::Type, match[2]::SimpleVector, match[3]::Method)
# TODO: This could be better
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment should say how

function assemble_inline_todo!(ir::IRCode, sv::OptimizationState)
# todo = (inline_idx, (isva, isinvoke, na), method, spvals, inline_linetable, inline_ir, lie)
todo = Any[]
skip = find_throw_blocks(ir.stmts.inst, RefValue(ir))
for idx in 1:length(ir.stmts)
idx in skip && continue
r = process_simple!(ir, idx, sv.params, sv.world)
r = process_simple!(ir, todo, idx, sv.params, sv.world, sv)
r === nothing && continue

stmt = ir.stmts[idx][:inst]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, the purpose of ir.stmts[idx] existing as a type now is so that we don't need to pass idx, stmt, type, and so on as separate arguments in new code

end
length(cases) == 0 && return
push!(todo, UnionSplit(idx, fully_covered, sig.atype, cases))
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
end
nothing
end

matches::Vector{MethodMatchInfo}
end

struct AbstractIterationInfo
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add docs describing when each of these is legal to appear as the stmtinfo and what it means elsewhere to discover them on a statement?

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some minor nits

This supersedes #36169. Rather than re-implementing the iteration
analysis as done there, this uses the new stmtinfo infrastrcture
to propagate all the analysis done during inference all the way
to inlining. As a result, it applies not only to splats of
singletons, but also to splats of any other short iterable
that inference can analyze. E.g.:

```
f(x) = (x...,)
@code_typed f(1=>2)
@benchmark f(1=>2)
```

Before:
```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Core._apply_iterate(Base.iterate, Core.tuple, x)::Tuple{Int64,Int64}
└──      return %1
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  3
  --------------
  minimum time:     242.659 ns (0.00% GC)
  median time:      246.904 ns (0.00% GC)
  mean time:        255.390 ns (1.08% GC)
  maximum time:     4.415 μs (93.94% GC)
  --------------
  samples:          10000
  evals/sample:     405
```

After:
```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Base.getfield(x, 1)::Int64
│   %2 = Base.getfield(x, 2)::Int64
│   %3 = Core.tuple(%1, %2)::Tuple{Int64,Int64}
└──      return %3
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.701 ns (0.00% GC)
  median time:      1.925 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     6.941 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

I also implemented the TODO, I had left in #36169 to inline
the iterate calls themselves, which gives another 3x
improvement over the solution in that PR:

```
julia> @code_typed f(1)
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Int64}
└──      return %1
) => Tuple{Int64}

julia> @benchmark f(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.696 ns (0.00% GC)
  median time:      1.699 ns (0.00% GC)
  mean time:        1.702 ns (0.00% GC)
  maximum time:     5.389 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

Fixes #36087
Fixes #29114
@Keno
Copy link
Member Author

Keno commented Jul 17, 2020

Note to self that we can revert #29060 when this merges.

@Keno Keno merged commit 435bf88 into master Jul 18, 2020
@Keno Keno deleted the kf/splat2 branch July 18, 2020 23:35
@c42f
Copy link
Member

c42f commented Jul 21, 2020

Well I didn't get time to look at this in any detail, but absent any meaningful technical contribution I thought I'd leave a note of appreciation instead: thanks Keno for fixing this!

Although we toyed with other ways to fix this in #29114, improving the optimizer does seem like the right thing.

yhls added a commit to yhls/julia that referenced this pull request Aug 3, 2020
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
yhls added a commit to yhls/julia that referenced this pull request Aug 3, 2020
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
This supersedes JuliaLang#36169. Rather than re-implementing the iteration
analysis as done there, this uses the new stmtinfo infrastrcture
to propagate all the analysis done during inference all the way
to inlining. As a result, it applies not only to splats of
singletons, but also to splats of any other short iterable
that inference can analyze. E.g.:

```
f(x) = (x...,)
@code_typed f(1=>2)
@benchmark f(1=>2)
```

Before:
```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Core._apply_iterate(Base.iterate, Core.tuple, x)::Tuple{Int64,Int64}
└──      return %1
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  3
  --------------
  minimum time:     242.659 ns (0.00% GC)
  median time:      246.904 ns (0.00% GC)
  mean time:        255.390 ns (1.08% GC)
  maximum time:     4.415 μs (93.94% GC)
  --------------
  samples:          10000
  evals/sample:     405
```

After:
```
julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Base.getfield(x, 1)::Int64
│   %2 = Base.getfield(x, 2)::Int64
│   %3 = Core.tuple(%1, %2)::Tuple{Int64,Int64}
└──      return %3
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.701 ns (0.00% GC)
  median time:      1.925 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     6.941 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

I also implemented the TODO, I had left in JuliaLang#36169 to inline
the iterate calls themselves, which gives another 3x
improvement over the solution in that PR:

```
julia> @code_typed f(1)
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Int64}
└──      return %1
) => Tuple{Int64}

julia> @benchmark f(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.696 ns (0.00% GC)
  median time:      1.699 ns (0.00% GC)
  mean time:        1.702 ns (0.00% GC)
  maximum time:     5.389 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

Fixes JuliaLang#36087
Fixes JuliaLang#29114
vchuravy pushed a commit that referenced this pull request Aug 21, 2020
PR #36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
vchuravy pushed a commit that referenced this pull request Aug 21, 2020
PR #36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 29, 2020
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
vchuravy pushed a commit that referenced this pull request Oct 5, 2020
PR #36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Splat of Integer needs to be special cased Performance of splatting a number
3 participants