Propagate iteration info to optimizer #36684

Merged

Keno merged 1 commit into master from kf/splat2

Jul 18, 2020

Member

Keno commented Jul 15, 2020

This supersedes #36169. Rather than re-implementing the iteration
analysis as done there, this uses the new stmtinfo infrastructure
to propagate all the analysis done during inference all the way
to inlining. As a result, it applies not only to splats of
singletons, but also to splats of any other short iterable
that inference can analyze. E.g.:

f(x) = (x...,)
@code_typed f(1=>2)
@benchmark f(1=>2)

Before:

julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Core._apply_iterate(Base.iterate, Core.tuple, x)::Tuple{Int64,Int64}
└──      return %1
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  3
  --------------
  minimum time:     242.659 ns (0.00% GC)
  median time:      246.904 ns (0.00% GC)
  mean time:        255.390 ns (1.08% GC)
  maximum time:     4.415 μs (93.94% GC)
  --------------
  samples:          10000
  evals/sample:     405

After:

julia> @code_typed f(1=>2)
CodeInfo(
1 ─ %1 = Base.getfield(x, 1)::Int64
│   %2 = Base.getfield(x, 2)::Int64
│   %3 = Core.tuple(%1, %2)::Tuple{Int64,Int64}
└──      return %3
) => Tuple{Int64,Int64}

julia> @benchmark f(1=>2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.701 ns (0.00% GC)
  median time:      1.925 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     6.941 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I also implemented the TODO I had left in #36169, which was to
inline the iterate calls themselves. This gives another 3x
improvement over the solution in that PR:

julia> @code_typed f(1)
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Int64}
└──      return %1
) => Tuple{Int64}

julia> @benchmark f(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.696 ns (0.00% GC)
  median time:      1.699 ns (0.00% GC)
  mean time:        1.702 ns (0.00% GC)
  maximum time:     5.389 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Fixes #36087
Fixes #29114
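
For readers less familiar with what a splat expands to, here is a rough hand-written sketch of the iteration protocol that `(x...,)` has to walk for an iterable with at most two elements. It is not the optimizer's output or the code in this PR. The name `splat_by_hand` is hypothetical, and only `iterate` and the `f` from the description come from Base and the PR text.

```julia
# Hand-unrolled stand-in for `(x...,)`, limited to iterables of length <= 2.
# `splat_by_hand` is a hypothetical helper used only for illustration.
function splat_by_hand(x)
    r1 = iterate(x)                 # first element, or `nothing` if empty
    r1 === nothing && return ()
    v1, s1 = r1
    r2 = iterate(x, s1)             # second element, or `nothing` if done
    r2 === nothing && return (v1,)
    v2, s2 = r2
    r3 = iterate(x, s2)             # must be `nothing` for a length-2 iterable
    r3 === nothing && return (v1, v2)
    error("splat_by_hand only handles up to two elements")
end

f(x) = (x...,)                      # the function from the PR description

pair_result   = splat_by_hand(1 => 2)
number_result = splat_by_hand(1)
```

Once inference knows how many `iterate` calls are needed and what each returns, every step above is a candidate for inlining, which is roughly the shape of the "After" listing.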

Keno requested review from vtjnash, JeffBezanson and martinholters

July 15, 2020 22:48

Keno mentioned this pull request

Inline singleton splats #36169

Closed

Keno force-pushed the kf/splat2 branch from 67bede4 to 4767755 Compare

July 15, 2020 22:54

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

    
@@ -596,49 +596,74 @@ function spec_lambda(@nospecialize(atype), sv::OptimizationState, @nospecialize(
               end
               # This assumes the caller has verified that all arguments to the _apply call are Tuples.
-              function rewrite_apply_exprargs!(ir::IRCode, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arg_start::Int)
+              function rewrite_apply_exprargs!(ir::IRCode, todo, idx::Int, argexprs::Vector{Any}, atypes::Vector{Any}, arginfos::Vector{Any}, arg_start::Int, sv)

Sponsor Member

vtjnash Jul 17, 2020

type decls

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

+                      else
+                          state = Core.svec()
+                          for i = 1:length(thisarginfo.each)
+                              mthd = thisarginfo.each[i]

Sponsor Member

vtjnash Jul 17, 2020

Suggested change

      
-                              mthd = thisarginfo.each[i]
+                              meth = thisarginfo.each[i]

Don't think we've ever used this mangling before

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

@@ -876,9 +901,22 @@ function call_sig(ir::IRCode, stmt::Expr)
                   Signature(f, ft, atypes)
               end
-              function inline_apply!(ir::IRCode, idx::Int, sig::Signature, params::OptimizationParams)
+              function inline_apply!(ir::IRCode, todo, idx::Int, sig::Signature, params::OptimizationParams, sv)

Sponsor Member

vtjnash Jul 17, 2020

type decls

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

+                              if i != length(thisarginfo.each)
+                                  valT = getfield_tfunc(T, Const(1))
+                                  val_extracted = insert_node!(ir, idx, valT,
+                                      Expr(:call, Core.getfield, state1, 1))

Sponsor Member

vtjnash Jul 17, 2020

Suggested change

      
-                                      Expr(:call, Core.getfield, state1, 1))
+                                      Expr(:call, GlobalRef(Core, :getfield), state1, 1))
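
The suggestion swaps an embedded function object for a symbolic reference to the binding. The small sketch below shows the difference outside the compiler. The literal tuple `(10, 20)` stands in for `state1` and is an assumption for illustration only, as are all the variable names.

```julia
# Two ways to spell the callee in a hand-built call expression.
by_value = Expr(:call, Core.getfield, (10, 20), 1)               # embeds the builtin itself
by_ref   = Expr(:call, GlobalRef(Core, :getfield), (10, 20), 1)  # embeds a reference to Core.getfield

# The first argument of each expression has a different representation.
callee_is_builtin = by_value.args[1] isa Core.Builtin
callee_is_ref     = by_ref.args[1] isa GlobalRef

# Both forms evaluate to the same value.
result_by_value = eval(by_value)
result_by_ref   = eval(by_ref)
```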

vtjnash reviewed

base/compiler/ssair/inlining.jl

@@ -945,7 +990,7 @@ end
               # Handles all analysis and inlining of intrinsics and builtins. In particular,
               # this method does not access the method table or otherwise process generic
               # functions.
-              function process_simple!(ir::IRCode, idx::Int, params::OptimizationParams, world::UInt)
+              function process_simple!(ir::IRCode, todo, idx::Int, params::OptimizationParams, world::UInt, sv)

Sponsor Member

vtjnash Jul 17, 2020

type decls

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

@@ -1010,13 +1055,95 @@ function recompute_method_matches(atype, sv)
                   MethodMatchInfo(meth, ambig)
               end
+              function analyze_single_call!(ir, todo, idx, stmt, sig, calltype, infos, sv)

Sponsor Member

vtjnash Jul 17, 2020

type decls

vtjnash reviewed

base/compiler/ssair/inlining.jl

-                          meth = info.applicable
-                          if meth === false || info.ambig
-                              # Too many applicable methods
-                              # Or there is a (partial?) ambiguity

Sponsor Member

vtjnash Jul 17, 2020 (edited)

What happened to this? (the comment specifically, the rest I know just moved)

Member Author

Keno Jul 17, 2020

Looks like it got lost in rebase. Will fix.

vtjnash reviewed

base/compiler/ssair/inlining.jl Outdated

+                      end
+                      for match in meth::Vector{Any}
+                          (metharg, methsp, method) = (match[1]::Type, match[2]::SimpleVector, match[3]::Method)
+                          # TODO: This could be better

Sponsor Member

vtjnash Jul 17, 2020

comment should say how

base/compiler/ssair/inlining.jl

               function assemble_inline_todo!(ir::IRCode, sv::OptimizationState)
                   # todo = (inline_idx, (isva, isinvoke, na), method, spvals, inline_linetable, inline_ir, lie)
                   todo = Any[]
                   skip = find_throw_blocks(ir.stmts.inst, RefValue(ir))
                   for idx in 1:length(ir.stmts)
                       idx in skip && continue
-                      r = process_simple!(ir, idx, sv.params, sv.world)
+                      r = process_simple!(ir, todo, idx, sv.params, sv.world, sv)
                       r === nothing && continue
                       stmt = ir.stmts[idx][:inst]

Sponsor Member

vtjnash Jul 17, 2020

In theory, the purpose of ir.stmts[idx] existing as a type now is so that we don't need to pass idx, stmt, type, and so on as separate arguments in new code

base/compiler/ssair/inlining.jl

+                  end
+                  length(cases) == 0 && return
+                  push!(todo, UnionSplit(idx, fully_covered, sig.atype, cases))
+              end

Sponsor Member

vtjnash Jul 17, 2020

Suggested change

vtjnash reviewed

base/compiler/stmtinfo.jl

                   matches::Vector{MethodMatchInfo}
               end
+              struct AbstractIterationInfo

Sponsor Member

vtjnash Jul 17, 2020

Can you add docs describing when each of these is legal to appear as the stmtinfo and what it means elsewhere to discover them on a statement?

vtjnash approved these changes

Sponsor Member

vtjnash left a comment

LGTM, with some minor nits

Keno force-pushed the kf/splat2 branch from 4767755 to 0a0bff7 Compare

July 17, 2020 21:26


          Propagate iteration info to optimizer

984c504


Keno force-pushed the kf/splat2 branch from 0a0bff7 to 984c504 Compare

July 17, 2020 21:27

Member Author

Keno commented Jul 17, 2020

Note to self that we can revert #29060 when this merges.

Keno merged commit 435bf88 into master

Keno deleted the kf/splat2 branch

July 18, 2020 23:35

Member

c42f commented Jul 21, 2020

Well I didn't get time to look at this in any detail, but absent any meaningful technical contribution I thought I'd leave a note of appreciation instead: thanks Keno for fixing this!

Although we toyed with other ways to fix this in #29114, improving the optimizer does seem like the right thing.

antoine-levitt mentioned this pull request

Performance of splatting a SVector JuliaArrays/StaticArrays.jl#361

Closed

yhls added a commit to yhls/julia that referenced this pull request


          Fix small bug in JuliaLang#36684

f638a1a

PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
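
As a loose illustration of the pattern this commit message describes, here is a toy iterator whose `iterate` method calls itself recursively to skip entries. It is not the `IncrementalCompact` code. `SkipNothing` and its fields are hypothetical. The point is only that the recursive call must pass the index in the form the termination check at the top expects.

```julia
# Hypothetical toy iterator: yields `index => item`, skipping `nothing` entries.
struct SkipNothing
    items::Vector{Any}
end

# The length is not known up front because entries may be skipped.
Base.IteratorSize(::Type{SkipNothing}) = Base.SizeUnknown()

function Base.iterate(s::SkipNothing, idx::Int=1)
    idx > length(s.items) && return nothing          # termination check
    item = s.items[idx]
    # Recursive call: the argument keeps the same meaning as `idx` above.
    # Passing an adjusted value here would defeat the check at the top.
    item === nothing && return iterate(s, idx + 1)
    return (idx => item), idx + 1                    # value carries the extra index
end

collected = collect(SkipNothing(Any[1, nothing, 3, nothing]))
```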

yhls mentioned this pull request

updating domtrees dynamically, removing all unreachable blocks #33730

Closed

yhls added a commit to yhls/julia that referenced this pull request


          Fix small bug in JuliaLang#36684

9fd6420


simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request


          Propagate iteration info to optimizer (JuliaLang#36684)

d5c07f8


vchuravy pushed a commit that referenced this pull request


          Fix small bug in #36684

95e9f52


vchuravy pushed a commit that referenced this pull request


          Fix small bug in #36684

af4cf31


simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request


          Fix small bug in JuliaLang#36684

6b2509b


c42f mentioned this pull request

bounds error on empty splat #37555

Closed

vchuravy pushed a commit that referenced this pull request


          Fix small bug in #36684

a31cc9a

