inference: refactor the core loops to use less memory #45276
Conversation
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed from e0074a2 to 9c11c71 (compare)
@nanosoldier
Your package evaluation job has completed - no new issues were detected. A full report can be found here.
LGTM! Good work Shuhei!
Edit: Just saw PkgEval results, I guess there is more work to be done. Let me know if you want help, as I read it somewhat thoroughly to do this review (though obviously not thoroughly enough to catch any bugs 😂 )
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Force-pushed from 406cb61 to 2336683 (compare)
@nanosoldier
Your package evaluation job has completed - no new issues were detected. A full report can be found here.
Force-pushed from 2336683 to c642a71 (compare)
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed from d6a4417 to 87d863c (compare)
Force-pushed from 72c03d4 to c245fb8 (compare)
@nanosoldier
Force-pushed from c67e37d to 283aee5 (compare)
@nanosoldier
Something went wrong when running your job:
Logs and partial data can be found here
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Force-pushed from cde6129 to eb6d7ca (compare)
Currently inference uses `O(<number of statements>*<number of slots>)` state in the core inference loop. This is usually fine, because users don't tend to write functions that are particularly long. However, MTK does generate functions that are excessively long, and we've observed MTK models that spend 99% of their inference time just allocating and copying this state. It is possible to get away with significantly smaller state, and this PR is a first step in that direction, reducing the state to `O(<number of basic blocks>*<number of slots>)`. Further improvements are possible by making use of slot liveness information and only storing those slots that are live across a particular basic block.

The core change here is to keep a full set of `slottypes` only at basic block boundaries rather than at each statement. For statements in between, the full variable state can be recovered by linearly scanning through the basic block, taking note of slot assignments (together with the SSA type) and NewVarNodes.

The current status of this branch is that the changes appear correct (no known functional regressions) and significantly improve the MTK test cases in question (no exact benchmarks here for now, since the branch still needs a number of fixes before final numbers make sense), but somewhat regress optimizer quality (which is expected and just a missing TODO) and bootstrap time (which is not expected and something I need to dig into).

Co-Authored-By: Keno Fischer <keno@juliacomputing.com>
Force-pushed from eb6d7ca to 973ff33 (compare)
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
The nanosoldier results look nice. Going to merge.
A minimum version of #43999, without any apparent regressions.
The original PR description by @Keno will follow:
Currently inference uses `O(<number of statements>*<number of slots>)` state in the core inference loop. This is usually fine, because users don't tend to write functions that are particularly long. However, MTK does generate functions that are excessively long, and we've observed MTK models that spend 99% of their inference time just allocating and copying this state. It is possible to get away with significantly smaller state, and this PR is a first step in that direction, reducing the state to `O(<number of basic blocks>*<number of slots>)`. Further improvements are possible by making use of slot liveness information and only storing those slots that are live across a particular basic block.

The core change here is to keep a full set of `slottypes` only at basic block boundaries rather than at each statement. For statements in between, the full variable state can be recovered by linearly scanning through the basic block, taking note of slot assignments (together with the SSA type) and NewVarNodes.

The current status of this branch is that the changes appear correct (no known functional regressions) and significantly improve the MTK test cases in question (no exact benchmarks here for now, since the branch still needs a number of fixes before final numbers make sense), but somewhat regress optimizer quality (which is expected and just a missing TODO) and bootstrap time (which is not expected and something I need to dig into).
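The reconstruction step described above can be sketched as a toy model. This is not the actual `Core.Compiler` code: the `Assign` and `NewVar` structs and the `state_at` function are hypothetical stand-ins for slot assignments (with their inferred type) and `NewvarNode`s, and the per-slot state is simplified to a plain type per slot.

```julia
# Toy model: store the full slot state only at basic-block entry,
# and recover the state at any statement by replaying the block.

struct Assign        # stand-in for a slot assignment with its inferred type
    slot::Int
    typ::Any
end

struct NewVar        # stand-in for a NewvarNode (slot becomes undefined)
    slot::Int
end

# Recover the state just before statement `idx` (1-based within the block)
# by replaying the block's effects on a copy of the entry state.
# Cost is O(block length), traded for not storing O(statements) states.
function state_at(entry::Vector{Any}, stmts::Vector{Any}, idx::Int)
    state = copy(entry)                  # O(nslots) copy, once per query
    for i in 1:idx-1
        stmt = stmts[i]
        if stmt isa Assign
            state[stmt.slot] = stmt.typ  # slot takes the assigned type
        elseif stmt isa NewVar
            state[stmt.slot] = Union{}   # slot reverts to "undefined"
        end
    end
    return state
end

entry = Any[Int, Float64, String]
stmts = Any[Assign(1, Bool), NewVar(3), Assign(2, Int)]
state_at(entry, stmts, 3)  # Any[Bool, Float64, Union{}]
```

The memory trade-off is visible here: only `entry` is retained per block, while the per-statement states are rematerialized on demand by a linear scan.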
@nanosoldier
runbenchmarks(!"scalar", vs=":master")