Very WIP: Refactor core inference loops to use less memory #43999
Conversation
Force-pushed from 9ae2a51 to 36e3abe
Rebased against the latest master, and dropped the changes from #43899 (I believe these two PRs are orthogonal and can be merged separately). There still seem to be some problems in this PR; I will try to dig into them.
Force-pushed from 2ec47da to 60a82a6
Currently inference uses `O(<number of statements>*<number of slots>)` state in the core inference loop. This is usually fine, because users don't tend to write functions that are particularly long. However, MTK does generate functions that are excessively long, and we've observed MTK models that spend 99% of their inference time just allocating and copying this state.

It is possible to get away with significantly smaller state, and this PR is a first step in that direction, reducing the state to `O(<number of basic blocks>*<number of slots>)`. Further improvements are possible by making use of slot liveness information and storing only those slots that are live across a particular basic block.

The core change here is to keep a full set of `slottypes` only at basic block boundaries rather than at each statement. For statements in between, the full variable state can be recovered by linearly scanning through the basic block, taking note of slot assignments (together with the SSA type) and NewVarNodes.

The current status of this branch is that the changes appear correct (no known functional regressions) and significantly improve the MTK test cases in question (no exact benchmarks for now, since the branch still needs a number of fixes before final numbers make sense), but they somewhat regress optimizer quality (which is expected and just a missing TODO) and bootstrap time (which is not expected and something I need to dig into).
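To illustrate the idea, here is a toy sketch (not the actual `Core.Compiler` code) of how the per-statement variable state can be recovered from the state stored at the entry of a basic block. The types `BlockEntryState`, `SlotAssign`, `NewVarNode` and the function `recover_state` are hypothetical simplifications for illustration only.

```julia
# Minimal stand-ins for slot assignments and NewVarNode statements.
struct SlotAssign
    slot::Int   # which slot is written
    typ::Any    # inferred (SSA) type of the right-hand side
end
struct NewVarNode
    slot::Int   # slot whose binding is reset (becomes undefined again)
end

# Full slot state is only materialized at basic block boundaries.
struct BlockEntryState
    slottypes::Vector{Any}   # one entry per slot; `nothing` means undefined
end

# Recover the state just before statement `idx` by linearly scanning the
# block's statements from its entry, applying each effect in order.
function recover_state(entry::BlockEntryState, stmts::Vector{Any}, idx::Int)
    slottypes = copy(entry.slottypes)
    for i in 1:idx-1
        stmt = stmts[i]
        if stmt isa SlotAssign
            slottypes[stmt.slot] = stmt.typ    # assignment refines the slot type
        elseif stmt isa NewVarNode
            slottypes[stmt.slot] = nothing     # slot becomes undefined again
        end
    end
    return BlockEntryState(slottypes)
end

# Usage: with 2 slots, recover the state just before the 3rd statement.
entry = BlockEntryState(Any[Int, nothing])
stmts = Any[SlotAssign(2, Float64), NewVarNode(1), SlotAssign(1, String)]
recover_state(entry, stmts, 3)   # slot 1 undefined, slot 2 :: Float64
```

This trades the `O(statements * slots)` storage for an `O(block length)` rescan when a particular statement's state is needed, which is the core space/time trade-off described above.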
Force-pushed from 60a82a6 to dfeb2e9
```julia
if s.undef
    # flag the slot as potentially used while undefined
    sv.src.slotflags[sn] |= SLOT_USEDUNDEF | SLOT_STATICUNDEF
end
```
After digging into the regressions in the optimizer, I found that this change could be problematic; in particular, it causes some regressions in the SROA pass.
I think these changes are orthogonal to the main purpose of this PR, so I am trying to fix the regressions by separating the related changes into another branch.
There still seems to be another cause of the regressions, though.
Separated from #43999. xref: <#43999 (comment)>
Replaced by #45276.