
inference: refactor the core loops to use less memory #45276

Merged
aviatesk merged 16 commits into master from avi/inferencerefactor on May 30, 2022

Conversation

aviatesk (Member)

A minimal version of #43999, without any apparent regressions.

The original PR description by @Keno follows:


Currently inference uses `O(<number of statements> * <number of slots>)` state
in the core inference loop. This is usually fine, because users don't tend
to write functions that are particularly long. However, MTK (ModelingToolkit.jl)
does generate excessively long functions, and we've observed MTK models that
spend 99% of their inference time just allocating and copying this state.
It is possible to get away with significantly smaller state, and this PR is
a first step in that direction, reducing the state to
`O(<number of basic blocks> * <number of slots>)`. Further improvements are
possible by making use of slot liveness information, storing only the slots
that are live across a particular basic block.
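
To make the scale of the saving concrete, here is a back-of-the-envelope sketch in Julia. The sizes are made-up placeholders for an MTK-style generated function, not measurements from this PR:

```julia
# Hypothetical sizes for a long machine-generated function; illustrative
# only, not taken from any benchmark in this PR.
nstmts = 100_000  # number of statements
nbbs   = 5_000    # number of basic blocks
nslots = 2_000    # number of slots

# Old scheme: one variable-state row per statement.
old_entries = nstmts * nslots  # 200_000_000 entries

# New scheme: one variable-state row per basic-block boundary.
new_entries = nbbs * nslots    # 10_000_000 entries

println("state shrinks by a factor of ", old_entries ÷ new_entries)  # 20
```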

The core change here is to keep a full set of `slottypes` only at
basic block boundaries rather than at each statement. For statements
in between, the full variable state can be recovered by scanning
linearly through the basic block, taking note of slot assignments
(together with their SSA types) and `NewvarNode`s.
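
As a rough illustration of that recovery step, here is a minimal Julia sketch. It is not the actual `Core.Compiler` code: `VarState` here is a simplified stand-in (modeled on the compiler's type of the same name), and `stmts`/`ssatypes` are assumed to hold the block's statements and their inferred types:

```julia
# Simplified stand-in for Core.Compiler's per-slot state.
struct VarState
    typ          # inferred type of the slot at this point
    undef::Bool  # whether the slot may be undefined here
end

# Recover the variable state just before statement `target` of a basic
# block, given only the state snapshot taken at the block's entry.
function recover_state_at(entrystate::Vector{VarState},
                          stmts::Vector{Any}, ssatypes::Vector{Any},
                          target::Int)
    state = copy(entrystate)  # start from the basic-block-entry snapshot
    for i in 1:(target - 1)
        stmt = stmts[i]
        if stmt isa Expr && stmt.head === :(=) && stmt.args[1] isa Core.SlotNumber
            # Slot assignment: the slot now carries the statement's SSA type.
            state[stmt.args[1].id] = VarState(ssatypes[i], false)
        elseif stmt isa Core.NewvarNode
            # NewvarNode: the slot becomes (possibly) undefined again.
            state[stmt.slot.id] = VarState(Union{}, true)
        end
    end
    return state
end
```

The real compiler has more cases to handle (e.g. conditional type refinements and exception edges), but the trade-off is the same: a single snapshot per block in exchange for a linear rescan within the block whenever an intermediate state is needed.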

The current status of this branch is that the changes appear correct
(no known functional regressions) and significantly improve the MTK
test cases in question (no exact benchmarks here for now, since
the branch still needs a number of fixes before the final numbers make
sense), but they somewhat regress optimizer quality (which is expected
and just reflects a missing TODO) and bootstrap time (which is not
expected and something I need to dig into).


@nanosoldier runbenchmarks(!"scalar", vs=":master")

oscardssmith added the compiler:inference (Type inference) label on May 11, 2022
@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk (Member, Author)

@nanosoldier runtests(ALL, vs = ":master")

@aviatesk (Member, Author)

@nanosoldier runbenchmarks("broadcast" || "collection", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk force-pushed the avi/inferencerefactor branch 2 times, most recently from e0074a2 to 9c11c71 on May 12, 2022
@aviatesk (Member, Author)

@nanosoldier runtests(["LoopVectorization", "RecursiveFactorization"], vs = ":master")
@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - no new issues were detected. A full report can be found here.

@ianatol (Member) left a comment

LGTM! Good work Shuhei!

Edit: Just saw PkgEval results, I guess there is more work to be done. Let me know if you want help, as I read it somewhat thoroughly to do this review (though obviously not thoroughly enough to catch any bugs 😂 )

base/compiler/abstractinterpretation.jl (outdated review thread; resolved)
@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk force-pushed the avi/inferencerefactor branch 2 times, most recently from 406cb61 to 2336683 on May 13, 2022
@aviatesk (Member, Author)

@nanosoldier runtests(["LoopVectorization", "RecursiveFactorization"], vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - no new issues were detected. A full report can be found here.

aviatesk force-pushed the avi/inferencerefactor branch from 2336683 to c642a71 on May 13, 2022
@aviatesk (Member, Author)

@nanosoldier runtests(ALL, vs = ":master")
@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk force-pushed the avi/inferencerefactor branch 5 times, most recently from d6a4417 to 87d863c on May 17, 2022
@aviatesk's comment was marked as resolved.

aviatesk changed the title from "inference: refactor the core loops to use less memory" to "wip: inference: refactor the core loops to use less memory" on May 17, 2022
aviatesk force-pushed the avi/inferencerefactor branch 2 times, most recently from 72c03d4 to c245fb8 on May 18, 2022
@aviatesk (Member, Author)

@nanosoldier runtests(ALL, vs = ":master")
@nanosoldier runbenchmarks(!"scalar", vs=":master")

aviatesk force-pushed the avi/inferencerefactor branch from c67e37d to 283aee5 on May 24, 2022
@aviatesk (Member, Author)

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo -n /nanosoldier/cset/bin/cset shield -e -- sudo -n -u nanosoldier-worker -- /nanosoldier/workdir/jl_P8M1Ku/benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here

@aviatesk (Member, Author)

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk force-pushed the avi/inferencerefactor branch from cde6129 to eb6d7ca on May 27, 2022
aviatesk and others added 16 commits on May 30, 2022
(The commit message repeats the PR description quoted above.)

Co-Authored-By: Keno Fischer <keno@juliacomputing.com>
aviatesk force-pushed the avi/inferencerefactor branch from eb6d7ca to 973ff33 on May 30, 2022
@aviatesk (Member, Author)

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk (Member, Author)

The nanosoldier results look nice. Going to merge.

aviatesk merged commit 5a32626 into master on May 30, 2022
aviatesk deleted the avi/inferencerefactor branch on May 30, 2022
aviatesk added a commit that referenced this pull request May 31, 2022