[Hexagon] Enable multi input Async DMA; same queue / stage #13037

adstraw · 2022-10-11T16:34:12Z

When the tir.merge_async_commit_queue_scope option is set to False, instead of a single async_commit_queue_scope encompassing several async_scope attributes, e.g. as follows:

        with T.attr(0, "async_commit_queue_scope", 0):
            with T.attr(0, "async_scope", 1):
                for ax0 in T.serial(64):
                    A_global_vtcm_1[ax0] = A[ax0]
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                B_global_vtcm_1[ax0] = B[ax0]

Force each async_scope to be in its own async_commit_queue_scope attribute, e.g. as follows:

        with T.attr(0, "async_commit_queue_scope", 0):
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                A_global_vtcm_1[ax0] = A[ax0]
        with T.attr(0, "async_commit_queue_scope", 0):
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                B_global_vtcm_1[ax0] = B[ax0]

The "in flight" wait counts are adjusted accordingly. In the former case (a single async_commit_queue_scope) the "in flight" wait count is 1; in the latter case (multiple async_commit_queue_scope attributes) it is n, where n is the number of async_commit_queue_scope attributes (2 in the example).
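
The wait-count rule above can be modeled with a small, self-contained sketch. This is a toy model for illustration only, not the TVM pass itself; the function names `group_async_scopes` and `in_flight_wait_count` are hypothetical and do not appear in this PR.

```python
from typing import List


def group_async_scopes(copies: List[str], merge: bool) -> List[List[str]]:
    """Toy model: group async copies into commit-queue scopes.

    merge=True  -> one scope holding every copy (the merged form).
    merge=False -> one scope per copy (the separated form).
    """
    if not copies:
        return []
    if merge:
        return [list(copies)]
    return [[copy] for copy in copies]


def in_flight_wait_count(copies: List[str], merge: bool) -> int:
    """The wait count equals the number of commit-queue scopes produced."""
    return len(group_async_scopes(copies, merge))


# The two inputs from the example in this description.
merged_count = in_flight_wait_count(["A", "B"], merge=True)
separate_count = in_flight_wait_count(["A", "B"], merge=False)
print(merged_count, separate_count)
```

In this model, two input copies give a wait count of 1 when merged and 2 when separated, matching the counts stated for the example.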

This enables multi-input async DMA using the same queue / stage on Hexagon. Previously this was blocked because "in flight" counts are managed for each individual DMA, with no way to determine whether a group of DMAs has completed.
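
As a rough sketch of how the option might be selected, the snippet below builds a config dictionary carrying the option name from this description. It assumes the option is supplied through a `tvm.transform.PassContext` config, which this description does not state explicitly; the helper name `async_dma_pass_config` is hypothetical, and other options may be needed to enable async copies at all.

```python
def async_dma_pass_config(merge_commit_queue_scope: bool) -> dict:
    """Build a pass config dict (sketch; helper name is hypothetical)."""
    return {"tir.merge_async_commit_queue_scope": merge_commit_queue_scope}


# Separate scopes: one async_commit_queue_scope per async_scope.
separate_scopes = async_dma_pass_config(False)

# With TVM installed, the dict would be passed along these lines
# (assumption: the option is read from the PassContext config):
#   import tvm
#   with tvm.transform.PassContext(config=separate_scopes):
#       ...  # lower / build the schedule here
print(separate_scopes)
```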

adstraw · 2022-10-11T16:37:01Z

CC @masahi

masahi · 2022-10-12T11:00:24Z

Interesting, for CUDA the former behavior is more natural - treating global to shared mem async copy for A and B matrices in GEMM as one "chunk".

All CUDA async copy examples and the state of the art library follow this approach, so I want to be able to preserve this behavior for CUDA. Is it possible to make "commit_queue_scope granularity" configurable?

adstraw · 2022-10-12T22:09:33Z

@tvm-bot rerun

tvm-bot · 2022-10-13T00:11:35Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Built docs for commit 24c01e7 can be found here.

Generated by tvm-bot

[Hexagon] Enable multi input Async DMA; same queue / stage

49e3e6e

masahi self-assigned this Oct 12, 2022

adstraw added 2 commits October 12, 2022 09:55

add option to merge (or separate) async_commit_queue_scope attrs

0f07f03

move merge_async_commit_queue_scope option select inside pass

24c01e7

masahi approved these changes Oct 13, 2022

masahi merged commit 61c9742 into apache:main Oct 13, 2022

adstraw deleted the straw-hexagon-multi-input-async-dma branch October 13, 2022 15:03

leandron mentioned this pull request Feb 1, 2023

TVM v0.11.0 Release Candidate Notes #13899

Closed
