
[Hexagon] Enable multi input Async DMA; same queue / stage #13037

Merged: 3 commits merged into apache:main on Oct 13, 2022


@adstraw (Contributor) commented Oct 11, 2022

When the tir.merge_async_commit_queue_scope option is set to False, then instead of a single async_commit_queue_scope encompassing several async_scope attributes, as follows:

        with T.attr(0, "async_commit_queue_scope", 0):
            with T.attr(0, "async_scope", 1):
                for ax0 in T.serial(64):
                    A_global_vtcm_1[ax0] = A[ax0]
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                B_global_vtcm_1[ax0] = B[ax0]

each async_scope is forced into its own async_commit_queue_scope attribute, as follows:

        with T.attr(0, "async_commit_queue_scope", 0):
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                A_global_vtcm_1[ax0] = A[ax0]
        with T.attr(0, "async_commit_queue_scope", 0):
            T.attr(0, "async_scope", 1)
            for ax0 in T.serial(64):
                B_global_vtcm_1[ax0] = B[ax0]

The "in flight" wait counts are adjusted accordingly. In the former case (a single async_commit_queue_scope), the "in flight" wait count is 1. In the latter case (multiple async_commit_queue_scope attributes), it is n, where n is the number of async_commit_queue_scope attributes (2 in the example).
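As a rough illustration of the rule above, the plain-Python sketch below models the two groupings and the resulting wait count. It is not TVM code: commit_groups and in_flight_wait_count are hypothetical helpers, and the model assumes, as in the example, that the wait count equals the number of commit groups issued for one stage (deeper pipelines are not modeled).

```python
def commit_groups(async_scopes, merge):
    """Hypothetical model of the grouping selected by
    tir.merge_async_commit_queue_scope: one commit group holding every
    async_scope (merge=True), or one commit group per async_scope."""
    if merge:
        return [list(async_scopes)]
    return [[scope] for scope in async_scopes]


def in_flight_wait_count(async_scopes, merge):
    # Assumption: the count is the number of commit groups issued for one
    # stage, i.e. 1 when merged and n (one per async_scope) when separated.
    return len(commit_groups(async_scopes, merge))


copies = ["A_global_vtcm_1 <- A", "B_global_vtcm_1 <- B"]
print(in_flight_wait_count(copies, merge=True))   # 1
print(in_flight_wait_count(copies, merge=False))  # 2
```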

This change enables multi-input async DMA using the same queue / stage on Hexagon. Previously this was blocked because "in flight" counts are managed for each individual DMA, with no way to determine whether a group of DMAs has completed.
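The toy model below (not Hexagon's actual DMA API) sketches why per-DMA counting matters. PerDmaQueue and run are hypothetical names, and DMAs are assumed to complete in issue order. A two-iteration pipeline issues the next iteration's A and B copies before waiting on the current ones; the wait keeps at most the given number of DMAs outstanding.

```python
from collections import deque


class PerDmaQueue:
    """Hypothetical queue that only counts individual DMAs in flight and has
    no notion of a group of DMAs. DMAs complete in the order issued."""

    def __init__(self):
        self.in_flight = deque()
        self.completed = []

    def issue(self, name):
        self.in_flight.append(name)

    def wait(self, max_in_flight):
        # Simulate blocking until at most `max_in_flight` DMAs are outstanding.
        while len(self.in_flight) > max_in_flight:
            self.completed.append(self.in_flight.popleft())


def run(wait_count):
    # Issue iteration 0 and iteration 1 copies, then wait on iteration 0.
    q = PerDmaQueue()
    for it in range(2):
        q.issue(f"A{it}")
        q.issue(f"B{it}")
    q.wait(wait_count)
    return q.completed, list(q.in_flight)


# One DMA per commit scope: a count of n = 2 leaves exactly iteration 1 in flight.
print(run(2))  # (['A0', 'B0'], ['A1', 'B1'])
# A merged-group count of 1, read as a per-DMA count, also retires A1.
print(run(1))  # (['A0', 'B0', 'A1'], ['B1'])
```

With one DMA per commit scope, the commit count and the per-DMA count coincide in this model, so "wait for this group" can be expressed on a queue that only tracks individual DMAs.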

@adstraw (Contributor Author) commented Oct 11, 2022

CC @masahi

@masahi self-assigned this Oct 12, 2022

@masahi (Member) commented Oct 12, 2022

Interesting. For CUDA, the former behavior is more natural: it treats the global-to-shared-memory async copies of the A and B matrices in GEMM as one "chunk".

All CUDA async copy examples and the state-of-the-art library follow this approach, so I want to be able to preserve this behavior for CUDA. Is it possible to make the "commit_queue_scope granularity" configurable?

@adstraw (Contributor Author) commented Oct 12, 2022

@tvm-bot rerun

@tvm-bot (Collaborator) commented Oct 13, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@masahi merged commit 61c9742 into apache:main Oct 13, 2022
@adstraw deleted the straw-hexagon-multi-input-async-dma branch October 13, 2022 15:03
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022

* [Hexagon] Enable multi input Async DMA; same queue / stage

* add option to merge (or separate) async_commit_queue_scope attrs

* move merge_async_commit_queue_scope option select inside pass