Flash Decode GQA (and MQA) Improvements (Round 1) #12739
Merged
caixunshiren added the kernels, P1, LLM_feature, and llama3 labels on Sep 16, 2024
caixunshiren requested review from eyonland, patrickroberts, yan-zaretskiy, cfjchu, xanderchin, TT-BrianLiu, ayerofieiev-tt and dmakoviichuk-tt as code owners on September 17, 2024 18:23
caixunshiren force-pushed the page-attention-gqa branch from f7ae607 to a650ec3 on September 17, 2024 22:35
cfjchu approved these changes on Sep 18, 2024
sraizada-tt approved these changes on Sep 18, 2024
caixunshiren force-pushed the page-attention-gqa branch from a650ec3 to bbcb8f7 on September 18, 2024 14:34
All post commit: https://github.com/tenstorrent/tt-metal/actions/runs/10951786445
caixunshiren changed the title from "Flash Decode GQA (and MQA) Improvements" to "Flash Decode GQA (and MQA) Improvements (Round 1)" on Sep 18, 2024
caixunshiren force-pushed the page-attention-gqa branch from bbcb8f7 to c11caf8 on September 20, 2024 01:51
caixunshiren requested review from yieldthought, mtairum and uaydonat as code owners on September 20, 2024 01:51
uaydonat approved these changes on Sep 20, 2024
Ticket
This PR contains the round 1 improvements outlined in #12330:
FYI @sraizada-tt @cglagovichTT
Post commit pipeline: https://github.com/tenstorrent/tt-metal/actions/runs/10956618476/job/30423043917