[FRONTEND][BACKEND] Cleanup and re-enable optimization with fp8e4b15 #3521

Merged (2 commits) on Apr 1, 2024

Conversation

ThomasRaoux
Collaborator

Multiple fixes so that pipelining can happen when generating a matmul with fp8e4b15 inputs. Also cleans up useless code in the frontend.

@ThomasRaoux ThomasRaoux marked this pull request as ready for review April 1, 2024 08:33
@ThomasRaoux ThomasRaoux requested a review from ptillet as a code owner April 1, 2024 08:33

Contributor

@pawelszczerbuk pawelszczerbuk left a comment

Looks good to me, just minor nits from my side!

@@ -149,6 +149,24 @@ SmallVector<Value> packI32(const SmallVector<Value> &inValues, Type srcTy,
}
return outValues;
}

int getNumElmenetPerThreads(Type type, const LLVMTypeConverter *typeConverter) {
Contributor

Typo: Elmenet -> Element

Collaborator Author

Oops, fixed.

@jlebar
Collaborator

jlebar commented Apr 1, 2024

I have a few review comments, please hold on for a sec till I'm out of the interview...

@ThomasRaoux
Collaborator Author

> I have a few review comments, please hold on for a sec till I'm out of the interview...

Thanks Justin, I'll push this for now to unblock my wheel update, but please add your comments and I'll send a follow-up PR.

@ThomasRaoux ThomasRaoux merged commit ea40df4 into triton-lang:main Apr 1, 2024
5 checks passed
@@ -149,6 +149,24 @@ SmallVector<Value> packI32(const SmallVector<Value> &inValues, Type srcTy,
}
return outValues;
}

int getNumElmenetPerThreads(Type type, const LLVMTypeConverter *typeConverter) {
Collaborator

getNumElementsPerThread?

Collaborator Author

already addressed

@@ -149,6 +149,24 @@ SmallVector<Value> packI32(const SmallVector<Value> &inValues, Type srcTy,
}
return outValues;
}

int getNumElmenetPerThreads(Type type, const LLVMTypeConverter *typeConverter) {
Collaborator

This function is only correct for inline asm, right? If so, can we change the name to indicate that? Also can we add a comment indicating why we do 32/size? (It's because inline asm implicitly packs elements in this way, I believe.)

Collaborator Author

No, I think the function should work for any op.

> Also can we add a comment indicating why we do 32/size?

Sure.
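
The 32/size arithmetic raised in this thread can be illustrated with a minimal standalone sketch. It is not the function from the PR: the name `elementsPerPackedRegister` is hypothetical, and the sketch assumes (as the reviewer suggests) that elements narrower than 32 bits are packed together into 32-bit values.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not the PR's implementation.
// Assumption: elements narrower than 32 bits are packed into one 32-bit
// value, so a single packed value carries 32 / elemBitWidth elements.
int elementsPerPackedRegister(unsigned elemBitWidth) {
  assert(elemBitWidth > 0 && "element width must be non-zero");
  // Elements that are 32 bits or wider are not packed: one per value.
  if (elemBitWidth >= 32)
    return 1;
  return static_cast<int>(32 / elemBitWidth);
}
```

Under that assumption, an 8-bit type such as fp8e4b15 gives 4 elements per packed value, a 16-bit type gives 2, and a 32-bit type gives 1.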

lib/Dialect/TritonGPU/Transforms/Utility.cpp (resolved)
ThomasRaoux added a commit to ThomasRaoux/triton that referenced this pull request Apr 1, 2024
// need to reorder them so we iterate over the operands' elements in the
// same logical order.
for (unsigned i = 0; i < unpackedOperands.size(); ++i) {
unpackedOperands[i] = reorderValues(
Collaborator

Hm, where did this call to reorderValues go?

Collaborator Author

It was not correct, so I removed it.

ThomasRaoux added a commit that referenced this pull request Apr 1, 2024

3 participants