RMS Norm achieving poor memory bandwidth on MI300 #422
Comments
Definitely curious what’s going on here. This is less than half of peak bandwidth; by contrast, on A100 we get essentially 100% of peak bandwidth.
@bertmaher Not sure whether you already figured this out. If you change the kernel signature to:
i.e., changing
though we still have room to improve.
Closing due to inactivity.
@anupambhatnagar I know this is a closed issue, but are the BabelStream results you pasted above from this repo?
The RMS norm implementation below achieves memory bandwidth well below the peak possible on MI300. The results can be reproduced using the code below.
When benchmarking BabelStream on the same device I achieved much better performance. Any insights on how to achieve better perf on RMS norm will be appreciated. Thank you!
RMS norm implementation
RMS Norm memory bandwidth achieved in GB/sec.
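As a rough sketch of how a GB/sec figure of this kind is usually derived: assume the kernel reads the input once, reads the weight vector once, and writes the output once, then divide the bytes moved by the measured kernel time. The function names, the 2-byte element size, and the example shape and timing below are all hypothetical, chosen only for illustration; they are not measurements from this issue.

```python
def rms_norm_bytes_moved(rows: int, cols: int, elem_size: int = 2) -> int:
    """Minimum bytes an RMS norm has to move for a (rows, cols) input.

    Assumes one read of x, one read of the weight vector, and one write
    of y. elem_size=2 corresponds to fp16/bf16 (an assumption).
    """
    read_x = rows * cols * elem_size
    read_weight = cols * elem_size
    write_y = rows * cols * elem_size
    return read_x + read_weight + write_y


def achieved_bandwidth_gbps(rows: int, cols: int, elapsed_ms: float,
                            elem_size: int = 2) -> float:
    """Effective bandwidth in GB/sec (1 GB = 1e9 bytes) for one kernel call."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    seconds = elapsed_ms * 1e-3
    return rms_norm_bytes_moved(rows, cols, elem_size) / seconds / 1e9


def fraction_of_peak(achieved_gbps: float, peak_gbps: float) -> float:
    """Ratio of achieved to peak bandwidth; peak_gbps comes from the device spec."""
    return achieved_gbps / peak_gbps


if __name__ == "__main__":
    # Hypothetical shape and timing, purely illustrative.
    gbps = achieved_bandwidth_gbps(rows=8192, cols=8192, elapsed_ms=0.1)
    print(f"{gbps:.1f} GB/sec")
```

Comparing this number against a streaming benchmark such as BabelStream on the same device, rather than against the spec-sheet peak alone, gives a more realistic ceiling for a memory-bound kernel.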
BabelStream Performance