
gfx908 optimizations #8082

Closed · wants to merge 1 commit
Conversation

@IMbackK (Contributor) commented Jun 23, 2024

This minor optimization work increases CDNA prompt-processing performance by around 10x.

Current master:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | pp1024 | 56.73 ± 0.34 |
| llama 7B Q4_K - Medium | 4.07 GiB | 7.24 B | ROCm | 99 | pp1024 | 585.58 ± 0.86 |

This PR:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | pp1024 | 462.30 ± 0.95 |
| llama 7B Q4_K - Medium | 4.07 GiB | 7.24 B | ROCm | 99 | pp1024 | 3196.68 ± 1.97 |
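For reference, the per-model speedups implied by the two benchmark runs can be checked with a quick script (the throughput numbers are copied straight from the tables above):

```python
# pp1024 throughput (t/s) copied from the benchmark tables above.
master = {"llama 70B Q4_K": 56.73, "llama 7B Q4_K": 585.58}
this_pr = {"llama 70B Q4_K": 462.30, "llama 7B Q4_K": 3196.68}

for model in master:
    speedup = this_pr[model] / master[model]
    print(f"{model}: {speedup:.2f}x")
# → llama 70B Q4_K: 8.15x
# → llama 7B Q4_K: 5.46x
```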

As most of the remaining time is now spent in the attention kernels, merging #7011 further increases performance by about 2x.

@github-actions bot added the Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jun 23, 2024
@JohannesGaessler (Collaborator) commented:

Which specific GPU are you using?

@IMbackK (Contributor, Author) commented Jun 23, 2024

gfx908, aka the MI100; gfx90a, aka the MI200 family, should have essentially identical performance characteristics.

@ccbadd commented Jun 24, 2024

If this gets merged I'm going to have to fire up my server with the 2X MI100s and give it another try. I never understood why they were pretty much the same speed as my W6800 Pro's.

@daniandtheweb mentioned this pull request Jun 24, 2024
@IMbackK (Contributor, Author) commented Jun 24, 2024

> If this gets merged I'm going to have to fire up my server with the 2X MI100s and give it another try. I never understood why they were pretty much the same speed as my W6800 Pro's.

Well, this of course doesn't help at all with token generation, since GEMV is the most time-consuming kernel there. In general, looking at Omniperf, there is still quite some distance to go for decent performance there.
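The GEMV bottleneck in token generation is a bandwidth problem rather than a compute problem. A rough back-of-the-envelope sketch (the peak figures are approximate published MI100 specs, not measurements from this PR, and the fp16 weight assumption is for illustration only):

```python
# Rough arithmetic-intensity estimate for a GEMV y = W @ x, with W an
# n x n fp16 weight matrix. Peak figures are approximate MI100 specs.
peak_fp16_tflops = 184.6        # MI100 fp16 matrix peak, approximate
peak_bandwidth_tbs = 1.2        # MI100 HBM2 bandwidth, approximate

n = 8192
flops = 2 * n * n               # one multiply-add per weight
bytes_moved = 2 * n * n         # each fp16 weight is read once (2 bytes)
intensity = flops / bytes_moved # 1 FLOP per byte

# FLOP/byte needed to keep the ALUs busy at peak bandwidth:
balance = peak_fp16_tflops / peak_bandwidth_tbs
print(f"GEMV intensity: {intensity:.1f} FLOP/B, machine balance: {balance:.0f} FLOP/B")
```

Since roughly 1 FLOP/byte is far below the machine balance of about 154 FLOP/byte, GEMV throughput is capped by memory bandwidth rather than compute, which is consistent with a profiler showing plenty of headroom left.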

@mofosyne added the Review Complexity : Low label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fix) on Jun 24, 2024
@JohannesGaessler (Collaborator) commented:

There is some parallel work by me on the matrix multiplication code in #8062 and #8075 that touch the same code as this PR. After these two PRs have been merged, please adapt your changes and confirm that the performance improvement is still there. Sorry for the inconvenience.

@IMbackK (Contributor, Author) commented Jul 15, 2024

Closed as obsolete for now; I will reopen with a rebased version at some point in the future.

@IMbackK closed this on Jul 15, 2024