gfx1010 optimizations #8085
Thanks for the information. I'll test it more thoroughly in a couple of hours, but for now, using a value of 64 instead of the master branch's default of 128, I get 275 t/s for prompt processing.
The performance boost holds consistently at 275 t/s and the output is fine; however, I'm having some trouble adding a check in
@JohannesGaessler, any idea how to solve this?
Does this PR affect RDNA3? I could really use some optimizations.
Not at all. This PR only tunes some parameters for Navi 10 (RDNA1) that have already been tuned for the 7000 series.
I would patch it like this:
@JohannesGaessler I've applied the change. I still don't think this is the best approach: if different cards need different values, it could get messy. A normal if statement might keep the line from getting too long, but I haven't been able to make it work that way.
EDIT: Apparently all the issues I've been having are caused by the int8_mma_available check not working as intended. Just removing it from the if condition makes everything work again.
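The trade-off raised above, a long one-line conditional versus a plain if statement, can be sketched in C++ as follows. This is a hypothetical illustration: `gpu_arch`, `tuned_value_ternary` and `tuned_value_if` are made-up names, not llama.cpp identifiers, and only the numbers (64 for RDNA1, 128 as the master-branch default) come from this thread. The sketch shows that both forms still produce compile-time constants, while the if form grows by one branch per extra architecture instead of by another level of nesting.

```cpp
// Hypothetical sketch: gpu_arch and the function names are illustrative and
// are not llama.cpp identifiers. Only the numbers come from this thread:
// 64 for RDNA1 (gfx1010) and 128 as the master-branch default.
enum class gpu_arch { rdna1, other };

// Form 1: one conditional expression. Compact for a single special case,
// but each additional per-architecture value nests another ?: and
// lengthens the line.
constexpr int tuned_value_ternary(gpu_arch arch) {
    return arch == gpu_arch::rdna1 ? 64 : 128;
}

// Form 2: the same selection with plain if statements, which C++14 allows
// inside constexpr functions. A further architecture would be one more
// branch here instead of another level of nesting.
constexpr int tuned_value_if(gpu_arch arch) {
    if (arch == gpu_arch::rdna1) {
        return 64;  // value tested on RDNA1 in this thread
    }
    return 128;     // master-branch default
}

// Both forms still yield compile-time constants, e.g. usable as a template
// argument or an array bound.
constexpr int rdna1_value = tuned_value_if(gpu_arch::rdna1);
static_assert(rdna1_value == tuned_value_ternary(gpu_arch::rdna1),
              "both forms must agree");
```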
Yes,
It is working as intended; you are just not using it as intended. The lowercase
I see. In that case, would it be better to keep it like this:
Or to do it like this?
You cannot do the second one.
Okay. I actually wanted to restrict it to RDNA1 because I wasn't sure what effect it could have on PR #8082.
Sorry, it actually has to be done the way you had it, with an RDNA1 check. On AMD you cannot do a simple check against a number because there is no sensible value for
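The reviewer's full explanation is cut off here, so the following is only a hypothetical illustration of one pitfall of numeric comparisons when the goal is to change behavior on RDNA1 without touching the architectures targeted by #8082. Under a made-up integer encoding of AMD targets (the offset and the `cc_gfx*` constants are assumptions, not llama.cpp's values), a "gfx1010 or newer" threshold also matches RDNA2 and RDNA3, whereas an explicit match does not.

```cpp
// Hypothetical encoding, for illustration only: each AMD target is mapped to
// an integer built from its gfx name plus an assumed offset. None of these
// constants are taken from llama.cpp.
constexpr int amd_offset = 1000000;
constexpr int cc_gfx1010 = amd_offset + 1010;  // RDNA1 (Navi 10)
constexpr int cc_gfx1030 = amd_offset + 1030;  // RDNA2
constexpr int cc_gfx1100 = amd_offset + 1100;  // RDNA3

// Threshold check ("gfx1010 or newer"): it also matches RDNA2 and RDNA3,
// so a value tuned only on RDNA1 would silently apply to them too.
constexpr bool use_rdna1_value_threshold(int cc) {
    return cc >= cc_gfx1010;
}

// Explicit check: matches only the architecture the value was tuned on.
// Other RDNA1 targets would have to be listed here explicitly as well.
constexpr bool use_rdna1_value_explicit(int cc) {
    return cc == cc_gfx1010;
}

// The threshold check leaks into RDNA3; the explicit one does not.
static_assert(use_rdna1_value_threshold(cc_gfx1100), "threshold matches RDNA3");
static_assert(!use_rdna1_value_explicit(cc_gfx1100), "explicit excludes RDNA3");
```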
I think this may be ready to merge once all the checks have completed. Thanks for the tips on how to improve it.
While reading @IMbackK's PR #8082, I noticed that RDNA1 cards can also get a small performance gain just by adjusting the same values that PR changes.
This is still far from the performance before #7716 (which caused a 50% performance drop on RDNA1 cards), but it's still a good improvement.
Thanks again to @IMbackK for his PR, as I wouldn't have noticed this without it.
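As a rough worked example of why a small gain still leaves a large gap: let $T_0$ be the prompt-processing throughput before #7716. A 50% drop leaves $T_1 = 0.5\,T_0$, and the relative gain $g$ needed to return to $T_0$ is:

$$
T_1 (1 + g) = T_0 \quad\Longrightarrow\quad g = \frac{T_0}{T_1} - 1 = \frac{1}{0.5} - 1 = 1
$$

In other words, recovering the pre-#7716 level would take a 100% speedup over the post-#7716 throughput, far more than the small gain described here.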