
Communicate blocksize constraints to kernels that take blocksize as a runtime argument #1317

Open
mm04926412 opened this issue Aug 14, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request Low Risk Risk of bugs in transformers and other libraries

Comments

mm04926412 commented Aug 14, 2024

Feature request

Looking through the code base, I've noticed that in places like kgemm_4bit_inference_naive there is integer division by block_size on the GPU, where block_size is a runtime argument rather than a template argument. The Python front end constrains blocksize to be a power of 2, but that constraint is never communicated to the kernel. Integer division that cannot be simplified to a bitshift performs poorly on GPUs. These kernels should be rewritten so the integer divisions can be replaced with bitshifts.

Motivation

Integer division is slow, except when dividing by a power of two. The kernels can't take the bitshift shortcut because the power-of-2 constraint is enforced only on the Python front end.

Your contribution

I'd be happy to submit a PR to resolve this if there isn't some deeper reason why things are written this way.

@matthewdouglas matthewdouglas added waiting for info contributions-welcome We welcome contributions to fix this issue! labels Aug 14, 2024
@matthewdouglas
Member

Hi @mm04926412, we discussed briefly on the CUDA MODE Discord. Happy to look at any PRs for this, and ideally some data to show the impact would be appreciated! Feel free to reach out with any questions.

@matthewdouglas matthewdouglas self-assigned this Aug 14, 2024
@matthewdouglas matthewdouglas added the enhancement New feature or request label Aug 14, 2024
@Titus-von-Koeller
Collaborator

@mm04926412 thanks for raising this. Honestly, I'm not sure if there's a deeper reason for this. I'll raise it with Tim this Friday and let you know what he says.

Cool that you're looking into our kernels! Feel free to ping us about stuff; we'd be happy to draw in more enthusiasts and continue making BNB even more of a community project.

@TimDettmers
Collaborator

Very good catch. I think this would be a good contribution. In general, the latency overhead of the dequantization operation is currently the biggest slowdown for these kernels.

I think the best solution might be to replace the division with a bitshift and add a software constraint, as an assert/error message, requiring the block size to be a power of 2.

I think I wrote these kernels this way to allow faster prototyping and debugging of performance issues across different GPUs with different power-of-two block sizes. As such, this is an artifact of prototyping, and replacing the division with a bitshift should be a strict improvement.

@mm04926412
Author

Great. I don't have a testing environment for the library or anything to test with, so I can do a PR once I get that set up. For this change, would it be best for me to make a PR, or would you rather I leave it with you? What tests/benchmarks would you like for a PR?

@mm04926412
Author

Unfortunately, it looks like I need to reinstall Linux (I'm on an old Linux Mint workstation) in order to properly compile the repo, so I'm not in a position to do the PR until I can set aside time for that and get a proper CMake and CUDA environment. I've provided a suggested form of the solution but sadly can't do much more.

a1259ef

@matthewdouglas
Member

@mm04926412 No problem! Your suggested form is more or less what I was expecting to see (except in my head I was using __clz instead of __ffs). I could take it from here and start a PR. Thank you for looking into this and reporting!

@matthewdouglas matthewdouglas added Low Risk Risk of bugs in transformers and other libraries and removed waiting for info contributions-welcome We welcome contributions to fix this issue! labels Aug 15, 2024