[Unity][Contrib] Add vLLM paged attention kernel #15995

masahi · 2023-10-26T20:22:18Z

This PR adds vLLM paged attention and update kernels as contrib. MLC can use them to enable batched inference; see mlc-ai/mlc-llm#1134.

The vLLM community has recently added the V2 kernel in vllm-project/vllm#1348. This PR doesn't include the V2 change. Once the V2 kernel becomes mature, we can integrate it.

@tqchen @MasterJH5574 @yzh119 @yelite @sunggg

yelite · 2023-10-27T13:07:35Z

cmake/modules/CUDA.cmake

@@ -64,6 +64,7 @@ if(USE_CUDA)
    message(STATUS "Build with Thrust support")
    cmake_minimum_required(VERSION 3.13) # to compile CUDA code
    enable_language(CUDA)
+    set(CMAKE_CUDA_ARCHITECTURES "80;75")


Why do we want to explicitly set the cuda arch here?

masahi · 2023-10-27

This is due to an odd CMake behavior that I don't understand. If I remove this line and set USE_THRUST, I get

CMake Error in CMakeLists.txt: CUDA_ARCHITECTURES is empty for target "tvm_runtime_objs".

I didn't encounter this problem when I was using USE_THRUST a couple of years ago.

For some reason this doesn't apply to vllm.cmake. But there, if I don't have set(CMAKE_CUDA_ARCHITECTURES "80;75"), I get

error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher

during the build.

yelite · 2023-10-27T13:09:48Z

cmake/modules/contrib/vllm.cmake

+  message(STATUS "Build with vllm paged attention kernel.")
+  include_directories(src/runtime/contrib/vllm)
+  enable_language(CUDA)
+  set(CMAKE_CUDA_ARCHITECTURES "80;75")


Should it let users set the cuda arch they want in config.cmake instead?

masahi · 2023-10-27

Not all users (including me) use config.cmake. I think building for sm_80 and sm_75 should be enough in practice (newer architectures can use the sm_80 code). We can add 70 if we still want to support V100.

cbalint13 · 2023-10-27

@masahi CMAKE_CUDA_ARCHITECTURES is designed to be provided by the user at CMake configure time.

There are also values like 'auto' and 'all'; see CUDA_ARCHITECTURES.

Better to emit a warning like: "Using 'all' for CUDAARCHS, use CMAKE_CUDA_ARCHITECTURES to override".

It is also initialized from the CUDAARCHS environment variable.

masahi · 2023-10-31

OK, I made it configurable, with 80;75 as the default. auto would make compilation of these kernels (especially Thrust) extremely slow.
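For readers following the thread, a minimal sketch of what a user-overridable default with an 80;75 fallback could look like. This is an illustration under assumptions, not the change that landed in the PR: it assumes the check runs before enable_language(CUDA), and that CMake is recent enough to honor the CUDAARCHS environment variable.

```cmake
# Sketch only, not the PR's actual diff.
# Keep whatever the user passed via -DCMAKE_CUDA_ARCHITECTURES=... or the
# CUDAARCHS environment variable; otherwise fall back to a fixed default.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES AND NOT DEFINED ENV{CUDAARCHS})
  # Default discussed in this thread: Ampere (80) and Turing (75).
  set(CMAKE_CUDA_ARCHITECTURES "80;75")
  message(STATUS "CMAKE_CUDA_ARCHITECTURES not set, defaulting to ${CMAKE_CUDA_ARCHITECTURES}")
endif()

# Assumption: the default must be chosen before the CUDA language is enabled,
# because enabling it may initialize CMAKE_CUDA_ARCHITECTURES on its own.
enable_language(CUDA)
```

With something like this in place, a user who needs V100 support could configure with -DCMAKE_CUDA_ARCHITECTURES="80;75;70" instead of editing the CMake files.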

starrkk · 2023-11-29T01:47:11Z

@masahi Thank you very much for contributing this to open source. Your PR is very meaningful to me. Could you continue integrating your code?

masahi added 5 commits October 26, 2023 20:22

Add vllm kernels

a4c7804

add license

c301728

clean

0e54b10

fix cmake

4817f3e

update test

99f14e5

masahi force-pushed the contrib-vllm-kernels branch from 4ca0aff to 99f14e5 Compare October 26, 2023 20:22

masahi mentioned this pull request Oct 26, 2023

Add batched Llama model definition using vLLM paged attention mlc-ai/mlc-llm#1134

Merged

masahi added 2 commits October 27, 2023 01:53

cuh -> h

c3c6bdf

reduce lint errors

851c156

yelite reviewed Oct 27, 2023


masahi added 3 commits October 31, 2023 14:55

more lint fix

b6546f1

Avoid hard-coded CMAKE_CUDA_ARCHITECTURES

724d5e9

more lint fix

971483d

junrushao force-pushed the unity branch 2 times, most recently from c95d45f to 45eeb8c Compare December 18, 2023 21:00

vinx13 mentioned this pull request Jan 5, 2024

[Unity][Contrib] Add vLLM paged attention kernel #16350

Merged

masahi closed this Jan 5, 2024
