[Ops] Add paddlenlp_kernel for CUDA and Triton ops (#9471)

* add paddlenlp_gpu_ops
* add inf_cl loss function
* rename ppnlp_kernel
* rename paddlenlp_kernel
Showing 90 changed files with 24,106 additions and 0 deletions.
The commit adds a Makefile with wheel-building and release targets:

```makefile
.PHONY: pre-release bdist-wheel

# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src

bdist-wheel:
	cd csrc && python setup.py build && cd .. && python setup.py bdist_wheel

pre-release:
	python utils/release.py

pre-patch:
	python utils/release.py --patch

post-release:
	python utils/release.py --post_release

post-patch:
	python utils/release.py --post_release --patch
```
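As a quick orientation, these targets would typically be invoked from the repository root roughly like this (a sketch; the target names come from the Makefile above, see `utils/release.py` for what the release targets do):

```bash
# Build the CUDA extensions under csrc/ and package everything into a wheel
make bdist-wheel

# Bump version metadata before and after a release
make pre-release
make post-release
```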
The commit also adds a README for the kernel library:

# PaddleNLP Kernel Library

> paddlenlp-kernel is a GPU operator library built specifically for PaddleNLP. It bundles a set of commonly used natural language processing (NLP) operators with both CUDA and Triton implementations, aiming to make full use of GPU compute to accelerate NLP workloads.

Currently supported operators:

- mamba1 and mamba2 operators
- fast_ln and fused_ln operators
- ml-cross-entropy operator
- inf_cl operator
# Installation Guide
## Building the CUDA operators

To build the `CUDA` operators, run:

```bash
cd csrc
rm -rf build dist *.egg-info  # remove stale build artifacts and directories
python setup.py build         # start a fresh build
```
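Before building, it can help to confirm that `nvcc` is on the PATH and that the local PaddlePaddle install was compiled with CUDA support (a minimal sanity check, not part of the original instructions):

```bash
# Show which CUDA toolkit nvcc belongs to
nvcc --version

# Check that the installed PaddlePaddle build has CUDA support
python -c "import paddle; print(paddle.is_compiled_with_cuda())"
```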
## Packaging the wheel

After the build finishes, you can package the `CUDA` operators into a `wheel` for installation:

```bash
python setup.py bdist_wheel
```
## Installing the wheel

Install the freshly built wheel with pip:

```bash
pip install dist/*.whl
```
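A quick way to confirm the installation succeeded is to import the package (a minimal check, assuming the import name `paddlenlp_kernel` used in the usage example below):

```bash
python -c "import paddlenlp_kernel; print('paddlenlp_kernel imported successfully')"
```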
## Using the paddlenlp_kernel library

The following shows how to use the `CUDA` and `Triton` operators in your code:

```python
# import and use a CUDA operator
from paddlenlp_kernel.cuda.selective_scan import selective_scan_fn
xxx = selective_scan_fn(xxx)

# import and use a Triton operator
from paddlenlp_kernel.triton.inf_cl import cal_flash_loss
xxx = cal_flash_loss(xxx)
```
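Because the Triton operators are only usable when a compatible `triton` build is present (see the note below), a common pattern is to guard the import and dispatch accordingly. A minimal sketch using the import path from the example above; the wrapper and its error message are illustrative, not part of the library:

```python
# Guarded import: use the fused Triton loss when available, otherwise tell the
# caller to fall back to a plain Paddle implementation.
try:
    from paddlenlp_kernel.triton.inf_cl import cal_flash_loss
    HAS_INF_CL = True
except ImportError:
    cal_flash_loss = None
    HAS_INF_CL = False

def contrastive_loss(*args, **kwargs):
    """Dispatch to the Triton inf_cl kernel when it is installed."""
    if HAS_INF_CL:
        return cal_flash_loss(*args, **kwargs)
    raise RuntimeError(
        "paddlenlp_kernel's Triton inf_cl op is not available; "
        "install the wheel and a compatible triton build first."
    )
```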
# Testing

To test the `CUDA` and `Triton` operators, run:

```bash
pytest -v tests/cuda    # test the CUDA operators
pytest -v tests/triton  # test the Triton operators
```
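If you only want to exercise a single operator, pytest's `-k` expression filter can narrow the run (the match string here is hypothetical; adjust it to the actual test names under `tests/`):

```bash
# Run only the tests whose names match "inf_cl" inside the Triton test suite
pytest -v tests/triton -k "inf_cl"
```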
By following the steps above, you should be able to install and test the `paddlenlp_kernel` library and enjoy GPU-accelerated, high-performance NLP operators.
# Notes

The following library versions are recommended:

- paddlepaddle-gpu >= 3.0.0b2
- triton >= 3.0.0
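A quick way to check which versions are actually installed (assuming both packages import cleanly in the current environment):

```bash
python -c "import paddle, triton; print(paddle.__version__, triton.__version__)"
```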
Because the `Triton` library originally depends on `PyTorch`, `Paddle` users can follow the steps below to replace part of Triton's source code and make it compatible with `Paddle`:

```bash
python -m pip install git+https://github.com/zhoutianzi666/UseTritonInPaddle.git
# The following command only needs to be run once; after that, Triton can be
# used from any terminal without running it again.
python -c "import use_triton_in_paddle; use_triton_in_paddle.make_triton_compatible_with_paddle()"
```
# References

- https://github.com/state-spaces/mamba
- https://github.com/Dao-AILab/causal-conv1d
- https://github.com/apple/ml-cross-entropy
- https://github.com/DAMO-NLP-SG/Inf-CLIP