[XPU] XPU accelerator support for Intel GPU device #4547
* add aio in xpu_upstream * Update async_io.py deleting private path
* add aio in xpu_upstream * Update async_io.py deleting private path * Update async_io.py
* add SYCLomatic code into upstream enable jit_load for sycl kernels * find Python.h using general code * add SYCLAutoOpBuilder to support InferenceOpBuilder * move scripts path to op_builder/xpu * only change cuda files extension * delete unused code in inferenceBuilder * change third-party relative path to enable python install * extract smaller functions from sycl_extension * change from_blob in source code to avoid big part post processing * run pre-commit * add BF16 support * add license to csrc/xpu code
SYCLAutoOpBuilder is integrated to convert CUDA kernels into SYCL kernels. Currently, transformer inference kernels are converted automatically at installation time. We are investigating whether we can extend this builder to other kernels so that we can reduce the number of SYCL kernel files. @baodii is working on SYCLAutoOpBuilder.
* add SYCLomatic code into upstream enable jit_load for sycl kernels * find Python.h using general code * add SYCLAutoOpBuilder to support InferenceOpBuilder * move scripts path to op_builder/xpu * only change cuda files extension * change third-party relative path to enable python install * extract smaller functions from sycl_extension * change from_blob in source code to avoid big part post processing * run pre-commit * add BF16 support * add other OPBuilder. fused_adam done * cpu_adam done * all xpu OpBuilder done, need more test * delete csrc/xpu * delete useless files
oneapi-src/SYCLomatic#1398 tracks the last remaining SYCL porting issue; once it is resolved, we will fully migrate all kernels. @delock @baodii
* fix xpu builder to make install successful * fix AT_CUDA_CHECK error
* delete SYCLAutoOpBuilder * add optimizer SYCLOpBuilder * delete transformer_inference op * fix format error
Hi @tjruwase @jeffra, after internal discussion, we removed SYCLAutoOpBuilder. Although we have validated the automatically converted SYCL kernels as of now, a fully automated process carries a risk: if an existing CUDA kernel changes and the new code is converted automatically by SYCLAutoOpBuilder, the converted SYCL code could be broken in functionality or performance. Instead, we prefer to 1) convert CUDA kernels with SYCLAutoOpBuilder offline, 2) validate the converted SYCL code, and 3) upstream the validated SYCL kernels. The SYCL kernels currently in this PR are the result of this process. Let us know your thoughts and comments, and we can discuss how to go forward. Thanks!
@delock, thanks for the update. If I understand correctly, you plan to upstream new SYCL kernels via PRs?
Hi @tjruwase, it depends. To support a DeepSpeed OpBuilder, we have two methods:
In the big picture, we expect most DeepSpeed features that need an OpBuilder to be supported through method 2 for XPU, with the intention that the functionality can also be reused elsewhere. We may see method 1 used in two situations:
@delock, apologies for the delay in reviewing this PR. We will prioritize this now.
Thanks @delock, this all looks great! Do you have tests that you are running internally to verify this code?
Hi @mrwyattii, yes, we regularly validate this code on XPU devices for inference and training workloads.
This PR includes XPU support for Intel GPU. With this PR, DeepSpeed can support XPU devices without installing Intel Extension for DeepSpeed.
---------
Co-authored-by: Liangliang-Ma <1906710196@qq.com>
Co-authored-by: baodi <di.bao@intel.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Yizhou Wang <yizhou.wang@intel.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
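For readers who want to confirm which accelerator DeepSpeed picked up after this change, the following is a minimal sketch, not part of the PR itself. It assumes a DeepSpeed build that includes this PR and a PyTorch installation with Intel GPU support; `detect_accelerator` is a hypothetical helper name, while `get_accelerator`, `device_name`, and `is_available` come from DeepSpeed's accelerator abstraction. On an Intel GPU machine the reported name should be `xpu`, though that depends on the local environment.

```python
def detect_accelerator():
    """Report the accelerator DeepSpeed selected.

    Hypothetical helper for illustration. Returns (device_name, available),
    or ("unknown", False) when DeepSpeed cannot be imported.
    """
    try:
        # Assumption: a DeepSpeed build that ships the accelerator abstraction.
        from deepspeed.accelerator import get_accelerator
    except ImportError:
        return "unknown", False

    accel = get_accelerator()
    # device_name() gives the backend name, e.g. "xpu", "cuda", or "cpu".
    return accel.device_name(), bool(accel.is_available())


name, available = detect_accelerator()
print(f"accelerator={name} available={available}")
```

If the name comes back as something other than `xpu` on an Intel GPU system, the PyTorch XPU stack is the first thing to check.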