[Hackathon 5th No.110] Enhance the sparse.matmul API for Paddle #59890

MayYouBeProsperous · 2023-12-11T07:19:45Z

PR types

Others

PR changes

APIs

Description

[Hackathon 5th No.110] Enhance the sparse.matmul API for Paddle community#721

paddle-bot · 2023-12-11T07:19:50Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

MayYouBeProsperous · 2023-12-18T09:12:01Z

@zhwesky2010 Please review.

zhwesky2010 · 2023-12-19T04:17:48Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

+      phi::Stream(reinterpret_cast<phi::StreamId>(dev_ctx_.stream())));
+  void* tmp_buffer_b_ptr = tmp_buffer_b->ptr();
+
+  dev_ctx_.CusparseCall([&](cusparseHandle_t handle) {


Why is cusparseSpGEMM_compute called so many times here?

The CUSPARSE_SPGEMM_ALG1 algorithm requires two calls to cusparseSpGEMM_compute; the other two algorithms are only supported from CUDA 12.

zhwesky2010 · 2023-12-19T04:18:43Z

paddle/phi/kernels/sparse/gpu/matmul_grad_kernel.cu

+                            const SparseCooTensor& dout,
+                            SparseCooTensor* dx,
+                            SparseCooTensor* dy) {
+  // 'cusparseSPGEMM' only support CSR now, so use COO->CSR->COO,


Is cuSPARSE likely to support COO in the future?

The latest release, CUDA 12.3, does not support it yet.
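The COO->CSR->COO round trip mentioned in the quoted comment can be illustrated in plain Python. This is only a sketch of the index conversion, not Paddle's kernel: the helper names `coo_rows_to_crows` and `crows_to_coo_rows` are hypothetical, and it assumes a 2D matrix whose COO entries are already sorted by row.

```python
def coo_rows_to_crows(rows, num_rows):
    """Compress sorted COO row indices into a CSR row-pointer array (crows)."""
    crows = [0] * (num_rows + 1)
    for r in rows:
        crows[r + 1] += 1          # count the non-zeros in each row
    for i in range(num_rows):
        crows[i + 1] += crows[i]   # prefix sum turns counts into offsets
    return crows


def crows_to_coo_rows(crows):
    """Expand a CSR row-pointer array back into per-element COO row indices."""
    rows = []
    for r in range(len(crows) - 1):
        rows.extend([r] * (crows[r + 1] - crows[r]))
    return rows


# Hypothetical 4-row matrix with six non-zeros; row 1 is empty.
rows = [0, 0, 2, 3, 3, 3]
crows = coo_rows_to_crows(rows, num_rows=4)
restored = crows_to_coo_rows(crows)
```

The column indices and values are shared between the two layouts when the COO entries are row-sorted, so only the row information needs converting.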

zhwesky2010 · 2023-12-19T04:19:50Z

paddle/phi/kernels/sparse/gpu/matmul_kernel.cu

+    batch_size *= out_dim_vec[i];
+  }
+
+  PADDLE_ENFORCE_EQ(


Are 3D COO and CSR not supported?

Added unit tests for 3D coo*coo and csr*csr.

zhwesky2010 · 2023-12-19T04:20:17Z

paddle/phi/kernels/sparse/gpu/matmul_kernel.cu

+                        const SparseCsrTensor& x,
+                        const SparseCsrTensor& y,
+                        SparseCsrTensor* out) {
+  MatmulKernelImpl<T>(dev_ctx, x, y, out);


Why not put the implementation directly in here and have the COO kernel call the CSR kernel?

OK, done.

zhwesky2010 · 2023-12-19T04:21:21Z

test/legacy_test/test_sparse_matmul_op.py

+        not paddle.is_compiled_with_cuda() or get_cuda_version() < 11000,
+        "only support cuda>=11.0",
+    )
+    def test_matmul_2d(self):


Is 3D coo * dense currently supported?

coo*dense supports 3D; the corresponding unit tests already existed.

Paddle/test/legacy_test/test_sparse_matmul_op.py

Line 39 in 5d32c61

class TestMatmul(unittest.TestCase):

zhwesky2010 · 2023-12-19T04:21:35Z

test/legacy_test/test_sparse_matmul_op.py

+        not paddle.is_compiled_with_cuda() or get_cuda_version() < 11000,
+        "only support cuda>=11.0",
+    )
+    def test_matmul_2d(self):


Is 3D csr * dense currently supported?

csr*dense supports 3D; the corresponding unit tests already existed.

Paddle/test/legacy_test/test_sparse_matmul_op.py

Line 39 in 5d32c61

class TestMatmul(unittest.TestCase):

Is 3D coo*coo currently supported?

Yes, the unit tests are below.

zhwesky2010 · 2023-12-19T04:22:23Z

test/legacy_test/test_sparse_matmul_op.py

+
+        sp_y = origin_y.detach().to_sparse_csr()
+        # only support 32-bit index.
+        sp_y_crows = paddle.cast(sp_y.crows(), "int32")


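Is coo*dense also limited to int32?

coo*dense supports int64; the corresponding unit tests already existed.

For coo*coo and csr*csr, if the index is int64, wouldn't inserting a sparse CastKernel be enough?

Indeed it would. Changed to use CastCsrKernel.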

zhwesky2010 · 2023-12-19T04:22:34Z

test/legacy_test/test_sparse_matmul_op.py

+
+        sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
+
+        # only support 32-bit index.


Is csr*dense also limited to int32?

csr*dense also supports int64; the corresponding unit tests already existed.

zhwesky2010

Two questions:
1. For int64, can it be supported by inserting a sparse CastKernel?
2. Will 3D be supported on CUDA 11.8 and above?

zhwesky2010 · 2023-12-21T11:13:41Z

test/legacy_test/test_sparse_matmul_op.py

+
+        sp_y = origin_y.detach().to_sparse_csr()
+        # only support 32-bit index.
+        sp_y_crows = paddle.cast(sp_y.crows(), "int32")


For coo*coo and csr*csr, if the index is int64, wouldn't inserting a sparse CastKernel be enough?

zhwesky2010 · 2023-12-21T11:14:01Z

test/legacy_test/test_sparse_matmul_op.py

+        not paddle.is_compiled_with_cuda() or get_cuda_version() < 11000,
+        "only support cuda>=11.0",
+    )
+    def test_matmul_2d(self):


Is 3D coo*coo currently supported?

MayYouBeProsperous · 2023-12-21T15:08:52Z

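Two questions: 1. For int64, can it be supported by inserting a sparse CastKernel? 2. Will 3D be supported on CUDA 11.8 and above?

Indeed it can. I have added support for int64 indices.

3D relies on cusparseCsrSetStridedBatch and cusparseCooSetStridedBatch, which as far as I can tell have been available since CUDA 11.0. The 3D unit tests are restricted to 11.8 because the related framework code throws an error: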

Paddle/paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

Lines 105 to 116 in 1383a2f

    
    if (batch_size > 1) {
#if CUDA_VERSION >= 11080
      dev_ctx.CusparseCall([&](cusparseHandle_t handle) {
        phi::dynload::cusparseCsrSetStridedBatch(
            *descriptor, batch_size, M + 1, batch_nnz);
      });
#else
      PADDLE_THROW(phi::errors::Unimplemented(
          "Batch Sparse matmul use 'cusparseCsrSetStridedBatch', which is "
          "supported from CUDA 11.8"));
#endif
    }

The results seem to differ between GPUs. Which card should I test on?

zhwesky2010 · 2023-12-22T03:18:26Z

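@MayYouBeProsperous The cusparseCsrSetStridedBatch restriction exists because earlier testing found that results were only correct from CUDA 11.8 onward, so let's stick with CUDA 11.8. 3D is currently supported on 11.8, right?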

zhwesky2010 · 2023-12-22T03:26:48Z

test/legacy_test/test_sparse_matmul_op.py

+        not paddle.is_compiled_with_cuda() or get_cuda_version() < 11080,
+        "only support cuda>=11.8",
+    )
+    def test_matmul_3d(self):


For 3D, keep the current 11.8 requirement; no change is needed.
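The integer thresholds in these skip conditions (11000, 11080) appear to follow the `CUDA_VERSION` macro encoding, `major * 1000 + minor * 10`. A minimal sketch of that encoding, assuming the test helper `get_cuda_version()` returns a value in the same form; `encode_cuda_version` and `supports_batched_sparse_matmul` are hypothetical names used only for illustration:

```python
def encode_cuda_version(major, minor):
    """Encode a CUDA release as major * 1000 + minor * 10 (CUDA_VERSION style)."""
    return major * 1000 + minor * 10


def supports_batched_sparse_matmul(cuda_version):
    """Mirror the 3D test guard discussed above: require CUDA 11.8 or newer."""
    return cuda_version >= encode_cuda_version(11, 8)


threshold_2d = encode_cuda_version(11, 0)  # guard used by test_matmul_2d
threshold_3d = encode_cuda_version(11, 8)  # guard used by test_matmul_3d
```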

zhwesky2010 · 2023-12-22T03:42:16Z

paddle/phi/kernels/sparse/gpu/matmul_grad_kernel.cu

+                       static_cast<T>(0),
+                       &dx_tmp);
+
+    CastCsrKernel<T, Context>(


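One small issue: if the index is already int32, the four casts are wasted performance cost. If the index is int64, the current implementation is fine.

I suggest specializing this with a type check. If the index is int32, do nothing; if it is not int32, insert the four casts, and make the final cast convert back to the index's original type rather than hard-coding int64.

Please submit this in a follow-up PR.

OK, I will submit a PR to change this.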
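The suggested specialization can be sketched in plain Python. This is a toy model under stated assumptions, not the PR's C++ code: `Index`, `cast`, and `spgemm_with_index_cast` are hypothetical names, dtypes are represented as strings, and `compute` stands in for a kernel that accepts and returns int32 indices only. In this toy version the four casts are two on the inputs and two on the outputs; the real kernels may group them differently.

```python
from dataclasses import dataclass

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


@dataclass
class Index:
    dtype: str    # "int32" or "int64" (hypothetical stand-in for a tensor dtype)
    values: list


def cast(index, dtype):
    """Return a copy of `index` with a new dtype, refusing lossy int32 casts."""
    if dtype == "int32" and any(not INT32_MIN <= v <= INT32_MAX for v in index.values):
        raise OverflowError("index value does not fit in int32")
    return Index(dtype, list(index.values))


def spgemm_with_index_cast(crows, cols, compute):
    """Run an int32-only `compute`, casting only when the input is not int32."""
    orig_dtype = crows.dtype
    if orig_dtype == "int32":
        return compute(crows, cols)          # fast path: no casts at all
    out_crows, out_cols = compute(cast(crows, "int32"), cast(cols, "int32"))
    # Cast back to the caller's original index type, not a hard-coded int64.
    return cast(out_crows, orig_dtype), cast(out_cols, orig_dtype)
```

With this shape, an int32 caller pays nothing extra, and a caller with any other index type gets results back in the type it passed in.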

paddle-ci-bot · 2023-12-29T03:11:24Z

Sorry to inform you that c8ff3d5's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

MayYouBeProsperous · 2024-01-04T09:49:35Z

The 3D COO*COO and CSR*CSR unit tests in PR-CI-Coverage are now failing.

3D probably needs to be computed batch by batch; this is still in development.

zhwesky2010 · 2024-01-08T10:45:37Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

+    if (batch_size > 1) {
+#if CUDA_VERSION >= 11080
+      dev_ctx.CusparseCall([&](cusparseHandle_t handle) {
+        phi::dynload::cusparseCsrSetStridedBatch(


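Haven't you already set the 3D stride here? Why does the batched computation still fail?

cusparseCsrSetStridedBatch currently seems to be usable only with SpMM and SDDMM; it does not support SpGEMM yet.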

zhwesky2010 · 2024-01-08T10:53:24Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

+  cusparseSpMatDescr_t descriptor_;
+
+  // temporarily save crows and cols for int64_t index csr
+  DenseTensor crows_int;


These two member variables could take a trailing underscore: crows_int_, cols_int_

This class has been removed.

zhwesky2010 · 2024-01-08T11:06:39Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

+  const phi::GPUContext& dev_ctx_;
+  cusparseSpMatDescr_t descriptor_;
+
+  // temporarily save crows and cols for int64_t index csr


Is this comment wrong? Aren't these storing the int32 index?

This class has been removed.

zhwesky2010 · 2024-01-08T11:09:35Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

@@ -481,6 +493,248 @@ void SparseBlas<phi::GPUContext>::SDDMM(bool transa,
 }
 #endif

+/************* SPARSE*SPARSE->SPARSE MATMUL ************/
+template <typename T>
+class CuSparseSpGEMMCsrDescriptor {


The name could be improved: CreateInt32IndexCsrDescriptor, CreateInt32IndexCooDescriptor

To support 3D computation, cusparseCreateCsr has to be called for each batch, so this class has been removed.

zhwesky2010 · 2024-01-08T11:25:12Z

@MayYouBeProsperous There doesn't seem to be any major problem with the computation, so why does it still fail?

MayYouBeProsperous · 2024-01-08T14:47:00Z

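@zhwesky2010 I don't know why the previously approved code passed the unit tests. The same code now fails as well. Perhaps the PR-CI-Coverage environment changed?

The earlier submission was rushed: 3D only passed in CI. When I test on a 3090 with CUDA 12.0 and on a V100 with CUDA 11.8, both fail. The CUDA documentation does not state that cusparseCsrSetStridedBatch can be used with SpGEMM, so it probably cannot.

The current plan is to compute the product separately for each batch, which is also how TensorFlow implements it: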

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/sparse/sparse_mat_mul_op.cc#L467-L496
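The per-batch approach can be sketched in plain Python. This is an illustrative sketch, not the PR's cuSPARSE implementation: `csr_matmul` and `batched_csr_matmul` are hypothetical names, each batch is assumed to be stored as its own `(crows, cols, values)` tuple, and the 2D product uses a simple row-wise accumulation in place of a call to cusparseSpGEMM.

```python
def csr_matmul(a, b):
    """Multiply two 2D CSR matrices given as (crows, cols, values) tuples."""
    a_crows, a_cols, a_vals = a
    b_crows, b_cols, b_vals = b
    out_crows, out_cols, out_vals = [0], [], []
    for i in range(len(a_crows) - 1):
        acc = {}  # column index -> accumulated value for output row i
        for p in range(a_crows[i], a_crows[i + 1]):
            k, a_ik = a_cols[p], a_vals[p]
            for q in range(b_crows[k], b_crows[k + 1]):
                j = b_cols[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_vals[q]
        for j in sorted(acc):          # keep column indices sorted within a row
            out_cols.append(j)
            out_vals.append(acc[j])
        out_crows.append(len(out_cols))
    return out_crows, out_cols, out_vals


def batched_csr_matmul(a_batches, b_batches):
    """3D sparse*sparse product: one independent 2D product per batch."""
    return [csr_matmul(a, b) for a, b in zip(a_batches, b_batches)]


# Two hypothetical 2x2 batches.
a0 = ([0, 1, 2], [0, 1], [1.0, 2.0])   # [[1, 0], [0, 2]]
a1 = ([0, 2, 2], [0, 1], [1.0, 1.0])   # [[1, 1], [0, 0]]
b0 = ([0, 1, 2], [1, 0], [3.0, 4.0])   # [[0, 3], [4, 0]]
result = batched_csr_matmul([a0, a1], [b0, b0])
```

Because each batch may produce a different number of non-zeros, treating every batch as an independent 2D product avoids assuming a fixed per-batch stride in the output.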

luotao1 · 2024-01-09T02:27:40Z

I don't know why the previously approved code passed the unit tests

Which commit was it?

MayYouBeProsperous · 2024-01-09T02:33:24Z

I don't know why the previously approved code passed the unit tests

Which commit was it?

This one: c8ff3d5

luotao1 · 2024-01-09T02:44:38Z

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/59890(develop)

According to the history, it passed before December 20. Were there any code changes after that?

MayYouBeProsperous · 2024-01-09T15:52:42Z

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/59890(develop)
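According to the history, it passed before December 20. Were there any code changes after that?

I resubmitted the code that passed the unit tests before December 20, and it fails as well: #60640 #60555

Are the CI test GPUs all V100s?

That said, this issue should not affect further development; I have switched to a different implementation.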

luotao1 · 2024-01-10T02:25:11Z

2024-01-10 09:17:41 [check_op_benchmark_result.py:94] [INFO] parameters:
2024-01-10 09:17:41 [check_op_benchmark_result.py:96] [INFO] 	x (Variable) - dtype: float16, shape: [4, 512, 512, 1]
2024-01-10 09:17:41 [check_op_benchmark_result.py:96] [INFO] 	perm (list): [1, 2, 0, 3]
2024-01-10 09:17:41 [check_op_benchmark_result.py:159] [ERROR] Check speed result with case "matmul_10 (backward)" failed.
2024-01-10 09:17:41 [check_op_benchmark_result.py:159] [ERROR] Check speed result with case "matmul_10 (forward)" failed.
2024-01-10 09:17:41 [check_op_benchmark_result.py:159] [ERROR] Check speed result with case "transpose_6 (backward)" failed.

The op-benchmark speed has regressed.

MayYouBeProsperous · 2024-01-10T09:09:50Z

The op-benchmark speed has regressed.

It passes now.

zhwesky2010 · 2024-01-11T07:39:02Z

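https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/59890(develop) According to the history, it passed before December 20. Were there any code changes after that?

I resubmitted the code that passed the unit tests before December 20, and it fails as well: #60640 #60555

Are the CI test GPUs all V100s?

That said, this issue should not affect further development; I have switched to a different implementation.

This may be because the CUDA version was upgraded recently, so the pipeline now runs CUDA 11.8 or above, whereas some unit tests only run on 11.8 or above.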

zhwesky2010

LGTM

zhwesky2010 · 2024-01-11T07:19:47Z

paddle/phi/kernels/funcs/sparse/sparse_blas_impl.cu.h

+  const int64_t b_cols = b_dim_vec[b_ndims - 1];
+
+  // cusparseSpGEMM only support 32-bit indices.
+  DenseTensor a_crows_int, a_cols_int, b_crows_int, b_cols_int;


These temporary tensors would be better declared inside the else branch.

OK, done.

compile * Fix Throughtput Throughput (PaddlePaddle#60741) * please last md (PaddlePaddle#60749) * [CINN+PIR]Fix Fetch XShape Variable logic (PaddlePaddle#60722) * [PIR][DynamicShape] Remove redundant code for shapeAnalysis and shapedTypeInterface (PaddlePaddle#60744) att, remove redundant code for shapeAnalysis and shapedTypeInterface * 【PIR Dist Op Reg No.1】 reg push_sparse_v2 (PaddlePaddle#60473) * code reg push_sparse_v2 * [Dynamic Shape] Provide operator<< For BroadcastTree (PaddlePaddle#60730) * [PIR] change IR clone to const and support clone operation successors (PaddlePaddle#60752) * support ir clone const and support clone operation successors * refine ir_mapping * refine region clone * [CINN] Refine fully_insert_broadcast_pass (PaddlePaddle#60676) * refine fully_insert_broadcast_pass * fix complie bug * fix complie * fix conflict * [PIR] einsum's inner_cache and xshape set to optional (PaddlePaddle#60748) * einsum's inner_cache and xshape set to intermediate * Update paddle/fluid/pir/dialect/operator/ir/ops.yaml --------- Co-authored-by: kangguangli <kangguangli@hotmail.com> * reduce runtime of unit-tests in windows-trt (PaddlePaddle#60731) * modify trt test to deal with Timeout * windows * [Paddle-TRT] upgrade EnqueueV2 to EnqueueV3 (PaddlePaddle#59950) * 【Hackathon 5th No.110】为 Paddle 增强 sparse.matmul API (PaddlePaddle#59890) * Fix rank_relatvie rank_relative (PaddlePaddle#60770) * add graph_key to specific graph's varmap (PaddlePaddle#60567) * add graph_key to specific graph's varmap * fix inpalce case * fix inpalce case * 【Hackathon 5th No.38】为 Paddle 新增 FractionalMaxPool2d / FractionalMaxPool3d API -kernel (PaddlePaddle#59847) * [Init] add fractional max pool kernel and api * [Fix] pooling.cu seed offset * [Change] remove adaptive from fractional max pool * [Change] fractional max 2d gpu pooling.cu grad * [Change] fractional max 2d gpu pooling.cu grad with dim3 * [Change] use UnchangedInferMeta * [Change] test api with uint16 * [Change] wrap test 
disable_static * [Change] regiester float16/bfloat16 * [Change] remove bfloat16 from cpu kernrl * [Change] test dtypes in cpu and gpu * [Change] test_fractional_max_pool3d_2d/3d timeout to 30s * [Fix] resolve conflict * [Change] win32 cannot detect bfloat16 correctly * [Change] force set_device * [Add] test random_u is None * [Change] use kernel_size for overlapping mode * [Change] clean headers * [CodeStyle] pooling * [Change] rename op * [Change] rename func without index * [Prim][PIR] Recover pir bn (PaddlePaddle#60689) * reopen bn prim pir * fix atol * decomp support batch_norm_ * fix test case * fix bug * fix code * [PIR]fc_with_special_op_fuse_pass bug fix (PaddlePaddle#60751) * bug fix update * update * delete all debug message * add code deleted wrong at last commit * delete createAutoMixedPrecisionPass in analysis_predictor.cc --------- Co-authored-by: HongyuJia <jiahongyu@baidu.com> Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com> Co-authored-by: SigureMo <sigure.qaq@gmail.com> Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com> Co-authored-by: xingmingyyj <135400902+xingmingyyj@users.noreply.github.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: Yuang Liu <liuyuang@baidu.com> Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: kevin <chengyf112@gmail.com> Co-authored-by: wanghuancoder <wanghuan29@baidu.com> Co-authored-by: kangguangli <kangguangli@hotmail.com> Co-authored-by: zhangyuqin1998 <75946871+zhangyuqin1998@users.noreply.github.com> Co-authored-by: co63oc <co63oc@users.noreply.github.com> Co-authored-by: NeroLoh <745827440@qq.com> Co-authored-by: 傅剑寒 <Xs1580802568@gmail.com> Co-authored-by: lzydev <lizhiyu02@baidu.com> Co-authored-by: tianshuo78520a <707759223@qq.com> Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com> Co-authored-by: 
LiYuRio <63526175+LiYuRio@users.noreply.github.com> Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com> Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com> Co-authored-by: winter-wang <1030748926@qq.com> Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com> Co-authored-by: cyber-pioneer <116002591+cyber-pioneer@users.noreply.github.com> Co-authored-by: Vigi Zhang <VigiZhang@users.noreply.github.com> Co-authored-by: zbt78 <1095497213@qq.com> Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com> Co-authored-by: Aurelius84 <zhangliujie@baidu.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com> Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com> Co-authored-by: LoneRanger <836253168@qq.com> Co-authored-by: freeliuzc <lzc842650834@gmail.com> Co-authored-by: YibLiu <68105073+YibinLiu666@users.noreply.github.com> Co-authored-by: engineer1109 <jialiang.wang@xdxct.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: xuxinyi389 <104957571+xuxinyi389@users.noreply.github.com> Co-authored-by: MayYouBeProsperous <ljmhz@outlook.com> Co-authored-by: Huihuang Zheng <zhhsplendid@163.com> Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com> Co-authored-by: 6clc <chaoliu.lc@foxmail.com> Co-authored-by: Terry <38135104+TR666@users.noreply.github.com> Co-authored-by: winter-wang <78149749+winter-wang@users.noreply.github.com> Co-authored-by: Wang Xin <xinwang614@gmail.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: Frank Lin <eee4017@gmail.com> Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com> Co-authored-by: lanxianghit <47554610+lanxianghit@users.noreply.github.com> Co-authored-by: Tian Zheng <tizheng@nvidia.com> Co-authored-by: lijialin03 <124568209+lijialin03@users.noreply.github.com> Co-authored-by: Wangzheee <634486483@qq.com> Co-authored-by: zhink 
<33270771+zhink@users.noreply.github.com> Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com> Co-authored-by: Chen Zhiyang <1792266893@qq.com> Co-authored-by: feifei-111 <2364819892@qq.com> Co-authored-by: fsczz <57291768+fsczz@users.noreply.github.com> Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com> Co-authored-by: Sonder <55493212+AndSonder@users.noreply.github.com> Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com> Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com> Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com> Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com> Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com> Co-authored-by: Jianbang Yang <yangjianbang112@gmail.com> Co-authored-by: enzodechine <enzo9533@hotmail.com> Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com> Co-authored-by: coco <69197635+cocoshe@users.noreply.github.com> Co-authored-by: zhaohaixu <49297029+zhaohaixu@users.noreply.github.com> Co-authored-by: chen2016013 <111894720+chen2016013@users.noreply.github.com> Co-authored-by: zyfncg <zhangyunfei07@baidu.com> Co-authored-by: Qi Li <qili93@qq.com> Co-authored-by: zhangbo9674 <zhangbo54@baidu.com> Co-authored-by: Liuyinfeng <30849840+gitliuyf@users.noreply.github.com> Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com> Co-authored-by: wendaxiao <113992173+wenxiaohahaha@users.noreply.github.com> Co-authored-by: cyberslack_lee <luhputu0815@gmail.com> Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com> Co-authored-by: GGBond8488 <33050871+GGBond8488@users.noreply.github.com> Co-authored-by: megemini <megemini@outlook.com>

MayYouBeProsperous added 5 commits November 5, 2023 08:06

add cusparseSpGEMM

4641b7f

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into sm

9c82903

refine paddle.sparse.matmul

782f92d

test

6c5dbb7

fix

96c9cd5

paddle-bot bot added the contributor External developers label Dec 11, 2023

luotao1 mentioned this pull request Dec 11, 2023

[PaddlePaddle Hackathon 5th] Open Source Contribution Individual Challenge #57262

Open

luotao1 added the PaddlePaddle Hackathon label Dec 12, 2023

luotao1 assigned luotao1 and zhwesky2010 Dec 12, 2023

MayYouBeProsperous added 2 commits December 12, 2023 05:29

coo*coo

044ba38

add grad op

ace8c96

MayYouBeProsperous changed the title ~~[WPI] [Hackathon 5th No.110] Enhance the sparse.matmul API for Paddle~~ [Hackathon 5th No.110] Enhance the sparse.matmul API for Paddle Dec 13, 2023

MayYouBeProsperous added 2 commits December 14, 2023 03:28

fix

ff40558

fix

03a08cc

zhwesky2010 reviewed Dec 19, 2023

MayYouBeProsperous added 4 commits December 19, 2023 06:59

fix 3d

14deb87

codestyle

a2019d1

fix

67bb445

fix

0ee3545

MayYouBeProsperous requested a review from zhwesky2010 December 20, 2023 02:42

zhwesky2010 reviewed Dec 21, 2023

use CastCsrKernel

c8ff3d5

zhwesky2010 previously approved these changes Dec 22, 2023

optimize int32 index

253ca38

MayYouBeProsperous added 3 commits December 30, 2023 08:05

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into sm

0851dd3

fix

8dfe895

fix

9c80efe

PaddlePaddle locked and limited conversation to collaborators Jan 2, 2024

PaddlePaddle unlocked this conversation Jan 2, 2024

MayYouBeProsperous added 2 commits January 2, 2024 06:32

fix cuda<11.8

477dc5c

ci

ea71c64

zhwesky2010 reviewed Jan 8, 2024

MayYouBeProsperous added 5 commits January 9, 2024 16:07

fix batched computation

4397dbb

fix

28449d4

fix

76e5ed9

fix cuda version check

ee0fee5

fix

0d23935

fix bugs

e01251b

zhwesky2010 previously approved these changes Jan 11, 2024

fix

1f34f1c

MayYouBeProsperous dismissed zhwesky2010’s stale review via 1f34f1c January 11, 2024 09:45

zhwesky2010 approved these changes Jan 12, 2024

luotao1 merged commit dab5512 into PaddlePaddle:develop Jan 12, 2024
29 checks passed


		sp_x = origin_x.detach().to_sparse_coo(len(x_shape))

		# only support 32-bit index.
