fix record event for operator type in new dygraph #44582

Merged

Conversation

@rainyfly rainyfly (Contributor) commented Jul 25, 2022

PR types

Others

PR changes

Others

Describe

  1. In the new dygraph, the record events that capture Operator performance data are wrapped by many surrounding record events, and these outer events were all tagged as the Operator type. This makes the printed operator summary severely redundant, and the interference makes it impossible to identify the op with the real maximum cost. This is now fixed by tagging the outer record events shown below as the UserDefined type; a minimal sketch of the resulting classification follows the screenshot.

(screenshot: the outer record events that are now tagged as UserDefined)
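
To make the effect concrete, here is a minimal, illustrative Python sketch of the bucketing rule involved — a toy stand-in, not Paddle's internal C++ profiler code, with made-up event names and costs. Only events tagged as Operator should land in the operator summary, so outer wrapper events re-tagged as UserDefined no longer inflate it:

```python
# Illustrative sketch only -- a toy re-implementation of the summary's
# bucketing rule, not Paddle's actual profiler internals.
from collections import defaultdict
from enum import Enum

class TracerEventType(Enum):
    Operator = 0      # real op execution, e.g. "conv2d compute"
    UserDefined = 1   # outer wrapper records around the op

def operator_summary(events):
    """Aggregate cost per name, counting only Operator-typed events."""
    table = defaultdict(float)
    for name, etype, cost_ms in events:
        if etype is TracerEventType.Operator:
            table[name] += cost_ms
    return table

events = [
    ("conv2d compute", TracerEventType.Operator, 0.39),
    # Before this PR, outer records like the one below were (wrongly)
    # Operator-typed, so they duplicated the cost in the operator table.
    ("conv2d outer wrapper", TracerEventType.UserDefined, 0.41),
]
print(operator_summary(events))  # {'conv2d compute': 0.39}
```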

Running PaddleDetection's yolov3_mobilenet_v1_roadsign.yml task as a test produces the following operator summary:
----------------------------------------------------------------Operator Summary----------------------------------------------------------------
Time unit: ms
----------------------------------------------------  ------  ----------------------------------------  ----------------------------------------  
Name                                                  Calls   CPU Total / Avg / Max / Min / Ratio(%)    GPU Total / Avg / Max / Min / Ratio(%)    
----------------------------------------------------  ------  ----------------------------------------  ----------------------------------------  
-----------------------------------------------------------Thread: All threads merged-----------------------------------------------------------
Conv2dGradNodeFinal                                   296     195.39 / 0.66 / 1.17 / 0.18 / 13.89       622.99 / 2.10 / 4.79 / 0.24 / 23.94       
  MEMSET                                              344     - / - / - / - / -                         1.12 / 0.00 / 0.02 / 0.00 / 0.18          
  void wgrad_alg0_engine<float, 128, 5, 5, 3, 3, ...  32      - / - / - / - / -                         22.94 / 0.72 / 1.61 / 0.14 / 3.68         
  void cask_cudnn::computeOffsetsKernel<true, fal...  200     - / - / - / - / -                         0.74 / 0.00 / 0.01 / 0.00 / 0.12          
  cask_cudnn::computeBOffsetsKernel(cask_cudnn::C...  200     - / - / - / - / -                         0.73 / 0.00 / 0.00 / 0.00 / 0.12          
  maxwell_scudnn_128x64_stridedB_small_nn_v0          120     - / - / - / - / -                         79.47 / 0.66 / 1.32 / 0.09 / 12.76        
  void wgrad_alg0_engine<float, 128, 6, 7, 3, 3, ...  48      - / - / - / - / -                         56.80 / 1.18 / 3.46 / 0.17 / 9.12         
  void wgrad_alg0_engine<float, 128, 6, 8, 3, 3, ...  24      - / - / - / - / -                         32.42 / 1.35 / 2.46 / 0.50 / 5.20         
  cask_cudnn::computeWgradSplitKOffsetsKernel(cas...  120     - / - / - / - / -                         0.46 / 0.00 / 0.00 / 0.00 / 0.07          
  cask_cudnn::computeWgradBOffsetsKernel(cask_cud...  120     - / - / - / - / -                         0.46 / 0.00 / 0.00 / 0.00 / 0.07          
  maxwell_scudnn_128x128_stridedB_splitK_medium_n...  120     - / - / - / - / -                         102.69 / 0.86 / 1.27 / 0.29 / 16.48       
  void cudnn::ops::scalePackedTensor_kernel<float...  16      - / - / - / - / -                         1.08 / 0.07 / 0.07 / 0.07 / 0.17          
  void cudnn::detail::dgrad_engine<float, 512, 6,...  16      - / - / - / - / -                         6.51 / 0.41 / 0.55 / 0.26 / 1.05          
  maxwell_scudnn_128x128_stridedB_small_nn_v0         80      - / - / - / - / -                         49.89 / 0.62 / 0.79 / 0.40 / 8.01         
  void cudnn::winograd::generateWinogradTilesKern...  48      - / - / - / - / -                         6.80 / 0.14 / 0.23 / 0.06 / 1.09          
  maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_...  48      - / - / - / - / -                         87.96 / 1.83 / 1.97 / 1.72 / 14.12        
  void cudnn::winograd_nonfused::winogradWgradDat...  72      - / - / - / - / -                         15.53 / 0.22 / 0.36 / 0.09 / 2.49         
  void cudnn::winograd_nonfused::winogradWgradDel...  72      - / - / - / - / -                         31.56 / 0.44 / 0.75 / 0.19 / 5.07         
  maxwell_sgemm_32x128_nt                             48      - / - / - / - / -                         48.79 / 1.02 / 1.06 / 0.93 / 7.83         
  void cudnn::winograd_nonfused::winogradWgradOut...  72      - / - / - / - / -                         14.22 / 0.20 / 0.43 / 0.04 / 2.28         
  void axpy_kernel_val<float, float>(cublasAxpyPa...  16      - / - / - / - / -                         1.64 / 0.10 / 0.14 / 0.07 / 0.26          
  maxwell_sgemm_64x64_nt                              24      - / - / - / - / -                         19.12 / 0.80 / 0.81 / 0.79 / 3.07         
  void cudnn::winograd::generateWinogradTilesKern...  24      - / - / - / - / -                         0.41 / 0.02 / 0.02 / 0.02 / 0.07          
  maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_...  24      - / - / - / - / -                         41.65 / 1.74 / 1.76 / 1.72 / 6.69         
sync_batch_norm dygraph                               376     32.75 / 0.09 / 0.49 / 0.07 / 2.33         521.43 / 1.39 / 7.58 / 0.13 / 20.04       
  sync_batch_norm compute                             376     21.44 / 0.06 / 0.09 / 0.05 / 65.47        521.43 / 1.39 / 7.58 / 0.13 / 100.00      
    void phi::KeLocalStats<float, 256, (paddle::e...  376     - / - / - / - / -                         62.65 / 0.17 / 0.86 / 0.01 / 12.02        
    void phi::KeSyncAndMovingStats<float>(paddle:...  376     - / - / - / - / -                         2.15 / 0.01 / 0.01 / 0.00 / 0.41          
    void phi::KeNormAffine<float, (paddle::experi...  376     - / - / - / - / -                         456.63 / 1.21 / 6.71 / 0.11 / 87.57       
  sync_batch_norm node_creation                       376     4.58 / 0.01 / 0.02 / 0.01 / 13.98         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
SyncBatchNormGradNodeFinal                            376     28.69 / 0.08 / 0.13 / 0.06 / 2.04         421.23 / 1.12 / 6.17 / 0.12 / 16.18       
  sync_batch_norm_grad compute                        376     15.70 / 0.04 / 0.09 / 0.03 / 54.73        421.23 / 1.12 / 6.17 / 0.12 / 100.00      
    void phi::KeBackwardLocalStats<float, 256, (p...  376     - / - / - / - / -                         128.51 / 0.34 / 1.83 / 0.04 / 30.51       
    void phi::KeBNBackwardScaleBias<float, 256, (...  376     - / - / - / - / -                         125.86 / 0.33 / 1.82 / 0.03 / 29.88       
    void phi::KeBNBackwardData<float, (paddle::ex...  376     - / - / - / - / -                         166.86 / 0.44 / 2.53 / 0.04 / 39.61       
conv2d dygraph                                        296     115.38 / 0.39 / 0.65 / 0.24 / 8.20        341.94 / 1.16 / 5.38 / 0.09 / 13.14       
  conv2d node_creation                                296     2.14 / 0.01 / 0.02 / 0.01 / 1.85          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  void cask_cudnn::computeOffsetsKernel<false, fa...  176     - / - / - / - / -                         0.63 / 0.00 / 0.01 / 0.00 / 0.18          
  maxwell_scudnn_128x32_relu_medium_nn_v1             8       - / - / - / - / -                         1.99 / 0.25 / 0.25 / 0.25 / 0.58          
  maxwell_sgemm_64x64_nn                              40      - / - / - / - / -                         19.94 / 0.50 / 1.20 / 0.13 / 5.83         
  maxwell_sgemm_128x32_nn                             8       - / - / - / - / -                         0.79 / 0.10 / 0.10 / 0.09 / 0.23          
  void cudnn::winograd::generateWinogradTilesKern...  48      - / - / - / - / -                         6.27 / 0.13 / 0.25 / 0.02 / 1.83          
  maxwell_scudnn_winograd_128x128_ldg1_ldg4_mobil...  48      - / - / - / - / -                         153.01 / 3.19 / 5.13 / 1.96 / 44.75       
  maxwell_scudnn_128x64_relu_interior_nn_v1           104     - / - / - / - / -                         61.09 / 0.59 / 1.25 / 0.12 / 17.86        
  void cudnn::winograd::generateWinogradTilesKern...  24      - / - / - / - / -                         1.55 / 0.06 / 0.07 / 0.06 / 0.45          
  maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_...  24      - / - / - / - / -                         48.56 / 2.02 / 2.07 / 1.99 / 14.20        
  maxwell_scudnn_128x64_relu_small_nn_v1              48      - / - / - / - / -                         38.06 / 0.79 / 0.89 / 0.48 / 11.13        
  maxwell_scudnn_128x128_relu_medium_nn_v1            8       - / - / - / - / -                         5.28 / 0.66 / 0.67 / 0.65 / 1.54          
  maxwell_scudnn_128x32_relu_small_nn_v1              8       - / - / - / - / -                         4.78 / 0.60 / 0.61 / 0.59 / 1.40          
DepthwiseConv2dGradNodeFinal                          104     6.58 / 0.06 / 0.09 / 0.06 / 0.47          239.45 / 2.30 / 4.10 / 1.16 / 9.20        
  depthwise_conv2d_grad compute                       104     4.80 / 0.05 / 0.05 / 0.04 / 72.87         232.81 / 2.24 / 4.10 / 1.16 / 97.23       
    void Eigen::internal::EigenMetaKernel<Eigen::...  208     - / - / - / - / -                         23.15 / 0.11 / 0.75 / 0.00 / 9.94         
    void paddle::operators::math::KernelDepthwise...  72      - / - / - / - / -                         38.00 / 0.53 / 1.16 / 0.23 / 16.32        
    void paddle::operators::math::KernelDepthwise...  72      - / - / - / - / -                         113.23 / 1.57 / 2.02 / 1.35 / 48.64       
    void paddle::operators::math::KernelDepthwise...  32      - / - / - / - / -                         27.12 / 0.85 / 1.85 / 0.28 / 11.65        
    void paddle::operators::math::KernelDepthwise...  32      - / - / - / - / -                         31.30 / 0.98 / 1.50 / 0.78 / 13.45        
  void axpy_kernel_val<float, float>(cublasAxpyPa...  16      - / - / - / - / -                         6.64 / 0.42 / 0.56 / 0.27 / 2.77          
ReluGradNodeFinal                                     216     6.60 / 0.03 / 0.05 / 0.02 / 0.47          115.71 / 0.54 / 2.31 / 0.07 / 4.45        
  relu_grad compute                                   216     3.21 / 0.01 / 0.03 / 0.01 / 48.61         115.71 / 0.54 / 2.31 / 0.07 / 100.00      
    void phi::funcs::VectorizedElementwiseKernel<...  216     - / - / - / - / -                         115.71 / 0.54 / 2.31 / 0.07 / 100.00      
relu dygraph                                          216     6.23 / 0.03 / 0.06 / 0.02 / 0.44          77.51 / 0.36 / 1.54 / 0.05 / 2.98         
  relu compute                                        216     3.98 / 0.02 / 0.04 / 0.02 / 63.86         77.51 / 0.36 / 1.54 / 0.05 / 100.00       
    void phi::funcs::VectorizedElementwiseKernel<...  216     - / - / - / - / -                         77.51 / 0.36 / 1.54 / 0.05 / 100.00       
  relu node_creation                                  216     0.68 / 0.00 / 0.01 / 0.00 / 10.88         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
depthwise_conv2d dygraph                              104     4.17 / 0.04 / 0.06 / 0.04 / 0.30          55.37 / 0.53 / 1.16 / 0.18 / 2.13         
  depthwise_conv2d compute                            104     2.41 / 0.02 / 0.04 / 0.02 / 57.67         55.37 / 0.53 / 1.16 / 0.18 / 100.00       
    void paddle::operators::math::KernelDepthwise...  72      - / - / - / - / -                         37.94 / 0.53 / 1.16 / 0.23 / 68.52        
    void paddle::operators::math::KernelDepthwise...  32      - / - / - / - / -                         17.43 / 0.54 / 1.13 / 0.18 / 31.48        
  depthwise_conv2d node_creation                      104     0.63 / 0.01 / 0.01 / 0.00 / 15.03         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
LeakyReluGradNodeFinal                                160     5.35 / 0.03 / 0.04 / 0.02 / 0.38          37.42 / 0.23 / 0.58 / 0.03 / 1.44         
  leaky_relu_grad compute                             160     2.60 / 0.02 / 0.03 / 0.01 / 48.53         37.42 / 0.23 / 0.58 / 0.03 / 100.00       
    void phi::funcs::VectorizedElementwiseKernel<...  160     - / - / - / - / -                         37.42 / 0.23 / 0.58 / 0.03 / 100.00       
slice dygraph                                         608     42.74 / 0.07 / 3.37 / 0.02 / 3.04         29.04 / 0.05 / 3.06 / 0.00 / 1.12         
  slice compute                                       600     10.80 / 0.02 / 0.03 / 0.01 / 25.26        4.60 / 0.01 / 0.04 / 0.00 / 15.84         
    void Eigen::internal::EigenMetaKernel<Eigen::...  96      - / - / - / - / -                         0.47 / 0.00 / 0.01 / 0.00 / 10.22         
    void Eigen::internal::EigenMetaKernel<Eigen::...  96      - / - / - / - / -                         0.26 / 0.00 / 0.00 / 0.00 / 5.69          
    void Eigen::internal::EigenMetaKernel<Eigen::...  408     - / - / - / - / -                         3.87 / 0.01 / 0.04 / 0.00 / 84.09         
  slice node_creation                                 200     1.04 / 0.01 / 0.02 / 0.00 / 2.44          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  GpuMemcpySync:CUDAPinned->GPU                       8       0.26 / 0.03 / 0.03 / 0.03 / 0.61          0.01 / 0.00 / 0.00 / 0.00 / 0.04          
    MEMCPY_HtoD                                       8       - / - / - / - / -                         0.01 / 0.00 / 0.00 / 0.00 / 100.00        
leaky_relu dygraph                                    160     4.57 / 0.03 / 0.04 / 0.03 / 0.32          24.93 / 0.16 / 0.39 / 0.02 / 0.96         
  leaky_relu compute                                  160     3.02 / 0.02 / 0.03 / 0.02 / 66.09         24.93 / 0.16 / 0.39 / 0.02 / 100.00       
    void phi::funcs::VectorizedElementwiseKernel<...  160     - / - / - / - / -                         24.93 / 0.16 / 0.39 / 0.02 / 100.00       
  leaky_relu node_creation                            160     0.50 / 0.00 / 0.00 / 0.00 / 10.90         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
slice                                                 8       26.23 / 3.28 / 3.33 / 3.20 / 1.86         24.42 / 3.05 / 3.06 / 3.05 / 0.94         
  GpuMemcpySync:CUDAPinned->GPU                       8       24.78 / 3.10 / 3.12 / 3.08 / 94.46        24.40 / 3.05 / 3.06 / 3.04 / 99.88        
    MEMCPY_HtoD                                       8       - / - / - / - / -                         24.40 / 3.05 / 3.06 / 3.04 / 100.00       
  infer_shape                                         8       0.08 / 0.01 / 0.01 / 0.01 / 0.30          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  compute                                             8       0.57 / 0.07 / 0.10 / 0.05 / 2.17          0.03 / 0.00 / 0.00 / 0.00 / 0.12          
    void Eigen::internal::EigenMetaKernel<Eigen::...  8       - / - / - / - / -                         0.03 / 0.00 / 0.00 / 0.00 / 100.00        
  grad_node_creation                                  8       0.00 / 0.00 / 0.00 / 0.00 / 0.01          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
subtract dygraph                                      216     7.00 / 0.03 / 0.06 / 0.02 / 0.50          11.58 / 0.05 / 0.68 / 0.00 / 0.44         
  subtract compute                                    216     4.80 / 0.02 / 0.04 / 0.02 / 68.46         11.58 / 0.05 / 0.68 / 0.00 / 100.00       
    void phi::funcs::VectorizedBroadcastKernel<fl...  216     - / - / - / - / -                         11.58 / 0.05 / 0.68 / 0.00 / 100.00       
  subtract node_creation                              168     0.97 / 0.01 / 0.01 / 0.00 / 13.80         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
concat dygraph                                        64      3.34 / 0.05 / 0.11 / 0.03 / 0.24          8.86 / 0.14 / 0.65 / 0.01 / 0.34          
  concat compute                                      64      2.29 / 0.04 / 0.09 / 0.02 / 68.71         8.86 / 0.14 / 0.65 / 0.01 / 100.00        
    void phi::funcs::ConcatKernel_<float>(float c...  24      - / - / - / - / -                         0.20 / 0.01 / 0.01 / 0.01 / 2.20          
    void phi::funcs::ConcatKernel_<float>(float c...  24      - / - / - / - / -                         0.92 / 0.04 / 0.07 / 0.02 / 10.34         
    void phi::funcs::ConcatKernel_<float>(float c...  16      - / - / - / - / -                         7.71 / 0.48 / 0.65 / 0.32 / 87.07         
  concat node_creation                                40      0.28 / 0.01 / 0.01 / 0.01 / 8.33          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
ConcatGradNodeFinal                                   16      1.43 / 0.09 / 0.10 / 0.08 / 0.10          7.65 / 0.48 / 0.64 / 0.31 / 0.29          
  concat_grad compute                                 16      0.99 / 0.06 / 0.07 / 0.06 / 69.06         7.65 / 0.48 / 0.64 / 0.31 / 100.00        
    void phi::funcs::SplitKernel_<float>(float co...  16      - / - / - / - / -                         7.62 / 0.48 / 0.64 / 0.31 / 99.56         
transpose dygraph                                     48      814.82 / 16.98 / 103.17 / 0.03 / 57.90    6.29 / 0.13 / 0.53 / 0.01 / 0.24          
  GpuMemcpySync:CUDAPinned->GPU                       24      812.45 / 33.85 / 103.12 / 0.15 / 99.71    5.04 / 0.21 / 0.48 / 0.03 / 80.10         
    MEMCPY_HtoD                                       24      - / - / - / - / -                         5.04 / 0.21 / 0.48 / 0.03 / 100.00        
  transpose compute                                   48      1.41 / 0.03 / 0.07 / 0.02 / 0.17          1.25 / 0.03 / 0.06 / 0.01 / 19.90         
    void paddle::operators::TilingSwapDim1And2<un...  16      - / - / - / - / -                         0.88 / 0.06 / 0.06 / 0.05 / 70.55         
    void paddle::operators::TilingSwapDim1And2<un...  16      - / - / - / - / -                         0.17 / 0.01 / 0.01 / 0.01 / 13.43         
    void paddle::operators::TilingSwapDim1And2<un...  16      - / - / - / - / -                         0.20 / 0.01 / 0.02 / 0.01 / 16.02         
  transpose node_creation                             24      0.07 / 0.00 / 0.01 / 0.00 / 0.01          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
reduce_prod dygraph                                   72      2.84 / 0.04 / 0.06 / 0.03 / 0.20          5.04 / 0.07 / 0.45 / 0.00 / 0.19          
  prod_raw compute                                    72      2.09 / 0.03 / 0.05 / 0.02 / 73.66         5.04 / 0.07 / 0.45 / 0.00 / 100.00        
    void phi::funcs::ReduceAnyKernel<float, float...  72      - / - / - / - / -                         5.04 / 0.07 / 0.45 / 0.00 / 100.00        
  reduce_prod node_creation                           48      0.22 / 0.00 / 0.01 / 0.00 / 7.90          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
SliceGradNodeFinal                                    144     4.73 / 0.03 / 0.05 / 0.02 / 0.34          4.96 / 0.03 / 0.09 / 0.00 / 0.19          
  slice_grad compute                                  144     1.95 / 0.01 / 0.02 / 0.01 / 41.14         1.92 / 0.01 / 0.04 / 0.00 / 38.73         
    void Eigen::internal::EigenMetaKernel<Eigen::...  144     - / - / - / - / -                         1.92 / 0.01 / 0.04 / 0.00 / 100.00        
  void axpy_kernel_val<float, float>(cublasAxpyPa...  120     - / - / - / - / -                         3.04 / 0.03 / 0.06 / 0.00 / 61.27         
clip dygraph                                          72      2.04 / 0.03 / 0.05 / 0.02 / 0.15          4.91 / 0.07 / 0.45 / 0.00 / 0.19          
  clip compute                                        72      1.46 / 0.02 / 0.04 / 0.02 / 71.51         4.91 / 0.07 / 0.45 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  72      - / - / - / - / -                         4.91 / 0.07 / 0.45 / 0.00 / 100.00        
  clip node_creation                                  48      0.13 / 0.00 / 0.00 / 0.00 / 6.57          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
maximum dygraph                                       24      1.03 / 0.04 / 0.07 / 0.04 / 0.07          4.13 / 0.17 / 0.38 / 0.04 / 0.16          
  maximum compute                                     24      0.69 / 0.03 / 0.05 / 0.02 / 66.89         4.13 / 0.17 / 0.38 / 0.04 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  24      - / - / - / - / -                         4.13 / 0.17 / 0.38 / 0.04 / 100.00        
  maximum node_creation                               24      0.14 / 0.01 / 0.01 / 0.00 / 13.62         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
minimum dygraph                                       24      0.91 / 0.04 / 0.04 / 0.03 / 0.06          4.11 / 0.17 / 0.37 / 0.04 / 0.16          
  minimum compute                                     24      0.62 / 0.03 / 0.03 / 0.02 / 68.28         4.11 / 0.17 / 0.37 / 0.04 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  24      - / - / - / - / -                         4.11 / 0.17 / 0.37 / 0.04 / 100.00        
  minimum node_creation                               24      0.10 / 0.00 / 0.00 / 0.00 / 10.66         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
add dygraph                                           352     10.91 / 0.03 / 0.05 / 0.02 / 0.78         3.74 / 0.01 / 0.16 / 0.00 / 0.14          
  add compute                                         352     7.22 / 0.02 / 0.04 / 0.02 / 66.16         3.74 / 0.01 / 0.16 / 0.00 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  352     - / - / - / - / -                         3.74 / 0.01 / 0.16 / 0.00 / 100.00        
  add node_creation                                   304     1.71 / 0.01 / 0.02 / 0.00 / 15.72         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
scale dygraph                                         440     10.26 / 0.02 / 0.06 / 0.02 / 0.73         3.73 / 0.01 / 0.23 / 0.00 / 0.14          
  scale compute                                       440     7.25 / 0.02 / 0.05 / 0.01 / 70.66         3.73 / 0.01 / 0.23 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  440     - / - / - / - / -                         3.73 / 0.01 / 0.23 / 0.00 / 100.00        
  scale node_creation                                 320     0.71 / 0.00 / 0.02 / 0.00 / 6.91          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
divide dygraph                                        24      0.81 / 0.03 / 0.04 / 0.03 / 0.06          3.68 / 0.15 / 0.35 / 0.02 / 0.14          
  divide compute                                      24      0.52 / 0.02 / 0.03 / 0.02 / 64.04         3.68 / 0.15 / 0.35 / 0.02 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  24      - / - / - / - / -                         3.68 / 0.15 / 0.35 / 0.02 / 100.00        
  divide node_creation                                24      0.15 / 0.01 / 0.02 / 0.00 / 18.50         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
nearest_interp_v2GradNodeCompat                       16      1.79 / 0.11 / 0.16 / 0.09 / 0.13          3.29 / 0.21 / 0.28 / 0.14 / 0.13          
nearest_interp_v2_grad                                16      1.35 / 0.08 / 0.12 / 0.06 / 0.10          3.29 / 0.21 / 0.28 / 0.14 / 0.13          
  infer_shape                                         16      0.04 / 0.00 / 0.00 / 0.00 / 3.31          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  compute                                             16      0.77 / 0.05 / 0.08 / 0.04 / 56.86         3.29 / 0.21 / 0.28 / 0.14 / 100.00        
    void Eigen::internal::EigenMetaKernel<Eigen::...  16      - / - / - / - / -                         0.31 / 0.02 / 0.03 / 0.01 / 9.45          
    void phi::KeNearestNeighborInterpNCHWBw<float...  16      - / - / - / - / -                         2.98 / 0.19 / 0.25 / 0.12 / 90.55         
  grad_node_creation                                  16      0.00 / 0.00 / 0.00 / 0.00 / 0.31          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
max dygraph                                           24      0.90 / 0.04 / 0.05 / 0.03 / 0.06          2.02 / 0.08 / 0.19 / 0.02 / 0.08          
  max compute                                         24      0.73 / 0.03 / 0.04 / 0.03 / 81.28         2.02 / 0.08 / 0.19 / 0.02 / 100.00        
    void phi::funcs::ReduceAnyKernel<float, float...  24      - / - / - / - / -                         2.02 / 0.08 / 0.19 / 0.02 / 100.00        
nearest_interp_v2 dygraph                             16      1.90 / 0.12 / 0.16 / 0.09 / 0.13          1.57 / 0.10 / 0.13 / 0.07 / 0.06          
  nearest_interp_v2 node_creation                     16      0.09 / 0.01 / 0.01 / 0.01 / 4.98          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
nearest_interp_v2                                     16      1.48 / 0.09 / 0.13 / 0.07 / 0.11          1.57 / 0.10 / 0.13 / 0.07 / 0.06          
  infer_shape                                         16      0.25 / 0.02 / 0.03 / 0.01 / 17.17         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  compute                                             16      0.59 / 0.04 / 0.06 / 0.03 / 39.82         1.57 / 0.10 / 0.13 / 0.07 / 100.00        
    void phi::KeNearestNeighborInterpNCHWFw<float...  16      - / - / - / - / -                         1.57 / 0.10 / 0.13 / 0.07 / 100.00        
  grad_node_creation                                  16      0.00 / 0.00 / 0.00 / 0.00 / 0.27          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
multiply dygraph                                      216     6.71 / 0.03 / 0.06 / 0.02 / 0.48          1.30 / 0.01 / 0.03 / 0.00 / 0.05          
  multiply compute                                    216     4.59 / 0.02 / 0.05 / 0.02 / 68.41         1.30 / 0.01 / 0.03 / 0.00 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  216     - / - / - / - / -                         1.30 / 0.01 / 0.03 / 0.00 / 100.00        
  multiply node_creation                              192     0.82 / 0.00 / 0.01 / 0.00 / 12.15         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
MultiplyGradNodeFinal                                 144     5.16 / 0.04 / 0.45 / 0.02 / 0.37          1.12 / 0.01 / 0.03 / 0.00 / 0.04          
  multiply_grad compute                               144     2.40 / 0.02 / 0.04 / 0.01 / 46.61         1.01 / 0.01 / 0.03 / 0.00 / 90.33         
    void phi::funcs::VectorizedBroadcastKernel<fl...  144     - / - / - / - / -                         1.01 / 0.01 / 0.03 / 0.00 / 100.00        
  void axpy_kernel_val<float, float>(cublasAxpyPa...  24      - / - / - / - / -                         0.11 / 0.00 / 0.01 / 0.00 / 9.67          
AddGradNodeFinal                                      184     6.49 / 0.04 / 0.07 / 0.02 / 0.46          1.03 / 0.01 / 0.04 / 0.00 / 0.04          
  add_grad compute                                    184     4.40 / 0.02 / 0.05 / 0.02 / 67.84         1.03 / 0.01 / 0.04 / 0.00 / 100.00        
    void phi::funcs::ReduceAnyKernel<float, float...  24      - / - / - / - / -                         0.42 / 0.02 / 0.04 / 0.01 / 40.56         
    void phi::funcs::ReduceHigherDimKernel<float,...  24      - / - / - / - / -                         0.11 / 0.00 / 0.01 / 0.00 / 10.89         
SigmoidCrossEntropyWithLogitsGradNodeFinal            48      1.36 / 0.03 / 0.04 / 0.02 / 0.10          0.85 / 0.02 / 0.05 / 0.00 / 0.03          
  sigmoid_cross_entropy_with_logits_grad compute      48      0.83 / 0.02 / 0.03 / 0.01 / 61.00         0.85 / 0.02 / 0.05 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.85 / 0.02 / 0.05 / 0.00 / 100.00        
sum dygraph                                           96      5.20 / 0.05 / 0.07 / 0.04 / 0.37          0.84 / 0.01 / 0.02 / 0.00 / 0.03          
  sum compute                                         96      4.04 / 0.04 / 0.05 / 0.03 / 77.72         0.84 / 0.01 / 0.02 / 0.00 / 100.00        
    void phi::funcs::ReduceAnyKernel<float, float...  96      - / - / - / - / -                         0.48 / 0.00 / 0.01 / 0.00 / 56.94         
    void phi::funcs::ReduceHigherDimKernel<float,...  72      - / - / - / - / -                         0.36 / 0.01 / 0.01 / 0.00 / 43.06         
  sum node_creation                                   96      0.35 / 0.00 / 0.01 / 0.00 / 6.82          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
sigmoid_cross_entropy_with_logits dygraph             48      1.62 / 0.03 / 0.06 / 0.03 / 0.12          0.70 / 0.01 / 0.04 / 0.01 / 0.03          
  sigmoid_cross_entropy_with_logits compute           48      1.07 / 0.02 / 0.05 / 0.02 / 66.24         0.70 / 0.01 / 0.04 / 0.01 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.70 / 0.01 / 0.04 / 0.01 / 100.00        
  sigmoid_cross_entropy_with_logits node_creation     48      0.25 / 0.01 / 0.01 / 0.00 / 15.38         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
TransposeGradNodeFinal                                24      0.61 / 0.03 / 0.04 / 0.02 / 0.04          0.54 / 0.02 / 0.05 / 0.01 / 0.02          
  transpose_grad compute                              24      0.39 / 0.02 / 0.02 / 0.01 / 64.36         0.54 / 0.02 / 0.05 / 0.01 / 100.00        
    void paddle::operators::TilingSwapDim1And2<un...  16      - / - / - / - / -                         0.18 / 0.01 / 0.01 / 0.01 / 33.21         
    void paddle::operators::TilingSwapDim1And2<un...  8       - / - / - / - / -                         0.36 / 0.05 / 0.05 / 0.04 / 66.79         
cast dygraph                                          144     3.35 / 0.02 / 0.04 / 0.02 / 0.24          0.50 / 0.00 / 0.01 / 0.00 / 0.02          
  cast compute                                        144     2.60 / 0.02 / 0.03 / 0.01 / 77.64         0.50 / 0.00 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  96      - / - / - / - / -                         0.37 / 0.00 / 0.01 / 0.00 / 73.94         
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.13 / 0.00 / 0.00 / 0.00 / 26.06         
SumGradNodeFinal                                      96      2.57 / 0.03 / 0.07 / 0.02 / 0.18          0.45 / 0.00 / 0.02 / 0.00 / 0.02          
  sum_grad compute                                    96      1.67 / 0.02 / 0.06 / 0.01 / 65.21         0.45 / 0.00 / 0.02 / 0.00 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  96      - / - / - / - / -                         0.45 / 0.00 / 0.02 / 0.00 / 100.00        
ScaleGradNodeFinal                                    104     2.14 / 0.02 / 0.03 / 0.02 / 0.15          0.42 / 0.00 / 0.01 / 0.00 / 0.02          
  scale compute                                       104     1.23 / 0.01 / 0.02 / 0.01 / 57.42         0.42 / 0.00 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  104     - / - / - / - / -                         0.42 / 0.00 / 0.01 / 0.00 / 100.00        
BceLossGradNodeFinal                                  48      1.21 / 0.03 / 0.05 / 0.02 / 0.09          0.39 / 0.01 / 0.02 / 0.00 / 0.01          
  bce_loss_grad compute                               48      0.58 / 0.01 / 0.04 / 0.01 / 48.33         0.39 / 0.01 / 0.02 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.39 / 0.01 / 0.02 / 0.00 / 100.00        
meshgrid dygraph                                      24      2.65 / 0.11 / 0.15 / 0.10 / 0.19          0.34 / 0.01 / 0.01 / 0.01 / 0.01          
  meshgrid compute                                    24      2.29 / 0.10 / 0.14 / 0.08 / 86.57         0.34 / 0.01 / 0.01 / 0.01 / 100.00        
    void Eigen::internal::EigenMetaKernel<Eigen::...  48      - / - / - / - / -                         0.22 / 0.00 / 0.01 / 0.00 / 65.70         
mean dygraph                                          96      5.32 / 0.06 / 0.60 / 0.04 / 0.38          0.33 / 0.00 / 0.00 / 0.00 / 0.01          
  mean compute                                        96      4.27 / 0.04 / 0.59 / 0.04 / 80.21         0.33 / 0.00 / 0.00 / 0.00 / 100.00        
    void cub::DeviceReduceSingleTileKernel<cub::D...  96      - / - / - / - / -                         0.33 / 0.00 / 0.00 / 0.00 / 100.00        
  mean node_creation                                  96      0.42 / 0.00 / 0.01 / 0.00 / 7.98          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
bce_loss dygraph                                      48      1.16 / 0.02 / 0.05 / 0.02 / 0.08          0.31 / 0.01 / 0.01 / 0.00 / 0.01          
  bce_loss compute                                    48      0.77 / 0.02 / 0.04 / 0.01 / 65.98         0.31 / 0.01 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.31 / 0.01 / 0.01 / 0.00 / 100.00        
  bce_loss node_creation                              48      0.14 / 0.00 / 0.01 / 0.00 / 12.24         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
AbsGradNodeFinal                                      48      1.21 / 0.03 / 0.05 / 0.02 / 0.09          0.30 / 0.01 / 0.01 / 0.00 / 0.01          
  abs_grad compute                                    48      0.66 / 0.01 / 0.04 / 0.01 / 55.01         0.30 / 0.01 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.30 / 0.01 / 0.01 / 0.00 / 100.00        
MeanGradNodeFinal                                     96      2.58 / 0.03 / 0.06 / 0.02 / 0.18          0.27 / 0.00 / 0.00 / 0.00 / 0.01          
  mean_grad compute                                   96      1.67 / 0.02 / 0.05 / 0.01 / 64.80         0.27 / 0.00 / 0.00 / 0.00 / 100.00        
    void phi::funcs::VectorizedBroadcastKernel<fl...  96      - / - / - / - / -                         0.27 / 0.00 / 0.00 / 0.00 / 100.00        
SigmoidGradNodeFinal                                  48      1.03 / 0.02 / 0.04 / 0.02 / 0.07          0.26 / 0.01 / 0.01 / 0.00 / 0.01          
  sigmoid_grad compute                                48      0.55 / 0.01 / 0.03 / 0.01 / 53.57         0.26 / 0.01 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.26 / 0.01 / 0.01 / 0.00 / 100.00        
sigmoid dygraph                                       48      1.19 / 0.02 / 0.04 / 0.02 / 0.08          0.24 / 0.01 / 0.01 / 0.00 / 0.01          
  sigmoid compute                                     48      0.77 / 0.02 / 0.03 / 0.01 / 65.16         0.24 / 0.01 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.24 / 0.01 / 0.01 / 0.00 / 100.00        
  sigmoid node_creation                               48      0.17 / 0.00 / 0.01 / 0.00 / 14.09         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
exp dygraph                                           48      1.24 / 0.03 / 0.04 / 0.02 / 0.09          0.21 / 0.00 / 0.01 / 0.00 / 0.01          
  exp compute                                         48      0.85 / 0.02 / 0.03 / 0.01 / 68.41         0.21 / 0.00 / 0.01 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.21 / 0.00 / 0.01 / 0.00 / 100.00        
  exp node_creation                                   48      0.15 / 0.00 / 0.00 / 0.00 / 12.12         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
abs dygraph                                           48      1.19 / 0.02 / 0.03 / 0.02 / 0.08          0.15 / 0.00 / 0.00 / 0.00 / 0.01          
  abs compute                                         48      0.79 / 0.02 / 0.02 / 0.01 / 66.53         0.15 / 0.00 / 0.00 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  48      - / - / - / - / -                         0.15 / 0.00 / 0.00 / 0.00 / 100.00        
  abs node_creation                                   48      0.13 / 0.00 / 0.00 / 0.00 / 11.04         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
stack dygraph                                         24      1.22 / 0.05 / 0.09 / 0.04 / 0.09          0.11 / 0.00 / 0.00 / 0.00 / 0.00          
  stack compute                                       24      0.96 / 0.04 / 0.07 / 0.03 / 78.73         0.11 / 0.00 / 0.00 / 0.00 / 100.00        
    void phi::StackCUDAKernel<long, int>(long**, ...  24      - / - / - / - / -                         0.08 / 0.00 / 0.00 / 0.00 / 75.61         
fill_constant dygraph                                 8       0.74 / 0.09 / 0.10 / 0.09 / 0.05          0.02 / 0.00 / 0.00 / 0.00 / 0.00          
fill_constant                                         8       0.63 / 0.08 / 0.08 / 0.07 / 0.04          0.02 / 0.00 / 0.00 / 0.00 / 0.00          
  infer_shape                                         8       0.02 / 0.00 / 0.00 / 0.00 / 3.54          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  compute                                             8       0.27 / 0.03 / 0.04 / 0.03 / 43.38         0.02 / 0.00 / 0.00 / 0.00 / 100.00        
    void phi::funcs::VectorizedElementwiseKernel<...  8       - / - / - / - / -                         0.02 / 0.00 / 0.00 / 0.00 / 100.00        
  grad_node_creation                                  8       0.00 / 0.00 / 0.00 / 0.00 / 0.32          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
GradNodeAccumulation                                  1176    4.78 / 0.00 / 0.01 / 0.00 / 0.34          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
ReshapeGradNodeFinal                                  72      0.78 / 0.01 / 0.02 / 0.00 / 0.06          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  reshape_grad compute                                72      0.11 / 0.00 / 0.00 / 0.00 / 14.40         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
SubtractGradNodeFinal                                 48      0.36 / 0.01 / 0.01 / 0.01 / 0.03          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  subtract_grad compute                               48      0.03 / 0.00 / 0.00 / 0.00 / 8.48          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
reshape dygraph                                       168     1.99 / 0.01 / 0.02 / 0.01 / 0.14          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  reshape_with_xshape compute                         168     0.25 / 0.00 / 0.01 / 0.00 / 12.36         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  reshape node_creation                               96      0.31 / 0.00 / 0.00 / 0.00 / 15.59         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
unsqueeze dygraph                                     48      0.54 / 0.01 / 0.02 / 0.01 / 0.04          0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  unsqueeze_with_xshape compute                       48      0.09 / 0.00 / 0.00 / 0.00 / 17.58         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
  unsqueeze node_creation                             24      0.10 / 0.00 / 0.00 / 0.00 / 18.80         0.00 / 0.00 / 0.00 / 0.00 / 0.00          
----------------------------------------------------  ------  ----------------------------------------  ----------------------------------------  
  2. Fix a bug where the format string for GPU memory data in the exported chrome tracing was set incorrectly.
  3. The user-defined summary now counts only record events added by users at the Python layer (see the sketch below).
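
For reference, a minimal sketch of how such a run can be profiled from Python — the network, data, and step count are placeholders, not the PaddleDetection yolov3_mobilenet_v1_roadsign.yml task. `paddle.profiler.RecordEvent` creates the Python-layer user-defined records that item 3 refers to, `export_chrome_tracing` produces the chrome tracing file touched by item 2, and `summary` prints an operator table like the one above:

```python
# Minimal sketch, assuming a GPU build of Paddle >= 2.3; model and data
# are placeholders standing in for the PaddleDetection task.
import paddle
import paddle.profiler as profiler

model = paddle.nn.Linear(8, 8)
data = paddle.randn([4, 8])

prof = profiler.Profiler(
    targets=[profiler.ProfilerTarget.CPU, profiler.ProfilerTarget.GPU],
    on_trace_ready=profiler.export_chrome_tracing('./profiler_log'))
prof.start()
for step in range(10):
    # Python-layer user-defined record event (what item 3 counts).
    with profiler.RecordEvent(name="my_preprocess"):
        x = data * 1.0
    loss = model(x).mean()
    loss.backward()
    prof.step()
prof.stop()
# Prints an Operator Summary like the table above.
prof.summary(sorted_by=profiler.SortedKeys.GPUTotal, op_detail=True,
             time_unit='ms')
```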

@paddle-bot bot commented Jul 25, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zhiqiu zhiqiu (Contributor) left a comment

LGTM

@From00 From00 merged commit 963163e into PaddlePaddle:develop Jul 26, 2022
rainyfly added a commit to rainyfly/Paddle that referenced this pull request Jul 27, 2022
* fix new dygraph record event for op

* update unit test
XiaoguangHu01 pushed a commit that referenced this pull request Aug 2, 2022
* fix record event for operator type in new dygraph (#44582)

* fix new dygraph record event for op

* update unit test

* fix file mode
xuewujiao added a commit to xuewujiao/Paddle that referenced this pull request Aug 4, 2022
* fix python3.10 compile bug on window (PaddlePaddle#44330)

* Fix random seed for several unit tests (PaddlePaddle#44135)

* Fix test_functional_conv2d_transpose random seed

* Fix random seed and use np.testing

* Fix random seed for test_lu_unpack_op

* Fix test_autograd_functional_dynamic random seed

* Remove boost library (PaddlePaddle#44092)

* add fused token prune op and plugin (PaddlePaddle#44281)

* add fused token prune op and plugin

* Fix run inference bug for standalone executor (PaddlePaddle#44340)

* xpu-paddlepaddle-33 [任务] matmul单测 timeout (PaddlePaddle#44333)

test=kunlun

* [IPU] add custom-op UTs 0/N (PaddlePaddle#44328)

* add custom-op UTs 0

* add authors

Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* [IPU] add custom-op UTs 1/N (PaddlePaddle#44329)

* add custom-op UTs 1

* add authors

Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* update url

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* support KL2 multi-card training, *test=kunlun (PaddlePaddle#43889)

* update xccl lib
    * use separate streams for compute/comm on XPU
    * add broadcast op to xpu2_op_list

* Remove auto to_pascal_case for args in op generator (PaddlePaddle#44350)

* remove auto to_pascal_case for args in op generator

* fix yaml config

* Standard sparse conv name (PaddlePaddle#44353)

* [Eager] eager variable back sync (PaddlePaddle#44343)

* eager variable back sync

* [ Phi Kernel ] Transfer as_real to phi. (PaddlePaddle#44263)

* transfer as_real to phi

* fix erros

* blocking: True -> False

* [Eager]Fix assert statement (PaddlePaddle#43492)

* Not rename pb file to avoid re-compile (PaddlePaddle#44370)

* [Phi] Migrate solve kernel to phi (PaddlePaddle#44363)

* draft version

* draft version

* draft version

* migrate solve kernel to phi

* polish

* polish

* re useless header file, fix a bug in grad_kernel_impl

* add header file in need

* [auto parallel] remove comm init control (PaddlePaddle#44385)

* [CustomDevice] remove unused file (PaddlePaddle#44358)

* [Paddle-TRT] reshape fill_constant (PaddlePaddle#44314)

* reshape fill_constant

* commit

* commit

* set seed for uts (PaddlePaddle#44372)

* [Paddle-TRT] remove useless code in fc (PaddlePaddle#44382)

* remove useless code in fc

* [Paddle-TRT] Fix cast (PaddlePaddle#44312)

* fix_cast

* fix_cast

* commit

* Polish jit layer cmakelists to hide some message (PaddlePaddle#44351)

* Enable inference multi stream ci test (PaddlePaddle#44275)

* test

* update

* fix bug of old pp (PaddlePaddle#44361)

* add xpu resnet_unit (PaddlePaddle#44297)

* add xpu resnet_unit
*test=kunlun

* tmp
*test=kunlun

* add blacklist in prim2orig interface (PaddlePaddle#44383)

* [Plugin] Fix Custom device in eager mode, test=develop (PaddlePaddle#43952)

* [Plugin] Fix Custom device in eager mode, test=develop

* update test case, test=develop

* update ut for coverage, test=develop

* add ipu support for standalone executor.  (PaddlePaddle#44342)

* fix typos in template for codegen of operators (PaddlePaddle#44364)

* fix duplicate slice logic in _grad (PaddlePaddle#44396)

* [MLU] fix mlu ctest final. (PaddlePaddle#44404)

* fix data transform bug of interpolate op (PaddlePaddle#44401)

* [Sparse] Add sparse matmul kernel(coo*dense->dense) (PaddlePaddle#44346)

* fix new autodiff api docs (PaddlePaddle#44341)

* fix build error in low arch (PaddlePaddle#44391)

* [new api] add new api paddle.vision.ops.distribute_fpn_proposals (PaddlePaddle#43736)

* add distribute_fpn_proposals

* change to new dygraph

* fix doc and example code

* change fluid impl to current version

* update (PaddlePaddle#44418)

* [Paddle-TRT] Shape sum fix scale (PaddlePaddle#44394)

* shape sum

* add shape, sum trt layer

* [Phi] Migrate infermeta and add yaml for solve op (PaddlePaddle#44379)

* migrate solve kernel to phi

* re useless header file, fix a bug in grad_kernel_impl

* add header file in need

* add yaml for solve op

* fix solve_sig.cc ArgumentMapping and update tests case

* disable legacy dygraph check in op_test

* rm solve_op.cc / solve_sig.cc and migrate yaml config

* Update op_test.py

disable legacy dygraph check when check_eager is True

* add labels for infer ut (PaddlePaddle#44279)

* add labels for infer ut

* add RUN_TYPE=INFER for cpp ut

* fix formaterror

* update

* Add mfence for XPU2 KP (PaddlePaddle#44258)

* remove include of all.h in resnet_basic_block_op_xpu.cc, test=kunlun (PaddlePaddle#44423)

* Rename BOOST_GET macros (PaddlePaddle#44368)

* Rename BOOST_GET macros

* Fix conflicts

* [new API] add paddle.vision.ops.generate_proposals (PaddlePaddle#43611)

* add generate_proposals into paddle.vision

* remove class api

* im_info -> img_size

* change fluid impl to current version

* Accelerate inference period in op Cache method (PaddlePaddle#43857)

* Added pad3d and pad2d FP32 FWD oneDNN kernels (PaddlePaddle#43990)

* Piotrek's changes for pad3d

* my changes

* first version of pad3d, single copy, unnecessary reads

* optimized pad3d kernel

* test upadte

* removed magic numbers

* added support for pad2d

* reverted two files

* reverted one old change

* added support for Paddings tensor

* CI fix

* CI fix

* fixed timeout of tests

* fixed typo

* changes to GetKernelTypeForVar

* Revert "changes to GetKernelTypeForVar"

This reverts commit 4691061.

* added AsExtra() to pad2d

Co-authored-by: Piotr Paturej <piotr.paturej@intel.com>

* add save_cache/patch (PaddlePaddle#44420)

* add save_cache/patch

* add pybind

* remove pybind

* remove const_cast

* add fleet

* Standard name of sparse pool (PaddlePaddle#44344)

* move eig operator from fluid to phi (PaddlePaddle#44398)

* move eig operator from fluid to phi

* add eig_grad unitest, upgrade IsComplexType() from fluid to phi

* [Phi]Move angle op to phi (PaddlePaddle#44393)

* Move angle op to phi

* Replace mutable_data using Alloc

* Remove some include

* Try to fix windows ci error

* include math.h to fix windows ci error

* Fix kernel name

* Move angle_grad infershape

* [Eager]release gil when run backward (PaddlePaddle#44433)

* release gil when run backward

* compile phi/backends into one static library (PaddlePaddle#44373)

* compile into one static library

* fix xpu compile

* fix xpu compile

* fix inference compile

* fix inference compile

* add custom test

* revert one file

* [IPU] Add more Ops (PaddlePaddle#44414)

* [IPU] Add more Ops

* update boost API

* Clean CI_SKIP_CPP_TEST (PaddlePaddle#44412)

* Add dependency for read op in standalone executor (PaddlePaddle#44362)

* Add dependency for read op in standalone executor

* Fix CI errors

* Add UT

* add_dependency -> dependency_utils

* Fix CI errors

* Add distro in ci docker (PaddlePaddle#44332)

* add distro zstd

* test

* test

* add pip3.8

* [Phi] migrate as_complex kernel to phi (PaddlePaddle#44438)

* migrate as_complex kernel to phi

* support as_complex and as_real in phi

* rm GetExpectedKernelType for AsRealOp

* [GPUPS]FleetWrapper initialize (PaddlePaddle#44441)

* fix FleetWrapper initialize

* [XPU][NPU] (1) add device_guard. (2) add support for LoDTensorArray of sum op. (PaddlePaddle#44367)

* device_guard support xpu. test=kunlun

* sum op of xpu support LoDTensorArray. add test for while op of xpu. test=kunlun.

* [IPU] add Op uts (PaddlePaddle#44415)

* transfer block_id to CreateVarNode in multi_devices_graph_pass (PaddlePaddle#44366)

* fix CreateVarNode in multi_devices_graph_pass

* Revert "Fix var duplication bug for graph_to_program_pass (PaddlePaddle#44278)"

This reverts commit a2c4c86.

* 【GPUPS】Adam accessor (PaddlePaddle#43919)

* add adam/sharedadam optimzier for gpups;edit optimizer struct;test=develop

* [Phi] migrate sync_batch_norm to phi (PaddlePaddle#44369)

* [GPUPS]Fix psgpuwrapper initialization (PaddlePaddle#44468)

* Update ps_gpu_wrapper.h

* Update ps_gpu_wrapper.h

* Update ps_gpu_wrapper.cc

* [Phi] migrate exponential kernel to phi (PaddlePaddle#44376)

* [Phi] migrate exponential kernel to phi

* fix comment

* fix CI

* [PHI] move diag_embed op to phi. (PaddlePaddle#44408)

* move diag_embed to phi.

* [MLU] set_value performance optimizing (PaddlePaddle#44390)

* Update api changing approve members (PaddlePaddle#44463)

* update api approve members, test=document_fix

* add qingqnig into list, test=document_fix

* fix bug,test=document_fix (PaddlePaddle#44478)

* [Phi] migrate clip_by_norm to phi (PaddlePaddle#44458)

* add eigen3 dependency for phi_backends (PaddlePaddle#44479)

* remove fleet_13 ut in parallel_UT_rule.py; test=develop (PaddlePaddle#44477)

* [PHI]Seperate xshape kernel from normal kernel (PaddlePaddle#44315)

* seperate xshape kernel from normal kernel

* fix bugs in infermeta

* fix compile bugs

* fix compile bugs

* [AutoParallel] fix unittest with paddle.distributed.launch (PaddlePaddle#44439)

* fix unittest

* fix log_dir

* _enable_legacy_dygraph

* [Phi] add temporal_shift yaml (PaddlePaddle#44409)

* add temporal_shift yaml and unittest

* [Paddle inference] Add conv_fusion_fp16 (PaddlePaddle#44435)

* convfusionfp16

* convfusionfp16

* convfusionfp16

* fix some convert error found in tipc. (PaddlePaddle#44457)

* fix some error found in tipc.

* update

* [BugFix]Fix randint_like bugs when save program that don't need use tensor's value (PaddlePaddle#44446)

* fix bugs of random

* fix unittest error

* fix unittest bugs

* add adaptive pool and softmax with cross entropy supports different axis, * test = kunlun  (PaddlePaddle#44428)

* add xpu pnorm op and fix pool op, *test=kunlun

* add adaptive pool, and softmax with cross entropy supports different axis, *test=kunlun

* add slot attr for push sparse op (PaddlePaddle#44422)

* add slot attr for push sparse op

* add pybind

* remove fleet

* add unittest

* fix

* [Dy2Sta]Fix Segment Fault while training multi-card if params have no grad (PaddlePaddle#44485)

* [Dy2Sta]Fix Segment Fault while training multi-card if params have no grad

* fix unittest

* fix tensor stream error in custom op (PaddlePaddle#44500)

* Replace with dygraph op calling method. (PaddlePaddle#44331)

* Replace with dygraph op calling method.

* [JitLayer]Pybind PEFunction and call phi api in layer_test (PaddlePaddle#44465)

* Support predictor function in JitLayer

* Pybind PEFunction

* Pybind PEFunction and call phi api in layer_test

* Call sqrt phi API

* Polish flags

* Fix comments

* [Sparse] Add sparse addmm kernel (dense+coo*dense->dense,dense+csr*dense->dense) (PaddlePaddle#44451)

* [Eager] bilinear_tensor_product yaml (PaddlePaddle#44459)

* bilinear_tensor_product yaml

* [ Phi ] svd transfer (PaddlePaddle#44392)

* svd cpu forward

* svd gpu forward

* transfer the backward of svd

* remove cusolver in svd_grad

* svd kernel bug fix

* fix bugs

* fix bugs.

* fix bug

* [Paddle-TRT] fix_fill_constant (PaddlePaddle#44481)

* fix_fill_constant

* fix_fill_constant

* fix_ernie

* [MLU] transpose avg_pool2d to NHWC for better performance. (PaddlePaddle#44475)

* [jit] jit support property.proto (PaddlePaddle#44337)

* add property.proto, can compiled

* property get and deserilize

* support get float

* format code

* format code

* add unittest

* add more set method

* fix grammar error

* Update paddle/fluid/jit/property.h

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* fix comment

* fix error throw

* fix property save unit test

* fix error info

* fix copyright and header import

* reorder jit property tensor datatype

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* [ Dy2static ] infer_program may be incorrect in amp mode. (PaddlePaddle#44487)

* fix the outputs of net is x,x

* add unittest for duplicate output

* fix

* fix _infer_program use the original program not the amp program.

* get _***program_id back and avoid duplicate cache
ing

* fix

* Fc fp16 (PaddlePaddle#44505)

* fc support fp16

* add a ‘,’ on paddle_pass_builder.cc

* fc support fp16 on non-cuda.

* add batch stream (PaddlePaddle#44524)

* shufflechannelfix (PaddlePaddle#44516)

* fix arg_max to select first index (PaddlePaddle#44521)

* [MLU] add floor kernel and grid_sampler kernel (PaddlePaddle#44498)

* commit (PaddlePaddle#44534)

* [CustomDevice] register Copy for custom device (PaddlePaddle#44200)

* [CustomDevice] register Copy for custom device

* [CustomDevice] register Copy for custom device

* [CustomDevice] register Copy for custom device

* merge and add uts

* merge and add uts

* fix for blocking and unittests coverage

* (modified) fc support fp16 (PaddlePaddle#44540)

* Add code of occupancy computing on DCU and avoid threadID bug for DCU profiler (PaddlePaddle#44520)

* add xpu lars_momentum/pow2_decay (PaddlePaddle#44448)

*test=kunlun

* [phi] move inverse op from fluid to phi (PaddlePaddle#44471)

* move inverse from fluid to phi with unitest bug

* fix bug, add eager op yaml

* support send_partial, recv_partial and allgather_partial in ProcessGroupNCCL (PaddlePaddle#44444)

* [Sparse]add sparse unary api(expm1/deg2rad/rad2deg/relu6/leaky_relu) (PaddlePaddle#44432)

* Fc fp16 (PaddlePaddle#44558)

* (modified) fc support fp16

* __CUDA_ARCH__ version

* delete half

* delete half

* Fix bug of amp code-gen (PaddlePaddle#44570)

* fix bug of amp code_gen

* fix bug

* [JitLayer]Fix jit.save error when save params combined (PaddlePaddle#44504)

* Fix jit.save error when save params combined

* Change dict_value to list

* [Phi] Migrate squared_l2_norm_op to phi (PaddlePaddle#44492)

* add swish  using TensorRT layer (PaddlePaddle#44561)

* update

* empty commit

* update

* update

* update

* Phi gird sampler migration (PaddlePaddle#44562)

* add_ymal_utest for phi grid_sampler op

* skip dist test cases if mlu card number only one, test=develop (PaddlePaddle#44549)

* [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine (PaddlePaddle#44513)

* [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine

* refine code

* 【Hackathon No.21】为 Paddle 新增 SoftMarginLoss (PaddlePaddle#42364)

* 2022-04-28

* 2022-04-28_V2

* 2022-04-30

* 2022-04-30_V2

* 2022-05-01

* 2022-05-02

* 2022-05-02_V2

* 2022-05-05_V1

* 2022-05-06_V1

* 2022-05-07_V1

* Update loss.py

* 2022-05-07_V2

* 2022-05-13_V1

* Update test_soft_margin_loss.py

* Update loss.py

* Update loss.py

* 2022-05-16_V1

* 2022-05-19_V1

* 2022-05-20_V1

* Update test_soft_margin_loss.py

* 2022-06-01_V1

* 2022-06-05

* 2022-06-07

* 2022-06-07

* 2022-06-08

* 2022-06-08_V2

* 2022-06-17-code_style

* Modify python

* 2022-06-20

* for

* for CI;test=document_fix

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

* [MLU]transpose convbpf output to HWCN for better performance (PaddlePaddle#44552)

* Fc fp16 (PaddlePaddle#44578)

* (modified) fc support fp16

* __CUDA_ARCH__ version

* delete half

* delete half

* add half support

* add half support

* add half support

* [Auto Parallel] Add dist op cost (PaddlePaddle#44146)

* update comp cost

* add dist default op cost

* add dist fill constant batch size like op cost

* add elewise op cost

* add fill_constant_batch_size_like op cost unittest

* add unittest and remove fill_constant_batch_size_like grad op cost

* add to cmakelist

* fix unittest bug

* Improve CI unittest parallel execution strategy (PaddlePaddle#44334)

* paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=parallel_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test pre_test_bak

* test cfs

* test_cfs,test=paralle_test_daily

* test_cfs,test=paralle_test_daily

* fix nightly test name,test=paralle_test_daily

* fix nightly test name,test=paralle_test_daily

* test ci parallel speed

* refine parallel rule,test=paralle_test_daily

* Move bmm OP from fluid to phi (PaddlePaddle#44496)

* [PHI]Move slogdeterminant op to phi (PaddlePaddle#44547)

* Move slogdeterminant op to phi

* Add yaml and unit test for slogdeterminant

* Rename pybind_boost_header.h (PaddlePaddle#44592)

* unify data type and property enum value (PaddlePaddle#44585)

* inference multi stream support handle lazy init. (PaddlePaddle#44563)

* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem

* Remove ControlDepVar in GraphToBlock (PaddlePaddle#44591)

* transfer the svd infer into phi infermeta (PaddlePaddle#44528)

* transfer the svd infer into phi infermeta

* remove the svd.h

* modify svd api

* fix svd error by insert optional

* Einsum grad complex (PaddlePaddle#44598)

* add complex for einsum grad kernel

* pass the ci

* add reverse yaml (PaddlePaddle#44518)

* add reverse yaml

* Set more attrs in ReplaceScaleLossGradOp (PaddlePaddle#44576)

* Set more attrs in ReplaceScaleLossGradOp

* Fix typos

* Fix CI errors

* Add UT

* [Phi] Migrate box coder to phi. (PaddlePaddle#44550)

* fix behavior of device_id=None in Tensor.cuda (PaddlePaddle#44515)

* fix behavior of device_id=None in Tensor.cuda

* fix CI
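
A minimal sketch of the behavior the fix above targets: Tensor.cuda() called with the default device_id=None should place the tensor on the current CUDA device; the exact placement semantics are inferred from the commit title:

    import paddle

    x = paddle.to_tensor([1.0, 2.0])
    if paddle.is_compiled_with_cuda():
        y = x.cuda()  # device_id defaults to None -> current CUDA device
        print(y.place)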

* fix windows cuda11.7 bug (PaddlePaddle#44601)

* add horizontal federated learning ps feature (PaddlePaddle#44327)

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* [MLU] rollback cntoolkit version to 2.8.5 (PaddlePaddle#44595)

* [CustomDevice] add blas_axpby api for gradient_accumulator (PaddlePaddle#44584)

* add sin,cos,exp primitive operators (PaddlePaddle#44345)

* Optimize sparse convolution (PaddlePaddle#43576)

* Merge kProgramDescs in GraphToProgram (PaddlePaddle#44526)

* [Eager] Add warpctc yaml (PaddlePaddle#44617)

* Add a feed op before each input parameter var. (PaddlePaddle#44499)

* Add a feed op before each input parameter var.

* Fix some issues about the unit test build_cinn_pass_test.

* fix record event for operator type in new dygraph (PaddlePaddle#44582)

* fix new dygraph record event for op

* update unit test

* fix bug of elementwise_add_grad, *test=kunlun (PaddlePaddle#44545)

* fix bug of elementwise_add_grad, *test=kunlun

* fix bug, *test=kunlun

* rm pooling_t, *test=kunlun

* fix bug of ew_add_grad when inplace, *test=kunlun

* [IPU] small bug fix (PaddlePaddle#44473)

* sync misc changes

* add authors

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* up x

* Revert "up x"

This reverts commit f3fde45.

* add guarg for ipu

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* support auto fallback to cpu kernel for custom device (PaddlePaddle#44639)

* fix dygraph bugs in broadcast_to api. (PaddlePaddle#44612)
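
A minimal dygraph sketch of the paddle.broadcast_to call the fix above concerns; shapes are illustrative:

    import paddle

    x = paddle.to_tensor([1, 2, 3])
    y = paddle.broadcast_to(x, shape=[2, 3])  # tile along a new leading dim
    print(y)  # [[1, 2, 3], [1, 2, 3]]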

* add set_dtype for inverse_op (PaddlePaddle#44618)

* refine overalls.cmake (PaddlePaddle#44623)

* [PHI]Add yaml and unittest for bmm op (PaddlePaddle#44625)

Add yaml and unittest for bmm op

* Phi average accumulates migration (PaddlePaddle#44554)

* move average_accumulates op to phi kernel

* new exe not support pg (PaddlePaddle#44628)

* [CustomDevice]fix phi kernel header (PaddlePaddle#44637)

* [CustomDevice] add process_group_xccl ut (PaddlePaddle#44632)

* [CustomDevice] add process_group_xccl ut

* update

* Fix conv api name (PaddlePaddle#44636)

* [DCU] Fix NaN problem when training BERT on DCU platform (PaddlePaddle#44643)

* [JitLayer]Remove include fluid head files in JitLayer (PaddlePaddle#44597)

* Remove include fluid head files in JitLayer

* Format code

* Remove const to fix ci error

* Fix param error

* Polish jit layer include and cp some headers to python/include

* Fix comment

* [jit]  jit.save support property serialization (PaddlePaddle#44581)

* jit.save support property serialization

* extract set property function

* fix property test file name

* fix typing error

* fix typing error

* fix test coverage

* Replaced add_custom_command with add_custom_target in xpu_kp_cmake (PaddlePaddle#44619)

* Replaced add_custom_command with add_custom_target in xpu_kp_cmake

* add adagrad and rmsprop yaml (PaddlePaddle#44631)

* [phi] move crop_tensor kernel from fluid to phi (PaddlePaddle#44574)

* move crop_tensor from fluid to phi

* delete fluid header files

* fix crop_tensor_op dygraph_mode bug

* modify header files, add out tensor check

* fix RemoveIntermediateOut in fuse_elewise_add_act_pass while converting graph to program (PaddlePaddle#44593)

* fix RemoveNode in fuse_elewise_add_act_pass

* fix

* change pointer to shared_ptr

* fix

* fix

* fix format

* fix

* fix graph_safe_remove_nodes

* fix UTs on physical ipu (PaddlePaddle#44647)

* [IPU] add more loss ops  (PaddlePaddle#44646)

* add more loss ops

* add authors

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* add g_ipuplace_pytype (PaddlePaddle#44648)

* Strided slice fp16 (PaddlePaddle#44653)

* [MLU]fix sync_batch_norm and concat_grad op (PaddlePaddle#44586)

* retain dist op returns (PaddlePaddle#44634)

* xpu unittest grad compute supports more types, *test=kunlun (PaddlePaddle#44606)

* [Eager] Add hierarchical_sigmoid yaml (PaddlePaddle#44638)

* add matrix_nms in python/paddle/vision/ops.py (PaddlePaddle#44357)

* [auto parallel] bug fix for op has sub_block attr created with copy_from (PaddlePaddle#44664)

* Change the way to set attributes for grad op maker (PaddlePaddle#44514)

* fix typos in template for codegen of operators
* change the way to set attributes for grad op maker

* [XPU] add top_k op (PaddlePaddle#44656)

* [XPU] add top_k op. test=kunlun

* [XPU] add top_k op. test=kunlun

* use PADDLE_ENFORCE_XDNN_NOT_NULL to check pointer. test=kunlun

* Support broadcast tensor in phi system (PaddlePaddle#44590)

* [PHI] Move spectral_norm to phi (PaddlePaddle#44577)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Fix: Move out_grad to first

* Register new kernels

* Remove old kernels

* Move out_grad to last

* Fix bugs

* Transfer infermeta

* Add yaml files

* Add blank line

* Fix code style

* Optimize directory structure

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* Complete the dtypes for all_gather, add all_gather_object api (PaddlePaddle#44417)
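
A hedged sketch of the new all_gather_object API named above, assuming the usual collective pattern (picklable Python objects, default process group); run it under paddle.distributed.launch with multiple ranks:

    import paddle.distributed as dist

    dist.init_parallel_env()
    obj_list = []
    # gathers one picklable object per rank into obj_list
    dist.all_gather_object(obj_list, {"rank": dist.get_rank()})
    print(obj_list)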

* [Eager] refactor general_grad and fix some bugs (PaddlePaddle#44611)

* refactor general_grad and fix some bugs

* add TODO: support prune logic deeper

* support log_grad op, *test=kunlun (PaddlePaddle#44662)

* [LAUNCH] add distributed launch check tools (PaddlePaddle#44495)

* add launch test

* launch test for cpu

* bs 1

* Move api(lgamma) from legacy_api.yaml to api.yaml (PaddlePaddle#44355)

* Move api(lgamma) from legacy_api.yaml to api.yaml

* Move api(lgamma) from legacy_api.yaml to api.yaml

* Move api(lgamma) from legacy_api.yaml to api.yaml

* modify code style

* add x to X mapping

* add definition of lgamma

* delete redundant lgamma definitions

* Modify code comments

* Modify ops.py code format

* add lgamma single test and lgamma api in fluid

* Optimized lgamma unittest
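
The entries above only move lgamma's definition between YAML files; the public API is unchanged. A minimal usage sketch:

    import paddle

    x = paddle.to_tensor([0.5, 1.0, 2.0])
    print(paddle.lgamma(x))  # log|Gamma(x)|: approx [0.5724, 0.0, 0.0]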

* Move frame kernel to phi (PaddlePaddle#44615)

* Move frame OP to phi, add frame OP yaml config and supplement single test

* add Header file of in_dygraph_mode

* Modify variable name and FrameGradInferMeta multiplex UnchangedInferMeta

* move seq2col to phi

* delete elementwise pow in xpu_kp_list (PaddlePaddle#44661)

* [MLU] fix log_softmax mode selection. (PaddlePaddle#44669)

* adapt for resnet (PaddlePaddle#44685)

* Fix some problem of kernel fallback in C++ API (PaddlePaddle#44681)

* support auto fallback to cpu kernel for custom device

* fix some problem of kernel fallback

* fix bugs of lstsq (PaddlePaddle#44689)

* migrate dirichlet kernel to phi (PaddlePaddle#44434)

* migrate dirichlet op kernel to phi

* fix dirichlet sample memory leak

* [phi]move softsign from fluid to phi (PaddlePaddle#44616)

* test_activation_op unittest error, yaml & activation.py in_dygraph_mode incomplete

* fix test_activation_op unittest error, add yaml and dygraph test

* fix code style with pre-commit

* try to fix namespace error of abs in activation_functor.h

* fix namespace error of abs

* [Paddle Inference] Support depthwise_conv2d fp16. (PaddlePaddle#44642)

* depthwise_fp16

* depthwise_fp16

* depthwise_fp16

* depthwise_fp16

* fix logging debug level (PaddlePaddle#44684)

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* Skip CUDA Graph case for standalone executor (PaddlePaddle#44693)

* [Eager] fix lerp grad kernel logic (PaddlePaddle#44705)

* clone ort_predictor reuse session (PaddlePaddle#44703)

* [XPU] add sampling_id op, add top_k op, update xdnn api. test=kunlun (PaddlePaddle#44704)

* fused_fc_elementwise_layernorm_op support fp16 (PaddlePaddle#44710)

* fused_fc_elementwise_layernorm support fp16

* fused_fc_elementwise_layernorm support double

* [Phi] Add yaml for assign_value (PaddlePaddle#44596)

* [Phi] Add yaml for assign_value

* [Phi] Fix the bug of the assign api and modify the unittest

* [Phi] Fix the bug when the tensor does not have the backend info

* [Phi] Replace the functional-style cast init by the brace-init

* [Phi] Cast the data explicitly

* [PHI] Move lu to phi  (PaddlePaddle#44605)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Register new kernels

* Remove old kernels

* Fix code style

* Fix bugs

* mutable_data->HostAlloc

* Transfer infermeta

* Add yaml and update python api

* Add PADDLE_WITH_HIP check

* Update unittests

* Fix bugs

* Fix bugs

* Optimize directory structure

* Add output checks

* lu_impl.h->lu_kernel_impl.h

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* [MLU] add pytest for mlu strided_slice kernel (PaddlePaddle#44523)

* Support backward final hook (PaddlePaddle#44686)

* update to sdk2.6.0 (PaddlePaddle#44673)

* move CUDAStream to phi (PaddlePaddle#44529)

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

* [Auto parallel] Optimization Tuning (PaddlePaddle#43782)

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

* skip cast trt convert when input dtype is bool (PaddlePaddle#44716)

* skip cast trt convert when input dtype is bool

* [LAUNCH] fix set args bug (PaddlePaddle#44717)

* Phi softplus migration (PaddlePaddle#44542)

* add yaml and utests of phi softplus

add yaml of softplus

fix softplus bug in phi

* update utests

* bug fix

* bug fix for test_layers

* layer api match

* match def and doc in ops.py

* doc polish

* fix unwanted modified of thresholded_relu

* style improve

* 【PaddlePaddle Hackathon 3 No.15】Add count_nonzero to Paddle (PaddlePaddle#44169)

* add count_nonzero api

* remove grad test
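
A minimal sketch of the count_nonzero API added above, assuming the signature paddle.count_nonzero(x, axis=None, keepdim=False):

    import paddle

    x = paddle.to_tensor([[0., 1., 2.], [0., 0., 3.]])
    print(paddle.count_nonzero(x))          # 3, over the whole tensor
    print(paddle.count_nonzero(x, axis=1))  # [2, 1], per row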

* [WIP] Matmul v1 & v2 unification -- part 1 (PaddlePaddle#44640)

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add FLAGS_enable_api_kernel_fallback (PaddlePaddle#44706)

* add FLAGS_enable_api_kernel_fallback

* deal with more cases

* add ut for coverage

* phi_multiclass_nms3 (PaddlePaddle#44613)

* add some fp16 op for kunlun resnet50 model (PaddlePaddle#44672)

* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun

* add dist op costs (PaddlePaddle#44701)

* [API/OP] Migrate Lstsq op into phi (PaddlePaddle#44318)

* migrate lstsq op

* update

* fix bugs for CIs

* update

* fix bugs

* add uts

* update

* update

* update

* fix bugs of jip

* fix bugs of hip

* update

* update according to review

* update

* update

* update

* update

* Add sparse SyncBatchNorm (PaddlePaddle#43520)

* add sparse SyncBatchNorm

* unify fluid::CUDADeviceContext and phi::GpuContext (PaddlePaddle#44723)

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* 【PaddlePaddle Hackathon 3 No.12】Add pairwise_distance to Paddle (PaddlePaddle#44161)

* add paddle.nn.functional.pairwise_distance (cattidea#273)
* remove the test case for undefined behavior

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
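
A minimal sketch of the new functional API, assuming row-wise p-norm distance semantics:

    import paddle
    import paddle.nn.functional as F

    x = paddle.to_tensor([[1.0, 3.0], [3.0, 5.0]])
    y = paddle.to_tensor([[5.0, 6.0], [6.0, 7.0]])
    # row-wise Euclidean distance: approx [5.0, 3.6056]
    print(F.pairwise_distance(x, y, p=2.0))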

* Phi prior box (PaddlePaddle#44431)

* phi_prior_box

* add float[] support

* phi_prior_box_optest

* update

* ort backend support output mutable data (PaddlePaddle#44724)

* [PHI] Move lu_unpack to phi (PaddlePaddle#44674)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Register new kernels

* Remove old kernels

* Fix code style

* Fix bugs

* mutable_data->HostAlloc

* Transfer infermeta

* Add yaml and update python api

* Add PADDLE_WITH_HIP check

* Update unittests

* Add kernel declarations

* Copy kernel implementation code

* Transfer kernel implementation code

* Register new kernels

* Remove old kernels

* Add lu_unpack_sig

* Fix bugs

* Fix bugs

* Fix bugs

* Optimize directory structure

* Add output checks

* Update include files

* lu_impl.h->lu_kernel_impl.h

* Transfer infermeta

* Add yaml and update python api

* Add check_eager

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* update document of quantile and nanquantile; test=document_fix (PaddlePaddle#42413)
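
The doc-only change above covers these two APIs; a minimal sketch of their difference (nanquantile skips NaNs, quantile propagates them):

    import paddle

    x = paddle.to_tensor([1.0, 2.0, 3.0, float('nan')])
    print(paddle.quantile(x, q=0.5))     # nan: NaN propagates
    print(paddle.nanquantile(x, q=0.5))  # 2.0: median of [1, 2, 3]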

* migrate reduce_amin,reduce_amax kernel to phi (PaddlePaddle#44698)

* [Paddle Inference] add varlen_token_prune plugin, pass, convert (PaddlePaddle#44733)

* add varlen_token_prune plugin, pass, convert

* support build with Ninja on Linux (PaddlePaddle#44210)

* support ninja

* fix mkldnn on windows

* fix mkldnn on windows up1

* up2

* up3

* fix gflags

* BUILD_BYPRODUCTS_OPTION -> BUILD_BYPRODUCTS_ARGS

* use CMAKE_COMMAND

* up x

* migrate overlap_add and overlap_add_grad op (PaddlePaddle#44739)

* update code format

* add yaml and test

* update for comments

* Fix to CI (PaddlePaddle#44744)

* - fix

* - another fix

* lint

* infer context fix place error. (PaddlePaddle#44726)

* infer context fix place error.

* update

* update

* [operator migration] Migrate unstack_op and nms_op (PaddlePaddle#44424)

* update unstack_op

* update unstack_op

* update unstack_op

* fix unstack test

* update unstack

* update with remote

* fix unstack_test.py

* temp_save_change_nms_op

* add nms test

* update nms fix

* update unstack_op

* temp save change

* finish fix nms_op

* pass nms test

* fix CI

* fix ops test

* save change

* fix code style

* fix code style

* fix ci and codestyle

* fix ci

Co-authored-by: ShiningZhang <zhang_liang1991@126.com>

* Update linalg.py (PaddlePaddle#44347)

* Fix test and doc (PaddlePaddle#44735)

* fix test and doc

* fix all_gather_object with various length, test=allcases (PaddlePaddle#44718)

* update manipulation.py paddle.moveaxis (PaddlePaddle#44191)
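
A minimal sketch of the paddle.moveaxis API touched above; shapes are illustrative:

    import paddle

    x = paddle.ones([3, 4, 5])
    y = paddle.moveaxis(x, source=0, destination=-1)  # [3, 4, 5] -> [4, 5, 3]
    print(y.shape)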

* [CI] CI for Distributed (PaddlePaddle#44085)

* generate_unify_header supports excludes (PaddlePaddle#44761)

* [JitLayer]Polish PEFunction to speed up JitLayer and fix memory leak (PaddlePaddle#44738)

* Polish PEFunction to speed up JitLayer

* Polish PEFunction code

* Fix comments

* paddle2onnx update version to 1.0.0rc2 (PaddlePaddle#44759)

* set parallel_job according to CUDA memory in Windows CI unittest (PaddlePaddle#44695)

* set parallel_job according to CUDA memory

* fix bug: add whitespace between content and [] or condition won't work

* [Sparse] optimize sparse attention (PaddlePaddle#44743)

* GPUGraph merge to develop (PaddlePaddle#44594)

Co-authored-by: seemingwang <zsasuke@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: Thunderbrook <a754913769@163.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0693.yq01.baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: lxsbupt <luoxsbupt@163.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: qingshui <qshuihu@gmail.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* Revert for cmake static library errors on XPU KP PaddlePaddle#44762

* unify gpu context (PaddlePaddle#44740)

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

* API doc(en) bug fixes in the Phase 4 experience evaluation (PaddlePaddle#44749)

* fix docs(en) bugs;test=document_fix

* update paddle.add docs;test=document_fix

* update paddle.where docs;test=document_fix

* for ci;test=document_fix

* Update manipulation.py

* update paddle.where;test=document_fix

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

* Modify the output result annotation under the lerp function (PaddlePaddle#44035)
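
A minimal sketch of the lerp output the annotation fix above documents, computed as x + weight * (y - x):

    import paddle

    x = paddle.to_tensor([1.0, 2.0])
    y = paddle.to_tensor([3.0, 6.0])
    print(paddle.lerp(x, y, 0.5))  # [2.0, 4.0]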

* Refactor build_op_downstream_map for standalone executor (PaddlePaddle#44729)

* Refactor build_op_downstream_map for standalone executor

* Add some comments

* update xpu.cmake to 20220731, test=kunlun (PaddlePaddle#44767)

* fix ut new_group_api (PaddlePaddle#44764)

* support beam_search operator on xpu. test=kunlun (PaddlePaddle#44720)

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* [phi] add yolov3_loss yaml and unittest (PaddlePaddle#44476)

* add yaml and unittest

* update yaml

* update backward yaml and unittest

* update yaml

* add Yolov3LossGradInferMeta

* update yolov3_loss_op.cc

* fix bug

* code format

* Update manipulation.py for rot90() (PaddlePaddle#44038)

* fix compile error;test=develop

* fix compile error;test=develop

* fix compile;test=develop
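
For the rot90() update a few entries above, a minimal sketch of the call, assuming one 90-degree rotation in the (0, 1) plane:

    import paddle

    x = paddle.to_tensor([[1, 2], [3, 4]])
    print(paddle.rot90(x, k=1, axes=[0, 1]))  # [[2, 4], [1, 3]]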

Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
Co-authored-by: taixiurong <taixiurong@126.com>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: zhangxiaoci <zhangxiaoci@baidu.com>
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: zhangkaihuo <zhangkaihuo@baidu.com>
Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
Co-authored-by: xiongkun <xiongkun03@baidu.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: ronnywang <ronny1996@163.com>
Co-authored-by: zhoutianzi666 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: QingshuChen <chenqingshu@baidu.com>
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: 王明冬 <78149749+winter-wang@users.noreply.github.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Chenxiao Niu <ncxinhanzhong@gmail.com>
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: JYChen <zoooo0820@qq.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com>
Co-authored-by: huzhiqiang <912790387@qq.com>
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Piotr Paturej <piotr.paturej@intel.com>
Co-authored-by: zhaocaibei123 <48509226+zhaocaibei123@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: zmxdream <zhangminxu01@baidu.com>
Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: lyq <30404405+affectionlu@users.noreply.github.com>
Co-authored-by: Zhong Hui <zhonghui.net@gmail.com>
Co-authored-by: fuyou765 <64373205+fuyou765@users.noreply.github.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: ccrrong <101700995+ccrrong@users.noreply.github.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: ykkk2333 <77383312+ykkk2333@users.noreply.github.com>
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>
Co-authored-by: Hui Zhang <zhtclz@foxmail.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: cifar10 <41565156+cifar10@users.noreply.github.com>
Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com>
Co-authored-by: Aganlengzi <aganlengzi@gmail.com>
Co-authored-by: yuguo <948529990@qq.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: Wang Bojun <105858416+wwbitejotunn@users.noreply.github.com>
Co-authored-by: yangguohao <70266361+yangguohao@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Lux et Veritas <1004239791@qq.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com>
Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com>
Co-authored-by: Zhen Wang <wangzhen31@baidu.com>
Co-authored-by: chenjian <chenjian26@baidu.com>
Co-authored-by: helen88 <z8hanghuan@126.com>
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: qipengh <huangqipeng@cambricon.com>
Co-authored-by: shangliang Xu <ghostxsl@users.noreply.github.com>
Co-authored-by: Jiabin Yang <360788950@qq.com>
Co-authored-by: Lin Manhui <mhlin425@whu.edu.cn>
Co-authored-by: Bobholamovic <linmanhui@baidu.com>
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
Co-authored-by: kuizhiqing <kuizhiqing@baidu.com>
Co-authored-by: Charles-hit <56987902+Charles-hit@users.noreply.github.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: heliqi <1101791222@qq.com>
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: thunder95 <290844930@qq.com>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com>
Co-authored-by: Ainavo <57820731+Ainavo@users.noreply.github.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: Asthestarsfalll <72954905+Asthestarsfalll@users.noreply.github.com>
Co-authored-by: Wangzheee <634486483@qq.com>
Co-authored-by: Thomas Young <35565423+HexToString@users.noreply.github.com>
Co-authored-by: ShiningZhang <zhang_liang1991@126.com>
Co-authored-by: OccupyMars2025 <31559413+OccupyMars2025@users.noreply.github.com>
Co-authored-by: mrcangye <mrcangye@email.cn>
Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
Co-authored-by: seemingwang <zsasuke@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: Thunderbrook <a754913769@163.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0693.yq01.baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: lxsbupt <luoxsbupt@163.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: qingshui <qshuihu@gmail.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: yang131313 <lisy928472889@163.com>
Co-authored-by: mengqingchun02 <103740521+mengqingchun02@users.noreply.github.com>
Co-authored-by: 熊峻峰 <xiongjunfeng@sina.com>
lxsbupt added a commit to lxsbupt/Paddle that referenced this pull request Dec 17, 2022
* fix python3.10 compile bug on Windows (PaddlePaddle#44330)

* Fix random seed for several unit tests (PaddlePaddle#44135)

* Fix test_functional_conv2d_transpose random seed

* Fix random seed and use np.testing

* Fix random seed for test_lu_unpack_op

* Fix test_autograd_functional_dynamic random seed

* Remove boost library (PaddlePaddle#44092)

* add fused token prune op and plugin (PaddlePaddle#44281)

* add fused token prune op and plugin

* Fix run inference bug for standalone executor (PaddlePaddle#44340)

* xpu-paddlepaddle-33 [Task] matmul unit test timeout (PaddlePaddle#44333)

test=kunlun

* [IPU] add custom-op UTs 0/N (PaddlePaddle#44328)

* add custom-op UTs 0

* add authors

Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* [IPU] add custom-op UTs 1/N (PaddlePaddle#44329)

* add custom-op UTs 1

* add authors

Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* update url

Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* support KL2 multi-card training, *test=kunlun (PaddlePaddle#43889)

* update xccl lib

* use separate streams for compute/comm on XPU

* add broadcast op to xpu2_op_list

* Remove auto to_pascal_case for args in op generator (PaddlePaddle#44350)

* remove auto to_pascal_case for args in op generator

* fix yaml config

* Standard sparse conv name (PaddlePaddle#44353)

* [Eager] eager variable back sync (PaddlePaddle#44343)

* eager variable back sync

* [ Phi Kernel ] Transfer as_real to phi. (PaddlePaddle#44263)

* transfer as_real to phi

* fix erros

* blocking: True -> False

* [Eager]Fix assert statement (PaddlePaddle#43492)

* Not rename pb file to avoid re-compile (PaddlePaddle#44370)

* [Phi] Migrate solve kernel to phi (PaddlePaddle#44363)

* draft version

* draft version

* draft version

* migrate solve kernel to phi

* polish

* polish

* re useless header file, fix a bug in grad_kernel_impl

* add header file in need
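
The kernel migration above does not change the Python-facing API; a minimal sketch of paddle.linalg.solve for a @ x = b:

    import paddle

    a = paddle.to_tensor([[3.0, 1.0], [1.0, 2.0]])
    b = paddle.to_tensor([9.0, 8.0])
    print(paddle.linalg.solve(a, b))  # [2.0, 3.0]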

* [auto parallel] remove comm init control (PaddlePaddle#44385)

* [CustomDevice] remove unused file (PaddlePaddle#44358)

* [Paddle-TRT] reshape fill_constant (PaddlePaddle#44314)

* reshape fill_constant

* commit

* commit

* set seed for uts (PaddlePaddle#44372)

* [Paddle-TRT] remove useless code in fc (PaddlePaddle#44382)

* remove useless code in fc

* [Paddle-TRT] Fix cast (PaddlePaddle#44312)

* fix_cast

* fix_cast

* commit

* Polish jit layer cmakelists to hide some message (PaddlePaddle#44351)

* Enable inference multi stream ci test (PaddlePaddle#44275)

* test

* update

* fix bug of old pp (PaddlePaddle#44361)

* add xpu resnet_unit (PaddlePaddle#44297)

* add xpu resnet_unit
*test=kunlun

* tmp
*test=kunlun

* add blacklist in prim2orig interface (PaddlePaddle#44383)

* [Plugin] Fix Custom device in eager mode, test=develop (PaddlePaddle#43952)

* [Plugin] Fix Custom device in eager mode, test=develop

* update test case, test=develop

* update ut for coverage, test=develop

* add ipu support for standalone executor.  (PaddlePaddle#44342)

* fix typos in template for codegen of operators (PaddlePaddle#44364)

* fix duplicate slice logic in _grad (PaddlePaddle#44396)

* [MLU] fix mlu ctest final. (PaddlePaddle#44404)

* fix data transform bug of interpolate op (PaddlePaddle#44401)

* [Sparse] Add sparse matmul kernel(coo*dense->dense) (PaddlePaddle#44346)

* fix new autodiff api docs (PaddlePaddle#44341)

* fix build error in low arch (PaddlePaddle#44391)

* [new api] add new api paddle.vision.ops.distribute_fpn_proposals (PaddlePaddle#43736)

* add distribute_fpn_proposals

* change to new dygraph

* fix doc and example code

* change fluid impl to current version

* update (PaddlePaddle#44418)

* [Paddle-TRT] Shape sum fix scale (PaddlePaddle#44394)

* shape sum

* add shape, sum trt layer

* [Phi] Migrate infermeta and add yaml for solve op (PaddlePaddle#44379)

* migrate solve kernel to phi

* re useless header file, fix a bug in grad_kernel_impl

* add header file in need

* add yaml for solve op

* fix solve_sig.cc ArgumentMapping and update tests case

* disable legacy dygraph check in op_test

* rm solve_op.cc / solve_sig.cc and migrate yaml config

* Update op_test.py

disable legacy dygraph check when check_eager is True

* add labels for infer ut (PaddlePaddle#44279)

* add labels for infer ut

* add RUN_TYPE=INFER for cpp ut

* fix format error

* update

* Add mfence for XPU2 KP (PaddlePaddle#44258)

* remove include of all.h in resnet_basic_block_op_xpu.cc, test=kunlun (PaddlePaddle#44423)

* Rename BOOST_GET macros (PaddlePaddle#44368)

* Rename BOOST_GET macros

* Fix conflicts

* [new API] add paddle.vision.ops.generate_proposals (PaddlePaddle#43611)

* add generate_proposals into paddle.vision

* remove class api

* im_info -> img_size

* change fluid impl to current version

* Accelerate inference period in op Cache method (PaddlePaddle#43857)

* Added pad3d and pad2d FP32 FWD oneDNN kernels (PaddlePaddle#43990)

* Piotrek's changes for pad3d

* my changes

* first version of pad3d, single copy, unnecessary reads

* optimized pad3d kernel

* test update

* removed magic numbers

* added support for pad2d

* reverted two files

* reverted one old change

* added support for Paddings tensor

* CI fix

* CI fix

* fixed timeout of tests

* fixed typo

* changes to GetKernelTypeForVar

* Revert "changes to GetKernelTypeForVar"

This reverts commit 4691061.

* added AsExtra() to pad2d

Co-authored-by: Piotr Paturej <piotr.paturej@intel.com>

* add save_cache/patch (PaddlePaddle#44420)

* add save_cache/patch

* add pybind

* remove pybind

* remove const_cast

* add fleet

* Standard name of sparse pool (PaddlePaddle#44344)

* move eig operator from fluid to phi (PaddlePaddle#44398)

* move eig operator from fluid to phi

* add eig_grad unittest, upgrade IsComplexType() from fluid to phi

* [Phi]Move angle op to phi (PaddlePaddle#44393)

* Move angle op to phi

* Replace mutable_data using Alloc

* Remove some include

* Try to fix windows ci error

* include math.h to fix windows ci error

* Fix kernel name

* Move angle_grad infershape

* [Eager]release gil when run backward (PaddlePaddle#44433)

* release gil when run backward

* compile phi/backends into one static library (PaddlePaddle#44373)

* compile into one static library

* fix xpu compile

* fix xpu compile

* fix inference compile

* fix inference compile

* add custom test

* revert one file

* [IPU] Add more Ops (PaddlePaddle#44414)

* [IPU] Add more Ops

* update boost API

* Clean CI_SKIP_CPP_TEST (PaddlePaddle#44412)

* Add dependency for read op in standalone executor (PaddlePaddle#44362)

* Add dependency for read op in standalone executor

* Fix CI errors

* Add UT

* add_dependency -> dependency_utils

* Fix CI errors

* Add distro in ci docker (PaddlePaddle#44332)

* add distro zstd

* test

* test

* add pip3.8

* [Phi] migrate as_complex kernel to phi (PaddlePaddle#44438)

* migrate as_complex kernel to phi

* support as_complex and as_real in phi

* rm GetExpectedKernelType for AsRealOp
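
A minimal sketch of the two reinterpret-style APIs migrated above: as_complex views a float tensor whose last dim is 2 as complex, and as_real views it back:

    import paddle

    x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0]])  # last dim: (real, imag)
    z = paddle.as_complex(x)  # [(1+2j), (3+4j)]
    r = paddle.as_real(z)     # back to a [2, 2] float tensor
    print(z, r.shape)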

* [GPUPS]FleetWrapper initialize (PaddlePaddle#44441)

* fix FleetWrapper initialize

* [XPU][NPU] (1) add device_guard. (2) add support for LoDTensorArray of sum op. (PaddlePaddle#44367)

* device_guard support xpu. test=kunlun

* sum op of xpu support LoDTensorArray. add test for while op of xpu. test=kunlun.

* [IPU] add Op uts (PaddlePaddle#44415)

* transfer block_id to CreateVarNode in multi_devices_graph_pass (PaddlePaddle#44366)

* fix CreateVarNode in multi_devices_graph_pass

* Revert "Fix var duplication bug for graph_to_program_pass (PaddlePaddle#44278)"

This reverts commit a2c4c86.

* 【GPUPS】Adam accessor (PaddlePaddle#43919)

* add adam/sharedadam optimizer for gpups; edit optimizer struct; test=develop

* [Phi] migrate sync_batch_norm to phi (PaddlePaddle#44369)

* [GPUPS]Fix psgpuwrapper initialization (PaddlePaddle#44468)

* Update ps_gpu_wrapper.h

* Update ps_gpu_wrapper.h

* Update ps_gpu_wrapper.cc

* [Phi] migrate exponential kernel to phi (PaddlePaddle#44376)

* [Phi] migrate exponential kernel to phi

* fix comment

* fix CI

* [PHI] move diag_embed op to phi. (PaddlePaddle#44408)

* move diag_embed to phi.
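
A minimal sketch of the diag_embed API moved above; it embeds a vector on the main diagonal of a square matrix:

    import paddle
    import paddle.nn.functional as F

    v = paddle.to_tensor([1.0, 2.0, 3.0])
    print(F.diag_embed(v))  # 3x3 matrix with v on the diagonal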

* [MLU] set_value performance optimizing (PaddlePaddle#44390)

* Update api changing approve members (PaddlePaddle#44463)

* update api approve members, test=document_fix

* add qingqnig into list, test=document_fix

* fix bug,test=document_fix (PaddlePaddle#44478)

* [Phi] migrate clip_by_norm to phi (PaddlePaddle#44458)

* add eigen3 dependency for phi_backends (PaddlePaddle#44479)

* remove fleet_13 ut in parallel_UT_rule.py; test=develop (PaddlePaddle#44477)

* [PHI]Separate xshape kernel from normal kernel (PaddlePaddle#44315)

* separate xshape kernel from normal kernel

* fix bugs in infermeta

* fix compile bugs

* fix compile bugs

* [AutoParallel] fix unittest with paddle.distributed.launch (PaddlePaddle#44439)

* fix unittest

* fix log_dir

* _enable_legacy_dygraph

* [Phi] add temporal_shift yaml (PaddlePaddle#44409)

* add temporal_shift yaml and unittest

* [Paddle inference] Add conv_fusion_fp16 (PaddlePaddle#44435)

* convfusionfp16

* convfusionfp16

* convfusionfp16

* fix some convert error found in tipc. (PaddlePaddle#44457)

* fix some error found in tipc.

* update

* [BugFix]Fix randint_like bugs when saving a program that doesn't need to use the tensor's value (PaddlePaddle#44446)

* fix bugs of random

* fix unittest error

* fix unittest bugs

* add adaptive pool and softmax with cross entropy supports different axis, *test=kunlun (PaddlePaddle#44428)

* add xpu pnorm op and fix pool op, *test=kunlun

* add adaptive pool, and softmax with cross entropy supports different axis, *test=kunlun

* add slot attr for push sparse op (PaddlePaddle#44422)

* add slot attr for push sparse op

* add pybind

* remove fleet

* add unittest

* fix

* [Dy2Sta]Fix Segment Fault while training multi-card if params have no grad (PaddlePaddle#44485)

* [Dy2Sta]Fix Segment Fault while training multi-card if params have no grad

* fix unittest

* fix tensor stream error in custom op (PaddlePaddle#44500)

* Replace with dygraph op calling method. (PaddlePaddle#44331)

* Replace with dygraph op calling method.

* [JitLayer]Pybind PEFunction and call phi api in layer_test (PaddlePaddle#44465)

* Support predictor function in JitLayer

* Pybind PEFunction

* Pybind PEFunction and call phi api in layer_test

* Call sqrt phi API

* Polish flags

* Fix comments

* [Sparse] Add sparse addmm kernel (dense+coo*dense->dense,dense+csr*dense->dense) (PaddlePaddle#44451)

* [Eager] bilinear_tensor_product yaml (PaddlePaddle#44459)

* bilinear_tensor_product yaml

* [ Phi ] svd transfer (PaddlePaddle#44392)

* svd cpu forward

* svd gpu forward

* transfer the backward of svd

* remove cusolver in svd_grad

* svd kernel bug fix

* fix bugs

* fix bugs.

* fix bug

* [Paddle-TRT] fix_fill_constant (PaddlePaddle#44481)

* fix_fill_constant

* fix_fill_constant

* fix_ernie

* [MLU] transpose avg_pool2d to NHWC for better performance. (PaddlePaddle#44475)

* [jit] jit support property.proto (PaddlePaddle#44337)

* add property.proto, can be compiled

* property get and deserialize

* support get float

* format code

* format code

* add unittest

* add more set method

* fix grammar error

* Update paddle/fluid/jit/property.h

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* Update paddle/fluid/jit/property.cc

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* fix comment

* fix error throw

* fix property save unit test

* fix error info

* fix copyright and header import

* reorder jit property tensor datatype

Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

* [ Dy2static ] infer_program may be incorrect in amp mode. (PaddlePaddle#44487)

* fix the case where the outputs of the net are x, x

* add unittest for duplicate output

* fix

* fix _infer_program to use the original program, not the amp program.

* get _***program_id back and avoid duplicate caching

* fix

* Fc fp16 (PaddlePaddle#44505)

* fc support fp16

* add a ‘,’ on paddle_pass_builder.cc

* fc support fp16 on non-cuda.

* add batch stream (PaddlePaddle#44524)

* shufflechannelfix (PaddlePaddle#44516)

* fix arg_max to select first index (PaddlePaddle#44521)
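
A sketch of the tie-breaking behavior the fix above implies: with duplicated maxima, paddle.argmax should return the first matching index:

    import paddle

    x = paddle.to_tensor([1.0, 3.0, 3.0])
    print(paddle.argmax(x))  # 1: the first of the tied maxima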

* [MLU] add floor kernel and grid_sampler kernel (PaddlePaddle#44498)

* commit (PaddlePaddle#44534)

* [CustomDevice] register Copy for custom device (PaddlePaddle#44200)

* [CustomDevice] register Copy for custom device

* [CustomDevice] register Copy for custom device

* [CustomDevice] register Copy for custom device

* merge and add uts

* merge and add uts

* fix for blocking and unittests coverage

* (modified) fc support fp16 (PaddlePaddle#44540)

* Add code of occupancy computing on DCU and avoid threadID bug for DCU profiler (PaddlePaddle#44520)

* add xpu lars_momentum/pow2_decay (PaddlePaddle#44448)

*test=kunlun

* [phi] move inverse op from fluid to phi (PaddlePaddle#44471)

* move inverse from fluid to phi with unitest bug

* fix bug, add eager op yaml

* support send_partial, recv_partial and allgather_partial in ProcessGroupNCCL (PaddlePaddle#44444)

* [Sparse]add sparse unary api(expm1/deg2rad/rad2deg/relu6/leaky_relu) (PaddlePaddle#44432)

* Fc fp16 (PaddlePaddle#44558)

* (modified) fc support fp16

* __CUDA_ARCH__ version

* delete half

* delete half

* Fix bug of amp code-gen (PaddlePaddle#44570)

* fix bug of amp code_gen

* fix bug

* [JitLayer]Fix jit.save error when save params combined (PaddlePaddle#44504)

* Fix jit.save error when save params combined

* Change dict_value to list

* [Phi] Migrate squared_l2_norm_op to phi (PaddlePaddle#44492)

* add swish  using TensorRT layer (PaddlePaddle#44561)

* update

* empty commit

* update

* update

* update

* Phi gird sampler migration (PaddlePaddle#44562)

* add_ymal_utest for phi grid_sampler op

* skip dist test cases if mlu card number only one, test=develop (PaddlePaddle#44549)

* [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine (PaddlePaddle#44513)

* [dy2st]Add ProgramHelper to polish build program logic in autoparallel.Engine

* refine code

* 【Hackathon No.21】为 Paddle 新增 SoftMarginLoss (PaddlePaddle#42364)

* 2022-04-28

* 2022-04-28_V2

* 2022-04-30

* 2022-04-30_V2

* 2022-05-01

* 2022-05-02

* 2022-05-02_V2

* 2022-05-05_V1

* 2022-05-06_V1

* 2022-05-07_V1

* Update loss.py

* 2022-05-07_V2

* 2022-05-13_V1

* Update test_soft_margin_loss.py

* Update loss.py

* Update loss.py

* 2022-05-16_V1

* 2022-05-19_V1

* 2022-05-20_V1

* Update test_soft_margin_loss.py

* 2022-06-01_V1

* 2022-06-05

* 2022-06-07

* 2022-06-07

* 2022-06-08

* 2022-06-08_V2

* 2022-06-17-code_style

* Modify python

* 2022-06-20

* for

* for CI;test=document_fix

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

* [MLU]transpose convbpf output to HWCN for better performance (PaddlePaddle#44552)

* Fc fp16 (PaddlePaddle#44578)

* (modified) fc support fp16

* __CUDA_ARCH__ version

* delete half

* delete half

* add half support

* add half support

* add half support

* [Auto Parallel] Add dist op cost (PaddlePaddle#44146)

* update comp cost

* add dist default op cost

* add dist fill constant batch size like op cost

* add elewise op cost

* add fill_constant_batch_size_like op cost unittest

* add unittest and remove fill_constant_batch_size_like grad op cost

* add to cmakelist

* fix unittest bug

* Improve CI unittest parallel execution strategy (PaddlePaddle#44334)

* paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=parallel_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test=paralle_test_daily

* test pre_test_bak

* test cfs

* test_cfs,test=paralle_test_daily

* test_cfs,test=paralle_test_daily

* fix nightly test name,test=paralle_test_daily

* fix nightly test name,test=paralle_test_daily

* test ci parallel speed

* refine parallel rule,test=paralle_test_daily

* Move bmm OP from fluid to phi (PaddlePaddle#44496)

* [PHI]Move slogdeterminant op to phi (PaddlePaddle#44547)

* Move slogdeterminant op to phi

* Add yaml and unit test for slogdeterminant

* Rename pybind_boost_header.h (PaddlePaddle#44592)

* unify data type and property enum value (PaddlePaddle#44585)

* inference multi stream support handle lazy init. (PaddlePaddle#44563)

* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem

* Remove ControlDepVar in GraphToBlock (PaddlePaddle#44591)

* transfer the svd infer into phi infermeta (PaddlePaddle#44528)

* transfer the svd infer into phi infermeta

* remove the svd.h

* modify svd api

* fix svd error by insert optional

* Einsum grad complex (PaddlePaddle#44598)

* add complex for einsum grad kernel

* pass the ci

* add reverse yaml (PaddlePaddle#44518)

* add reverse yaml

* Set more attrs in ReplaceScaleLossGradOp (PaddlePaddle#44576)

* Set more attrs in ReplaceScaleLossGradOp

* Fix typos

* Fix CI errors

* Add UT

* [Phi] Migrate box coder to phi. (PaddlePaddle#44550)

* fix behavior of device_id=None in Tensor.cuda (PaddlePaddle#44515)

* fix behavior of device_id=None in Tensor.cuda

* fix CI

* fix windows cuda11.7 bug (PaddlePaddle#44601)

* add  horizontal federation learning ps feature (PaddlePaddle#44327)

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* [MLU] rollback cntoolkit vetsion to 2.8.5 (PaddlePaddle#44595)

* [CustomDevice] add blas_axpby api for gradient_accumulator (PaddlePaddle#44584)

* add sin,cos,exp primitive operators (PaddlePaddle#44345)

* Optimize sparse convolution (PaddlePaddle#43576)

* Merge kProgramDescs in GraphToProgram (PaddlePaddle#44526)

* [Eager] Add warpctc yaml (PaddlePaddle#44617)

* Add a feed op before each input parameter var. (PaddlePaddle#44499)

* Add a feed op before each input parameter var.

* Fix some issues about the unit test build_cinn_pass_test.

* fix record event for operator type in new dygraph (PaddlePaddle#44582)

* fix new dygraph record event for op

* update unit test

* fix bug of elementwise_add_grad, *test=kunlun (PaddlePaddle#44545)

* fix bug of elementwise_add_grad, *test=kunlun

* fix bug, *test=kunlun

* rm pooling_t, *test=kunlun

* fix bug of ew_add_grad when inplace, *test=kunlun

* [IPU] small bug fix (PaddlePaddle#44473)

* sync misc changes

* add authors

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* up x

* Revert "up x"

This reverts commit f3fde45.

* add guarg for ipu

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* support auto fallback to  cpu kernel for cusom device (PaddlePaddle#44639)

* fix dygraph bugs in broadcast_to api. (PaddlePaddle#44612)

* add set_dtype for inverse_op (PaddlePaddle#44618)

* refine overalls.cmake (PaddlePaddle#44623)

* [PHI]Add yaml and unittest for bmm op (PaddlePaddle#44625)

Add yaml and unittest for bmm op

* Phi average accumulates migration (PaddlePaddle#44554)

* move average_accumulates op to phi kernel

* new exe not support pg (PaddlePaddle#44628)

* [CustomDevice]fix phi kernel header (PaddlePaddle#44637)

* [CustomDevice] add process_group_xccl ut (PaddlePaddle#44632)

* [CustomDevice] add process_group_xccl ut

* update

* Fix conv api name (PaddlePaddle#44636)

* [DCU] Fix NAN problem when training BERT on DUC platform (PaddlePaddle#44643)

* [JitLayer]Remove include fluid head files in JitLayer (PaddlePaddle#44597)

* Remove include fluid head files in JitLayer

* Format code

* Remove const to fix ci error

* Fix param error

* Polish jit layer include and cp some headers to python/include

* Fix comment

* [jit]  jit.save support property serialization (PaddlePaddle#44581)

* jit.save support peropty serilization

* extract set property function

* fix property test file name

* fix typing error

* fix typing error

* fix test coverage

* Replaced add_custom_command with add_custom_target in xpu_kp_cmake (PaddlePaddle#44619)

* Replaced add_custom_command with add_custom_target in xpu_kp_cmake

* add adagrad and rmsprop yaml (PaddlePaddle#44631)

* [phi] move crop_tensor kernel from fluid to phi (PaddlePaddle#44574)

* move crop_tensor from fluid to phi

* delete fluid header files

* fix crop_tensor_op dygraph_mode bug

* modify header files, add out tensor check

* fix RemoveIntermediateOut in fuse_elewise_add_act_pass while converting graph to program (PaddlePaddle#44593)

* fix RemoveNode in fuse_elewise_add_act_pass

* fix

* change pointer to share_ptr

* fix

* fix

* fix format

* fix

* fix graph_safe_remove_nodes

* fix UTs on physical ipu (PaddlePaddle#44647)

* [IPU] add more loss ops  (PaddlePaddle#44646)

* add more loss ops

* add authors

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* add g_ipuplace_pytype (PaddlePaddle#44648)

* Strided slice fp16 (PaddlePaddle#44653)

* [MLU]fix sync_batch_norm and concat_grad op (PaddlePaddle#44586)

* retain dist op returns (PaddlePaddle#44634)

* xpu unittest grad compute supports more types, *test=kunlun (PaddlePaddle#44606)

* [Eager] Add hierarchical_sigmoid yaml (PaddlePaddle#44638)

* add matrix_nms in python/paddle/vision/ops.py (PaddlePaddle#44357)

* [auto parallel] bug fix for op has sub_block attr created with copy_from (PaddlePaddle#44664)

* Change the way to set attributes for grad op maker (PaddlePaddle#44514)

* fix typos in template for codegen of operators
* change the way to set attributes for grad op maker

* [XPU] add top_k op (PaddlePaddle#44656)

* [XPU] add top_k op. test=kunlun

* [XPU] add top_k op. test=kunlun

* use PADDLE_ENFORCE_XDNN_NOT_NULL to check pointer. test=kunlun

* Support broadcast tensor in phi system (PaddlePaddle#44590)

* [PHI] Move spectral_norm to phi (PaddlePaddle#44577)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Fix: Move out_grad to first

* Register new kernels

* Remove old kernels

* Move out_grad to last

* Fix bugs

* Transfer infermeta

* Add yaml files

* Add blank line

* Fix code style

* Optimize directory structure

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* Complete the dtypes for all_gather, add all_gather_object api (PaddlePaddle#44417)

* [Eager] refactor general_grad and fix some bugs (PaddlePaddle#44611)

* refactor general_grad and fix some bugs

* add TODO: support prune logic deeper

* support log_grad op, *test=kunlun (PaddlePaddle#44662)

* [LAUNCH] add distributed launch check tools (PaddlePaddle#44495)

* add launch test

* launch test for cpu

* bs 1

* Move api(lgamma) from legacy_api.yaml to api.yaml (PaddlePaddle#44355)

* Move api(lgamma) from legacy_api.yaml to api.yaml

* Move api(lgamma) from legacy_api.yaml to api.yaml

* Move api(lgamma) from legacy_api.yaml to api.yaml

* modify code style

* add x to X mapping

* add definition of lgamma

* delete redundant lgamma definitions

* Modify code comments

* Modify ops.py code format

* add lgamma  single test and lgamma api in fluid

* Optimized lgamma unittest

* Move frame kernel to phi (PaddlePaddle#44615)

* Move frame OP to phi、add frame OP yaml config and supplement single test

* add Header file of in_dygraph_mode

* Modify variable name and FrameGradInferMeta multiplex UnchangedInferMeta

* move seq2col to phi

* delete elementwise pow in xpu_kp_list (PaddlePaddle#44661)

* [MLU] fix log_softmax mode selection. (PaddlePaddle#44669)

* adapt for resnet (PaddlePaddle#44685)

* Fix some problem of kernel fallback in C++ API (PaddlePaddle#44681)

* support auto fallback to  cpu kernel for cusom device

* fix some problem of kernel fallback

* fix bugs of lstsq (PaddlePaddle#44689)

* migrate dirichlet kernel to phi (PaddlePaddle#44434)

* migrate dirichlet op kernel to phi

* fix dirichlet sample memory leak

* [phi]move softsign from fluid to phi (PaddlePaddle#44616)

* test_activation_op unitest error, yaml & activation.py in_dygraph_mode incomplete

* fix test_activation_op unitest error, add yaml and dygraph test

* fix code style with pre-commit

* try to fix namespace error of abs in activation_functor.h

* fix namespace error of abs

* [Paddle Inference] Support depthwise_conv2d fp16. (PaddlePaddle#44642)

* depthwise_fp16

* depthwise_fp16

* depthwise_fp16

* depthwise_fp16

* fix logging debug level (PaddlePaddle#44684)

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* Skip CUDA Graph case for standalone executor (PaddlePaddle#44693)

* [Eager] fix lerp grad kernel logic (PaddlePaddle#44705)

* clone ort_predictor reuse session (PaddlePaddle#44703)

* [XPU] add sampling_id op, add top_k op, update xdnn api. test=kunlun (PaddlePaddle#44704)

* fused_fc_elementwise_layernorm_op support fp16 (PaddlePaddle#44710)

* fused_fc_elementwise_layernorm support fp16

* fused_fc_elementwise_layernorm support double

* [Phi] Add yaml for assign_value (PaddlePaddle#44596)

* [Phi] Add yaml for assign_value

* [Phi] Fix the bug of the assign api and modify the unittest

* [Phi] Fix the bug when the tensor does not have the backend info

* [Phi] Replace the functional-style cast init by the brace-init

* [Phi] Cast the data explicitly

* [PHI] Move lu to phi  (PaddlePaddle#44605)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Register new kernels

* Remove old kernels

* Fix code style

* Fix bugs

* mutable_data->HostAlloc

* Transfer infermeta

* Add yaml and update python api

* Add PADDLE_WITH_HIP check

* Update unittests

* Fix bugs

* Fix bugs

* Optimize directory structure

* Add output checks

* lu_impl.h->lu_kernel_impl.h

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* [MLU] add pytest for mlu strided_slice kernel (PaddlePaddle#44523)

* Support backward final hook (PaddlePaddle#44686)

* update to sdk2.6.0 (PaddlePaddle#44673)

* move CUDAStream to phi (PaddlePaddle#44529)

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

* [Auto parallel] Optimization Tuning (PaddlePaddle#43782)

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

* skip cast trt convert when input dtype is bool (PaddlePaddle#44716)

* skip cast trt convert when input dtype is bool

* [LAUNCH] fix set args bug (PaddlePaddle#44717)

* Phi softplus migration (PaddlePaddle#44542)

* add yaml and utests of phi softplus

add yaml of softplus

fix softplus bug in phi

* update utests

* bug fix

* bug fix for test_layers

* layer api match

* match def and doc in ops.py

* doc polish

* fix unwanted modified of thresholded_relu

* style imporve

* 【PaddlePaddle Hackathon 3 No.15】为 Paddle 新增 count_nonzero (PaddlePaddle#44169)

* add count_nonzero api

* remove grad test

* [WIP] Matmul v1 & v2 unification -- part 1 (PaddlePaddle#44640)

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add FLAGS_enable_api_kernel_fallback (PaddlePaddle#44706)

* add FLAGS_enable_api_kernel_fallback

* deal with more cases

* add ut for coverage

* phi_multiclass_nms3 (PaddlePaddle#44613)

* add some fp16 op for kunlun resnet50 model (PaddlePaddle#44672)

* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun

* add dist op costs (PaddlePaddle#44701)

* [API/OP] Migrate Lstsq op into phi (PaddlePaddle#44318)

* migrate lstsq op

* update

* fix bugs for CIs

* update

* fix bugs

* add uts

* update

* update

* update

* fix bugs of jip

* fix bugs of hip

* update

* update according to review

* update

* update

* update

* update

* Add sparse SyncBatchNorm (PaddlePaddle#43520)

* add sparse SyncBatchNorm

* unify fluid::CUDADeviceContext and phi::GpuContext (PaddlePaddle#44723)

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* 【PaddlePaddle Hackathon 3 No.12】为 Paddle 新增 pairwise_distance (PaddlePaddle#44161)

* add paddle.nn.functional.pairwise_distance (cattidea#273)
* remove the test case for undefined behavior

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* Phi prior box (PaddlePaddle#44431)

* phi_prior_box

* add float[] support

* phi_prior_box_optest

* update

* ort backend support output mutable data (PaddlePaddle#44724)

* [PHI] Move lu_unpack to phi (PaddlePaddle#44674)

* Add kernel declarations

* Copy kernel implementation code

* Transfer implementation code

* Register new kernels

* Remove old kernels

* Fix code style

* Fix bugs

* mutable_data->HostAlloc

* Transfer infermeta

* Add yaml and update python api

* Add PADDLE_WITH_HIP check

* Update unittests

* Add kernel declarations

* Copy kernel implementation code

* Transfer kernel implementation code

* Register new kernels

* Remove old kernels

* Add lu_unpack_sig

* Fix bugs

* Fix bugs

* Fix bugs

* Optimize directory structure

* Add output checks

* Update include files

* lu_impl.h->lu_kernel_impl.h

* Transfer infermeta

* Add yaml and update python api

* Add check_eager

Co-authored-by: Bobholamovic <linmanhui@baidu.com>

* update document of quantile and nanquantile; test=document_fix (PaddlePaddle#42413)
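
For reference, a short sketch of the two documented APIs (linear interpolation between data points is the documented default; treat other options as assumptions):

```python
import paddle

x = paddle.to_tensor([0., 1., 2., 3.])
x_nan = paddle.to_tensor([0., 1., 2., 3., float('nan')])
print(paddle.quantile(x, q=0.5))         # median -> 1.5
print(paddle.nanquantile(x_nan, q=0.5))  # ignores the NaN -> 1.5
```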

* migrate reduce_amin,reduce_amax kernel to phi (PaddlePaddle#44698)

* [Paddle Inference] add varlen_token_prune plugin, pass, convert (PaddlePaddle#44733)

* add varlen_token_prune plugin, pass, convert

* support build with Ninja on Linux (PaddlePaddle#44210)

* support ninja

* fix mkldnn on windows

* fix mkldnn on windows up1

* up2

* up3

* fix gflags

* BUILD_BYPRODUCTS_OPTION -> BUILD_BYPRODUCTS_ARGS

* use CMAKE_COMMAND

* up x

* migrate overlap_add and overlap_add_grad op (PaddlePaddle#44739)

* update code format

* add yaml and test

* update for comments

* Fix to CI (PaddlePaddle#44744)

* - fix

* - another fix

* lint

* infer context fix place error. (PaddlePaddle#44726)

* infer context fix place error.

* update

* update

* [operator migration] Migrate unstack_op and nms_op (PaddlePaddle#44424)

* update unstack_op

* update unstack_op

* update unstack_op

* fix unstack test

* update unstack

* update with remote

* fix unstack_test.py

* temp_save_change_nms_op

* add nms test

* update nms fix

* update unstack_op

* temp save change

* finish fix nms_op

* pass nms test

* fix CI

* fix ops test

* save change

* fix code style

* fix code style

* fix ci and codestyle

* fix ci

Co-authored-by: ShiningZhang <zhang_liang1991@126.com>

* Update linalg.py (PaddlePaddle#44347)

* Fix test and doc (PaddlePaddle#44735)

* fix test and doc

* fix all_gather_object with various length, test=allcases (PaddlePaddle#44718)

* update manipulation.py paddle.moveaxis (PaddlePaddle#44191)
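
A minimal sketch of the documented API (signature per the public docs):

```python
import paddle

x = paddle.ones([2, 3, 4])
# move axis 0 to the last position; the remaining axes keep their order
y = paddle.moveaxis(x, source=0, destination=-1)
print(y.shape)  # [3, 4, 2]
```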

* [CI] CI for Distributed (PaddlePaddle#44085)

* generate_unify_header supports excludes (PaddlePaddle#44761)

* [JitLayer] Polish PEFunction to speed up JitLayer and fix memory leak (PaddlePaddle#44738)

* Polish PEFunction to speed up JitLayer

* Polish PEFunction code

* Fix comments

* paddle2onnx update version to 1.0.0rc2 (PaddlePaddle#44759)

* set parallel_job according to CUDA memory in Windows CI unittest (PaddlePaddle#44695)

* set parallel_job according to CUDA memory

* fix bug: add whitespace between content and [] or the condition won't work

* [Sparse] optimize sparse attention (PaddlePaddle#44743)

* GPUGraph merge to develop (PaddlePaddle#44594)

Co-authored-by: seemingwang <zsasuke@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: Thunderbrook <a754913769@163.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0693.yq01.baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: lxsbupt <luoxsbupt@163.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: qingshui <qshuihu@gmail.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* Revert for cmake static library errors on XPU KP (PaddlePaddle#44762)

* unify gpu context (PaddlePaddle#44740)

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

* API doc(en) bug fixes from the 4th-round experience evaluation (PaddlePaddle#44749)

* fix docs(en) bugs;test=document_fix

* update paddle.add docs;test=document_fix

* update paddle.where docs;test=document_fix

* for ci;test=document_fix

* Update manipulation.py

* update paddle.where;test=document_fix

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
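
Since the block above touches the `paddle.where` docs, a minimal sketch of that API for context (three-argument form as documented):

```python
import paddle

x = paddle.to_tensor([0.9, 0.2, 0.8])
y = paddle.to_tensor([1.0, 1.0, 1.0])
# take elements from x where the condition holds, otherwise from y
print(paddle.where(x > 0.5, x, y))  # -> [0.9, 1.0, 0.8]
```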

* Correct the example output annotation in the lerp function docs (PaddlePaddle#44035)
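
For reference, a minimal sketch of `paddle.lerp`, whose documented example output the commit above corrects:

```python
import paddle

x = paddle.to_tensor([1., 2., 3.])
y = paddle.to_tensor([10., 10., 10.])
# elementwise linear interpolation: x + weight * (y - x)
print(paddle.lerp(x, y, 0.5))  # -> [5.5, 6.0, 6.5]
```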

* Refactor build_op_downstream_map for standalone executor (PaddlePaddle#44729)

* Refactor build_op_downstream_map for standalone executor

* Add some comments

* update xpu.cmake to 20220731, test=kunlun (PaddlePaddle#44767)

* fix ut new_group_api (PaddlePaddle#44764)

* support beam_search operator on xpu. test=kunlun (PaddlePaddle#44720)

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* [phi] add yolov3_loss yaml and unittest (PaddlePaddle#44476)

* add yaml and unittest

* update yaml

* update backward yaml and unittest

* update yaml

* add Yolov3LossGradInferMeta

* update yolov3_loss_op.cc

* fix bug

* code format

* Update manipulation.py for rot90() (PaddlePaddle#44038)
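
A minimal sketch of `paddle.rot90`, whose docs the commit above updates (NumPy-compatible semantics assumed):

```python
import paddle

x = paddle.to_tensor([[1, 2], [3, 4]])
# one 90-degree counter-clockwise rotation in the plane of axes [0, 1]
print(paddle.rot90(x, k=1, axes=[0, 1]))  # -> [[2, 4], [1, 3]]
```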

* fix compile error;test=develop

* fix compile error;test=develop

* fix compile;test=develop

Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
Co-authored-by: taixiurong <taixiurong@126.com>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: zhangxiaoci <zhangxiaoci@baidu.com>
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: zhangkaihuo <zhangkaihuo@baidu.com>
Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
Co-authored-by: xiongkun <xiongkun03@baidu.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: ronnywang <ronny1996@163.com>
Co-authored-by: zhoutianzi666 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: QingshuChen <chenqingshu@baidu.com>
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: 王明冬 <78149749+winter-wang@users.noreply.github.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Chenxiao Niu <ncxinhanzhong@gmail.com>
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: JYChen <zoooo0820@qq.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com>
Co-authored-by: huzhiqiang <912790387@qq.com>
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Piotr Paturej <piotr.paturej@intel.com>
Co-authored-by: zhaocaibei123 <48509226+zhaocaibei123@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: zmxdream <zhangminxu01@baidu.com>
Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: lyq <30404405+affectionlu@users.noreply.github.com>
Co-authored-by: Zhong Hui <zhonghui.net@gmail.com>
Co-authored-by: fuyou765 <64373205+fuyou765@users.noreply.github.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: ccrrong <101700995+ccrrong@users.noreply.github.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: ykkk2333 <77383312+ykkk2333@users.noreply.github.com>
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>
Co-authored-by: Hui Zhang <zhtclz@foxmail.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: cifar10 <41565156+cifar10@users.noreply.github.com>
Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com>
Co-authored-by: Aganlengzi <aganlengzi@gmail.com>
Co-authored-by: yuguo <948529990@qq.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: Wang Bojun <105858416+wwbitejotunn@users.noreply.github.com>
Co-authored-by: yangguohao <70266361+yangguohao@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Lux et Veritas <1004239791@qq.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com>
Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com>
Co-authored-by: Zhen Wang <wangzhen31@baidu.com>
Co-authored-by: chenjian <chenjian26@baidu.com>
Co-authored-by: helen88 <z8hanghuan@126.com>
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: qipengh <huangqipeng@cambricon.com>
Co-authored-by: shangliang Xu <ghostxsl@users.noreply.github.com>
Co-authored-by: Jiabin Yang <360788950@qq.com>
Co-authored-by: Lin Manhui <mhlin425@whu.edu.cn>
Co-authored-by: Bobholamovic <linmanhui@baidu.com>
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
Co-authored-by: kuizhiqing <kuizhiqing@baidu.com>
Co-authored-by: Charles-hit <56987902+Charles-hit@users.noreply.github.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: heliqi <1101791222@qq.com>
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: thunder95 <290844930@qq.com>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com>
Co-authored-by: Ainavo <57820731+Ainavo@users.noreply.github.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: Asthestarsfalll <72954905+Asthestarsfalll@users.noreply.github.com>
Co-authored-by: Wangzheee <634486483@qq.com>
Co-authored-by: Thomas Young <35565423+HexToString@users.noreply.github.com>
Co-authored-by: ShiningZhang <zhang_liang1991@126.com>
Co-authored-by: OccupyMars2025 <31559413+OccupyMars2025@users.noreply.github.com>
Co-authored-by: mrcangye <mrcangye@email.cn>
Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
Co-authored-by: seemingwang <zsasuke@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: Thunderbrook <a754913769@163.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0693.yq01.baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: lxsbupt <luoxsbupt@163.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: qingshui <qshuihu@gmail.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: yang131313 <lisy928472889@163.com>
Co-authored-by: mengqingchun02 <103740521+mengqingchun02@users.noreply.github.com>
Co-authored-by: 熊峻峰 <xiongjunfeng@sina.com>