merge paddle develop #2

b3602sss · 2021-04-12T09:07:04Z

PR types

PR changes

Describe

Fix Read-Only Attribute as while_loop Output:

Usually, the generated convert_while_loop call looks like:

```
[a, b, c] = paddle.jit.dy2static.convert_while_loop(
    condition_name, body_name, [a, b, c])
```

where a, b, c are in loop_var_names. However, if loop_var_names contains a property such as foo.x, we cannot assign the attribute as an output of convert_while_loop, because a Python property can be a read-only attribute. To handle this case, we replace the attributes that are outputs of convert_while_loop with generated variables, and assign the attribute only if we find at runtime that it is not a property. The created statements look like:

```
[a, b, __attribute_variable_1] = paddle.jit.dy2static.convert_while_loop(
    condition_name, body_name, [a, b, foo.x])
if not isinstance(getattr(type(foo), "x", None), property):
    foo.x = __attribute_variable_1
```
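The runtime check described in this commit message can be tried outside Paddle. The following is a minimal sketch, not the dy2static implementation: the class `Foo` and the helpers `is_property` and `assign_if_writable` are hypothetical names introduced only for illustration. It shows that looking the name up on the type (rather than the instance) tells a property apart from a plain attribute, so the assignment can be skipped safely.

```python
class Foo:
    """Hypothetical class with one plain attribute and one read-only property."""

    def __init__(self):
        self.x = 0   # plain instance attribute, writable
        self._y = 0

    @property
    def y(self):     # property without a setter, read-only
        return self._y


def is_property(obj, name):
    """Return True if `name` is defined as a property on type(obj)."""
    return isinstance(getattr(type(obj), name, None), property)


def assign_if_writable(obj, name, value):
    """Assign obj.<name> = value unless <name> is a property.

    Returns True when the assignment happened, False when it was skipped.
    """
    if is_property(obj, name):
        return False
    setattr(obj, name, value)
    return True


foo = Foo()
assigned_x = assign_if_writable(foo, "x", 5)  # plain attribute: assigned
assigned_y = assign_if_writable(foo, "y", 5)  # property: skipped
print(assigned_x, foo.x)
print(assigned_y, foo.y)
```

Note that the attribute name is passed as a string; `getattr(type(foo), "x", None)` returns `None` for a plain instance attribute because it is not defined on the class.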

…pports from ascendrc (#32144) * [feature] support npu allocator (#30840) [feature] support npu allocator * [feature] support npu operator (#30951) [feature] support npu operator * [feature] support npu allocator, part 2 (#30972) * support npu allocator * add npu device context * fix some compile problem * fix some compile problem * add npu info * compile ok * fix include dir * support naive_best_fit_allocator * run ut ok, but failed to exit * call aclrtResetDevice before exit * fix aclFinalize * add system allocator test * add selected_gpus in gtest * add tensor_test for npu * support npu op, initial commit * add npu stream * add elementwise_add_op * compile ok * fix typo * fix elementwise_add_op_npu_test * support op run * test can run but failed * change aclopExecuteV2 to aclopCompileAndExecute * support parsing ascend rank table file (#31000) support parsing ascend rank table file * Fix reshape on GE graph. (#31084) Fix reshape on GE graph * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973) * add npu sub op * fix typo * rename test * fix bug * fix bug * add fp16 kernel * fix typo * support sub grad op * support elementwise_sub_grad op Co-authored-by: frankwhzhang <frankwhzhang@126.com> * Fix compilation problem (#31100) Fix compilation problem (#31100) * fix compile * fix code style * remove const_cast * support adding correct npu op in pybind.h (#31143) * support adding correct npu op in pybind.h * refine code * [NPU] Support executor with NPU (#31057) * [NPU] Support executor with NPU * Fix code according to reviews * Fix code * Add unittest for sub op npu * refactor npu device manager (#31154) refactor npu device manager (#31154) * fix selected npus * fix compile * fix reading flags from env * format Co-authored-by: xiayanming <41795079@qq.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: frankwhzhang <frankwhzhang@126.com> Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

* [ROCM] fix test_gru_rnn_op * [ROCM] fix test_expand_op * [ROCM] fix test_cross_entropy_loss * [ROCM] fix test_conv_nn_grad * [ROCM] fix test_bilinear_tensor_product_op * [ROCM] fix elementwise_op_function * [ROCM] fix test_lstm_cudnn_op * [ROCM] fix test_gpu_package_without_gpu_device * [ROCM] fix test_gru_unit_op * [ROCM] fix test_imperative_optimizer * [ROCM] fix rnn * [ROCM] fix group_norm_op * [ROCM] fix test_pool3d_api * [ROCM] fix test_pool3d_op

* test,test,notest,test=windows_ci * test,notest,test=windows_ci * test,notest,test=windows_ci * test,notest,test=windows_ci * remove test code * delete some unnecessary logs * fix format error * turn on added ut check on windows

* initial commit: simple demo * polish copyright format * add grap op simple demo * adapt uncertain number of argument * change trait macro name * add place & dtype support for add kernel * add dispatch and infershape func * polish code & add notes * add dynamic_loader dep for paddle_framework * add new custom op test dir * polish impl details * add unittest for new custom op * fix failed unittest
* Custom op (#1) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op
* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * refactor register design & add test * change op_funtion to op_meta_info * split op meta info into .h and .cc * move get methods into friend class * move OpMetaInfoHelper into framework space * move CustomTensorUtils into framework space * change pybind api name * move PD C API into op meta info * add register custom op api * remove inference cmake change
* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * polish detail & error message * polish test details
* Add cast api && Change copy related api to copy_to && add more test (#4) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * add type cast * add cast and make copy to api * merge cwh code * add more error log * polish code * used for test * remove test comment * fix uint8 type error * fix lost uint8 type error * add test for coverage * polish details by reviewer comments * add prefix for DISABLE_COPY_AND_ASSIGN Co-authored-by: Jiabin Yang <360788950@qq.com>

* update fft api path (PaddlePaddle#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> * fix fft axis (PaddlePaddle#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (PaddlePaddle#36114) * fft: modify sample code result (PaddlePaddle#36325) * dynamic load mkl as a fft backend when it is available and requested (PaddlePaddle#36414) * add rocm support for fft api (PaddlePaddle#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cufft and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (PaddlePaddle#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
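The commit message above mentions adding an LRUCache for fft plans. As a rough illustration of that general idea only, here is a minimal Python sketch built on `collections.OrderedDict`. It is not Paddle's C++ code: the class name `PlanLRUCache`, the `make_plan` callback, and the `(size, kind)` key are hypothetical, and a real plan would be a cuFFT or hipFFT handle rather than a tuple.

```python
from collections import OrderedDict


class PlanLRUCache:
    """Keeps at most `capacity` plans, evicting the least recently used one."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._plans = OrderedDict()  # oldest entry first, newest last

    def get_or_create(self, key, make_plan):
        """Return the cached plan for `key`, building it with make_plan(key) on a miss."""
        if key in self._plans:
            self._plans.move_to_end(key)  # mark as most recently used
            return self._plans[key]
        plan = make_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # drop the least recently used plan
        return plan

    def keys(self):
        """Cached keys, ordered from least to most recently used."""
        return list(self._plans)


created = []


def make_plan(key):
    # Stand-in for an expensive plan construction (hypothetical).
    created.append(key)
    return ("plan", key)


cache = PlanLRUCache(capacity=2)
cache.get_or_create((8, "c2c"), make_plan)
cache.get_or_create((16, "c2c"), make_plan)
cache.get_or_create((8, "c2c"), make_plan)   # hit: no new plan is built
cache.get_or_create((32, "c2c"), make_plan)  # miss: evicts the (16, "c2c") plan
print(cache.keys())
```

Keying the cache on everything that determines a plan (here just size and transform kind) is what makes reuse safe; the commit message does not say which fields Paddle uses for its key.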

…addlePaddle#38275) * Replaced pten::LoD with paddle::framework::LoD * Overrided CPUVector with CUDAVector * Refactored paddle::framework::Vector

…addlePaddle#39087) * Renamed selected_rows.* -> selected_rows_utils.* * Added selected_rows and rw_lock to pten * Removed useless header * Renamed the unit test target to fix CI * Use pten::framework::DDim * Set selceted_rows_test properties timeout * Polish code to pten style Co-authored-by: Chen Weihang <chenweihang@baidu.com>

…rdFunctions and GradNodes (PaddlePaddle#40937) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue

…enerateForwardDefinition (PaddlePaddle#41016) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Fixed minor issue

…Paddle#41051) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * Fixed yaml typo

…e#41121) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * Fixed minor issue

…sed to paddle.grad() (PaddlePaddle#41198) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues

…efore backward run (PaddlePaddle#41306) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR PaddlePaddle#7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues

…atmul (PaddlePaddle#41387) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR PaddlePaddle#7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues * [DoubleGrad PR PaddlePaddle#8] Enabled triple grads for sigmoid and matmul * Fixed issues with phi kernel * Added triple grad test case * Fixed minor issue

wangna11BD and others added 30 commits March 3, 2021 07:50

Add attrs deformable_groups for deformable_conv API (#31335)

1cbccfa

* add attrs deformable_groups

[ROCM] update fluid operators for rocm (part6), test=develop (#31301)

946dbda

[Custom OP]polish doc of custom OP (#31369)

13e4280

compile with VS2017, test=develop (#31388)

c1bc223

fix bert cu file compiler error, test=develop (#31389)

6626c6a

[ROCM] update fluid operators for rocm (part9), test=develop (#31338)

e312a1f

TRT conv2d converter support SAME padding (#31379)

32211fe

[ROCM] fix softmax with loss and update python scripts, test=develop (#…

db50fb6

…31373)

[ROCM] update fluid operators for rocm (part7), test=develop (#31307)

3b9db17

[ROCM] update fluid operators for rocm (part3), test=develop (#31213)

84639b6

* [ROCM] update fluid operators for rocm (part3), test=develop * fix clang format error, test=develop

[ROCM] update fluid elementwise op for rocm (part10), test=develop (#…

7cdf6ea

…31361) * [ROCM] update fluid elementwise op for rocm (part10), test=develop * update, test=develop * address review comments, test=develop

Added LSTM BF16 and fixed GRU BF16 (#31234)

5b4f8aa

Fix comment (#31424)

c40b98e

Fix wrong code comment

Fix bug for set_value op when input dtype is not float32 (#31411)

0fff930

Windows system supports Ninja compilation (#31161)

4d6d2db

improve performance of depthwise_conv2d (#31099)

dcce54e

* improve performance of depthwise_conv2d * add unittest

fix modified_retry_method_only_win (#31404)

3a8ef10

* fix modified_retry_method_only_win * fix bug * fix retry bug on windows

support float16 for temporal_shift op (#31432)

7d95e59

prepare remove grad script and update PADDLE_CI_INFERENCE pipeline (#…

c9a7bfe

…31149) prepare remove grad op and kernel script. update Paddle_CI_Inference pipeline.

fix python full coverage decrease issue (#31429)

62289fc

* fix python full coverage decrease issue * fix

[Dy2Stat] Remove gast.Index for compatibility of gast 0.4.0 (#31358)

522c91e

[ROCM] update fluid platform for rocm (part5), test=develop (#31315)

4d647ec

[Kunlun]Multi xpu dygraph performance optimization , add distributed.…

9ebf05b

…spawn support for multi xpu and some bug-fixes (#31130)

add more info in trt engine serialization (#31434)

1321c47

fix trt serialization on windows (#31438)

30717a6

Creating a CUDA function to find the minimum value in warp or block (#…

8491ae9

…31191)

upgrade inference tensor apis, test=develop (#31402)

bc7632b

Fix cmake of cryptopp to avoid downloading every time (#31447)

ffdd5b7

[CustomOp] Automatically specify PADDLE_WITH_MKLDNN & Remove Interpre…

fadabbe

…ter argument (#31391) * auto specify PADDLE_WITH_MKLDNN and remove Interpreter * remove print * fix check abi * fix windows * fix compile flags

Xreki and others added 14 commits April 9, 2021 13:52

Avoid CPU -> CPU memory copy when start, end, step is already on CPU. (

95122eb

#29088)

[Dy2Stat] Fix undefined var used in For (#32153)

4636d13

* fix undefined var in For * fix code style

fix unittest timeout (#32161)

a73cb67

make high precision for avg_pool and adaptive_avg_pool when data_type…

ec2ffb6

… is float16 (#31887) * make high precision for avg_pool

Ci py3 gcc5.4 (#32045)

afa3720

Optimize the performance of the forward of log_softmax when axis is -…

f8bab5b

…1 and dim <= 1024 (#31630)

fix concat_grad on kunlun (#32151)

a2387ef

* fix concat_grad on kunlun * fix concat_grad on kunlun

remove PYTHON_ABI, test=document_fix (#32190)

80698ca

follow comments to refine PR 32144 (#32174)

af374ae

Optimization of bilinear backward OP CUDA kernel. (#30950)

d8afe40

[CustomOp]Fix description of supporting MacOS (#32192)

bb3b790

b3602sss merged commit deb9ae0 into b3602sss:develop Apr 12, 2021

b3602sss added a commit that referenced this pull request Apr 14, 2021

Merge pull request #2 from PaddlePaddle/develop

879a359

merge paddle develop

b3602sss pushed a commit that referenced this pull request Nov 30, 2021

Added fluid dependencies to Eager Dygraph #2 (PaddlePaddle#37556)

471fa1e

b3602sss pushed a commit that referenced this pull request Nov 30, 2021

Added Eager Dygraph AutoCodeGen dependencies #2 (PaddlePaddle#37575)

e7bda1d

b3602sss pushed a commit that referenced this pull request Mar 30, 2022

[Refactor] refactored eager_gen.py PR #2 (PaddlePaddle#40907)

f027b2a
