merge develop #3

Merged
merged 136 commits into b3602sss:develop on Apr 25, 2021

Conversation

b3602sss
Owner

PR types

PR changes

Describe

Ray2020BD and others added 30 commits April 12, 2021 17:12
* [Rocm] fix python test of multinomial

* [Rocm] fix python test of multinomial

* [Rocm] fix python test of multinomial

* [Rocm] fix python test of multinomial
* skip paddle.Tensor.<lambda>

* some files may not exist, e.g. version.py, which is generated by setup.py

* debug mode

* add unittests for sampcd_processor.py

* add test cases for sampcd_processor

* add test cases for sampcd_processor

* add testcases

* add test cases

* add testcases

* add testcases

* refactor, add testcases

* add import

* all files map to the pool; don't split them manually

* __all__ += another list

* add testcases

* add testcases

* there is nothing to handle here

* this line should not be removed

wadefelix@882e7f7#diff-cb0679475bf60202fd803ae05b9146989437c3f787d1502616be6c71c69d0fb1

* print -> logger

* regulate the logging information

* regulate the logging information

* logger to file

* logger

* threads or subprocesses number config

* follow the good code style

don't touch wlist.json

* run test_sampcd_processor.py, it's a unittest for sampcd_processor.py

* update unittest for sampcd_processor.py

test=document_fix
* add layer.to api

* add layer.to api

* add layer.to api

* add the doc for Layer.to

* add input type checking

* modify assert and import bug

* format code style

* format code style

* make place support str type

* add SetGradVarBase method to set the gradient after conversion

* modify argument place to device

* modify argument place to device

* modify doc of Layer.to API

* add xpuplace to device argument
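
The commits above add a Layer.to API whose device argument accepts string places (including xpu). A minimal usage sketch, assuming the string forms "cpu" / "gpu:0" / "xpu:0" are accepted as described:

```python
import paddle

# Minimal sketch of Layer.to with the "device" argument named in the commits
# above; the accepted device strings are an assumption based on those messages.
linear = paddle.nn.Linear(2, 2)
linear.to(device="cpu")        # move parameters and buffers to the CPU
# linear.to(device="gpu:0")    # on a GPU build
# linear.to(device="xpu:0")    # on an XPU build
```
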
* fix error for long args

* remove unnecessary code
* extend multiclass_nms unittest timeout threshold

* adjust timeout to 200s

* temporarily disable multiclass_nms trt op teller
* add common dtypes as paddle's dtypes

* import paddle.fluid.core_avx.VarDesc.VarType as paddle.dtype
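
These commits expose the common dtypes directly under the paddle namespace. A minimal sketch, assuming aliases such as paddle.float32 and paddle.int64 can be passed wherever a dtype is expected:

```python
import paddle

# Minimal sketch using the dtype aliases exposed under the paddle namespace.
x = paddle.ones([2, 3], dtype=paddle.float32)
y = x.astype(paddle.int64)       # cast using a paddle dtype handle
print(x.dtype, y.dtype)
```
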
* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm
* Delete grpc.cmake/distributed/distributed_ops

* reset operators/CMakeLists.txt

* rm test_transpiler_ops.py

* del test_transpiler_ops.py
* [ROCM] fix some typo in cmake, test=develop

* [ROCM] fix rccl in paddle build script, test=develop
* add register backward hook method

* add leaf grad accumulated test
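
The exact name of the backward-hook method added here is not shown in this log; as an illustration of the gradient-hook pattern it targets, the sketch below uses the documented Tensor.register_hook on a leaf tensor:

```python
import paddle

# Illustrative only: the public Tensor.register_hook API is used to show the
# gradient-hook pattern; the method added by these commits may differ in name
# and signature.
x = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
x.register_hook(lambda grad: grad * 2)   # double the incoming gradient
y = (x * x).sum()
y.backward()
print(x.grad)                            # [4., 8., 12.]
```
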
* add check for runtime dynamic shape

* add unittest

* add lower bound case

* adjust timeout of new ut to 120s
* Initial draft for SGD BF16 kernel.

* Unit tests for SGD with BF16 data type.

* Add VLOG message to SGD BF16 op CPU kernel.

* Enhance error messages and error types.

* Refactor SGD op kernels to leverage some common code.

* Make it easier to add new kernel invoke code.

* Fix SGD op kernel for sparse grad.

* Unify quotes style.

* Fix error for ROCM compilation.

* Use specialized PADDLE_ENFORCE_xx functions.
* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)

Co-authored-by: root <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFromVector, TensorToVector of bool type (#31518)

* support TensorFromVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Support npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshape2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* delete ifdef

Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assign cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly

Co-authored-by: oyjxer <1728722986@qq.com>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unittest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)

Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all

Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation

Co-authored-by: oyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac69.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2

Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] add NPU add topk  (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix bug where the test did not run the NPU TopKD kernel

* NPUPlace(4) to NPUPlace(0)

* update comment

Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code

Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Reventon_L <luyuxiang1994@qq.com>
Co-authored-by: root <xiayanming@baidu.com>
Co-authored-by: oyjxer <1728722986@qq.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: OleNet <olenet@126.com>
Co-authored-by: Meiyim <chen_xuyi@outlook.com>
Co-authored-by: oyxuan-11 <963650125@qq.com>
Co-authored-by: pangyoki <pangyoki@126.com>
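
The merge above adds a large batch of NPU kernels (elementwise ops, relu, reshape2, matmul, and more). Illustrative only, assuming an NPU build of Paddle and an "npu:0" device string: placing the computation on an NPU device is what selects these kernels.

```python
import paddle

# Illustrative sketch; requires a PaddlePaddle build with NPU (Ascend) support.
paddle.set_device("npu:0")                 # assumed device string
x = paddle.ones([2, 3])
y = paddle.full([2, 3], 2.0)
print(paddle.multiply(x, y))               # elementwise_mul NPU kernel
print(paddle.nn.functional.relu(x - y))    # relu NPU kernel
```
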
fix test sync_with_cpp (#32212)
* custom python backward

* polish up the code

* polish up the code

* polish up the code.

* Fix code format and comments.

* Delete redundant files.

* add unittest.

* edit unittest.

* edit unittest.

* Remove redundant header files.

* Improve coverage and remove redundant code.

* support saving for backward.

* polish code according to comments.

* Add support type for PyLayer.

* Modify the DOC.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish code and make the code robust.

* Modify the code format.
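
The commits above add custom Python backward with support for saving tensors for the backward pass. A hedged sketch using paddle.autograd.PyLayer, with tanh as an arbitrary illustrative example:

```python
import paddle
from paddle.autograd import PyLayer

# Hedged sketch of a custom Python backward; tanh is only an example.
class CusTanh(PyLayer):
    @staticmethod
    def forward(ctx, x):
        y = paddle.tanh(x)
        ctx.save_for_backward(y)           # saving for backward
        return y

    @staticmethod
    def backward(ctx, dy):
        y, = ctx.saved_tensor()
        return dy * (1 - paddle.square(y))

x = paddle.rand([2, 3])
x.stop_gradient = False
out = CusTanh.apply(x)
out.sum().backward()
print(x.grad.shape)
```
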
* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error
denglin-github and others added 23 commits April 25, 2021 10:10
* Add dlnne engine runtime

* Fix log

* Remove <const_cast> and remove unrelated modifications to dlnne, +clang-format

* Fix CMakeList format error

* Add copyright message

* Fix dlnne CMakeList.txt

* Add some paddlepaddle_pass to support more networks

* Fix some format bugs
* use ZerosLike instead of NPUMemsetAsync

* fix compile
…32428)

* let paddle.utils.install_check support a CPU-only package on a machine with GPU devices

* use use_cuda in dygraph checking

* add unittest for install_check
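
The public entry point backed by install_check is paddle.utils.run_check(); a one-line sanity check that, per the commits above, now also handles a CPU-only package on a machine with GPU devices:

```python
import paddle

# Quick sanity check of the installed Paddle package.
paddle.utils.run_check()
```
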
* fix tc trt shape

* fix fc dynamic shape

* add fc shape assert

* update
…2470)

* fix bug: when x.dim < y.dim, the result of compare_op is the inverse of the expected result

* add CUDA support for the compare broadcast fix
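
To illustrate the broadcast case this fix targets (x has fewer dimensions than y, and the result must still mean x < y elementwise), a small sketch with illustrative values:

```python
import paddle

# x is broadcast against y; the result keeps the x < y orientation.
x = paddle.to_tensor([1.0, 5.0])                 # shape [2]
y = paddle.to_tensor([[2.0, 2.0], [6.0, 6.0]])   # shape [2, 2]
print(x < y)                                     # elementwise compare after broadcast
```
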
* add Hub module for easy use of pre-trained models.
*   support list, load, help functions.
*   support loading models from github, gitee, and local paths

Co-authored-by: LielinJiang <jianglielin@baidu.com>
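
A minimal sketch of the Hub workflow described above (list, help, load from github/gitee/local); the repo string below is a hypothetical example, not taken from this PR:

```python
import paddle

# Hypothetical repo:branch string, used only to illustrate the API shape.
repo = "PaddlePaddle/PaddleClas:develop"
models = paddle.hub.list(repo, source="github")            # enumerate entry points
print(paddle.hub.help(repo, model=models[0], source="github"))
model = paddle.hub.load(repo, model=models[0], source="github")
```
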
* support save/load binary format tensor

* Fix error when create cudaplace

* Fix error when create cudaplace

* Fix error when create cudaplace

* get device context from pool.

* move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.

* support complex object

* improve coverage.

* improve coverage

* improve coverage.

* fix a bug.

* polish API

* save/load program

* paddle.save/load: layer

* deal with conflict

* if PY2, block test_paddle_save_load.TestSaveLoadLayer

* polish code.

* polish code

* edit unittest

* The condition for an object to be identified as a state_dict becomes stricter

* use 'core._cuda_synchronize'
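
A hedged sketch of the paddle.save / paddle.load usage these commits extend (binary tensors and layer state); the file names are illustrative:

```python
import paddle

# Save and reload a single tensor in binary format.
t = paddle.rand([3, 4])
paddle.save(t, "tensor.pdtensor")
restored = paddle.load("tensor.pdtensor")

# Save and reload a layer's state_dict.
layer = paddle.nn.Linear(4, 2)
paddle.save(layer.state_dict(), "linear.pdparams")
layer.set_state_dict(paddle.load("linear.pdparams"))
```
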
* add trt runtime version check

* use a different wrapper, and change to a major-version check
@b3602sss b3602sss merged commit 99f4d09 into b3602sss:develop Apr 25, 2021
b3602sss pushed a commit that referenced this pull request Dec 8, 2021
* update fft api path (PaddlePaddle#36219)

* update fft api path
* add sample code for ihfft2

Co-authored-by: chenfeiyu <chenfeiyu@baidu.com>

* fix fft axis (PaddlePaddle#36321)

fix: `-1` is used when fft's axis is `0`

* use unified external error message for cufft api (PaddlePaddle#36114)

* fft: modify sample code result (PaddlePaddle#36325)

* dynamically load MKL as an FFT backend when it is available and requested (PaddlePaddle#36414)

* add rocm support for fft api (PaddlePaddle#36415)

* move signal apis

* move fft and signal API path (#2)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos in signal.py (#3)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* disable Cache when CUFFT_VERSION >= 10200 (#4)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* Add LRUCache for fft plans

* add LRUCache for cuff and hipfft (#5)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* WIP: add cache

* delete move constructor and operator= for CuFFTHandle and FFTConfig

* remove log from CuFFTHandle and FFTConfig

* add lrucache for fft rocm backend

* disable LRUCache when CUFFT_VERSION >= 10200

* disable copy and move for hipFFTHandle; format code

Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>

* remove debug message of cufftHandler

* roll_op: support Tensor as input for shifts (PaddlePaddle#36727)

* fix fftshift/ifftshift on static mode

* update roll_op version

* add more test cases for fftshift/ifftshift

Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com>
Co-authored-by: chenfeiyu <chenfeiyu@baidu.com>
Co-authored-by: LJQ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
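
A hedged sketch of the paddle.fft APIs touched above (ihfft2, fftshift/ifftshift) plus paddle.roll with a Tensor "shifts", per the roll_op commit; shapes and values are illustrative:

```python
import paddle

x = paddle.rand([4, 4])
spec = paddle.fft.ihfft2(x)              # inverse 2-D Hermitian FFT
centered = paddle.fft.fftshift(spec)     # move zero-frequency term to the center
restored = paddle.fft.ifftshift(centered)

shifts = paddle.to_tensor([1])           # Tensor shifts, per the roll_op commit
rolled = paddle.roll(x, shifts=shifts, axis=0)
```
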
b3602sss pushed a commit that referenced this pull request Jan 5, 2022
…ten::DenseTensor, test=allcases (PaddlePaddle#38473)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes
b3602sss pushed a commit that referenced this pull request Jan 12, 2022
…t=allcases (PaddlePaddle#38632)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes

* Rearranged function calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues
b3602sss pushed a commit that referenced this pull request Jan 12, 2022
…od_tensor,test=allcases (PaddlePaddle#38811)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes

* Rearranged function calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

* Removed interfaces & members from lod_tensor,test=allcases
b3602sss pushed a commit that referenced this pull request Jan 27, 2022
PaddlePaddle#39128)

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again
b3602sss pushed a commit that referenced this pull request Apr 8, 2022
…Paddle#41051)

* [Refactor] refactored eager_gen.py PR #2

* [DoubleGrad PR #1] Decoupled code generation logic for Dygraph ForwardFunctions and GradNodes

* Fixed minor issue

* Adjusted logic of GenerateNodeCreationCodes and GenerateForwardDefinition

* Fixed issues

* Supported higher-order grad node generation

* [DoubleGrad PR #4] Supported higher-order GradNode generation

* Fixed yaml typo