update Paddle to newest version #1

thisjiang · 2020-12-18T03:39:19Z

PR types

PR changes

Describe

…CI (#29499) * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop * fix if error with CI_SKIP_TEST, test=develop * fix add properties to test error on Linux/MAC, test=develop * fix set test properties of test_code_generator error, test=develop * remove test codes and advance judgment of file modification on Linux, test=develop * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix * Add branch judgement on Linux, test=develop

* Fix a bug when running on an operating system without "bash" * add execution condition * for ci-coverage * get cpu information to check the precision problem * Update compilation environment for musl version * update dependencies * remove test code check cpu info remove test code review * update alpine and third_party dependencies * add newline for ci Code format

Fix 3 Windows unit tests: * test_fuse_all_reduce_pass: Paddle cannot run multi-GPU on Windows, so set the single-visible-GPU flag * test_feed_data_check_shape_type: Paddle cannot run multi-GPU on Windows, so set the single-visible-GPU flag * test_tsm: Windows GPU memory is not large enough, so decrease the batch size and data size

* fix expand && concat/transpose to new api * update xpu_header * update activation op on kunlun * add nearest_interp on kunlun * update error message

#29617) Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix random unittest failures. The unittest log showed a CUDA allocation error at this file, which may be due to insufficient GPU memory. We fixed similar failures in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
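
The retry idea described above can be sketched in a few lines. This is a generic Python illustration, not the `PADDLE_RETRY_CUDA_SUCCESS` macro itself: `retry_call`, its parameters, and the use of `MemoryError` as a stand-in for a CUDA allocation error are all assumptions made for the example.

```python
import time


def retry_call(fn, max_retries=3, delay_s=0.0, retry_on=(MemoryError,)):
    """Call fn(); on a transient failure, retry up to max_retries more times.

    Hypothetical helper: MemoryError stands in for a transient
    allocation failure such as the CUDA error mentioned in the commit.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # Out of retries: surface the original error.
            attempt += 1
            time.sleep(delay_s)  # Give the resource a chance to free up.
```

An enforce-style check fails on the first error, while a retry-style check only fails once the retry budget is exhausted, which is what makes it more tolerant of short-lived resource pressure on shared CI machines.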

* add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number

* initial commit: simple demo * polish copyright format * add grap op simple demo * adapt uncertain number of argument * change trait macro name * add place & dtype support for add kernel * add dispatch and infershape func * polish code & add notes * add dynamic_loader dep for paddle_framework * add new custom op test dir * polish impl details * add unittest for new custom op * fix failed unittest
* Custom op (#1) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op
* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * refactor register design & add test * change op_funtion to op_meta_info * split op meta info into .h and .cc * move get methods into friend class * move OpMetaInfoHelper into framework space * move CustomTensorUtils into framework space * change pybind api name * move PD C API into op meta info * add register custom op api * remove inference cmake change
* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * polish detail & error message * polish test details
* Add cast api && Change copy related api to copy_to && add more test (PaddlePaddle#4) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * hide share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * add type cast * add cast and make copy to api * merge cwh code * add more error log * polish code * used for test * remove test comment * fix uint8 type error * fix lost uint8 type error * add test for coverage * polish details by reviewer comments * add prefix for DISABLE_COPY_AND_ASSIGN Co-authored-by: Jiabin Yang <360788950@qq.com>

…y::Allocation> for Storage (PaddlePaddle#38301) * Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage

* #1 migrate dist-related type()-> dtype() * move datatype function from pten -> fluid/framework * change type() in imperative into convert(dtype()) * modify xx_tensor->type into xx_tensor->dtype * change the set_type interface and the caller * modify xx_tensor.type into xx_tensor.dtype * fix mutable_data(place, dtype()) * change caller of mutable_data in pten and distributed * change the caller of mutable_data in fluid/framework * change the caller of mutable_data in imperative directory * mutable_data: inference * update the call of mutable_data * transfer MakePenScalarArray MakePtenScalar ResetHolderWithType * pass the compile. the next step is remove VarType in Pten * fix all and remove VarType from pten. success in linux. Next task is other platform * fix conflict with develop * fix compiled error * Fix reset conversion * fix conflict * fix compiled problem * fix typo * Fix << in tensor_utils.cc * fix type->dtype * fix unittest * fix tensor init constructor * fix DataTypeSize for BFloat16 * fix code style * fix npu compiled error * fix npu * compile npu successfully * fix conflict * fix conflict Co-authored-by: xiongkun <xiongkun03@baidu.com>

Aurelius84 and others added 30 commits December 11, 2020 15:38

Polish hash function of executor cache key (#29556)

2a42250

* Add more value to calculate hash key * fix size_t * polish code

fix train eval set error in static mode (#29540)

c1a26e2

add cast cuda kernel (#29352)

30d9589

improve dropout (#29465)

6702040

* improve drop out * add VectorizedRandomGeneratorWithGenerator * fix bug * modify according to comments

remove duplicated macro (#29563)

1e72e03

[Sharding] add hybrid-dp feature (#29518)

d33d468

* Sharding add hybrid-dp feature * update sharding in distributed_strategy * update sharding unitest * revise code format for sharding

update for xpu ci. (#29568)

740c0d5

[oneDNN] Making ThreadID info in caching key optional (#29272)

f6cca62

[Dy2Stat] 1. Fix bug of for-range stmts. 2. Support that step value i…

0cad115

…s negative in for-range stmts (#29519) 1. Fix error in _build_cond_stmt of for-range stmts. 2. Support that step value is negative in for-range stmts 3. Fix code because of the diff between Py2 and Py3
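
The condition a converted for-range loop has to test depends on the sign of the step, which is presumably why negative steps needed separate handling. A minimal sketch of that idea in plain Python follows; `range_cond` and `for_range_as_while` are hypothetical names used for illustration, not the actual `_build_cond_stmt` implementation.

```python
def range_cond(i, stop, step):
    """Loop-continue condition matching Python's range(start, stop, step)."""
    if step == 0:
        raise ValueError("step must not be zero")
    # A positive step counts up towards stop; a negative step counts down.
    return i < stop if step > 0 else i > stop


def for_range_as_while(start, stop, step):
    """Emulate `for i in range(start, stop, step)` with a while loop."""
    visited = []
    i = start
    while range_cond(i, stop, step):
        visited.append(i)
        i += step
    return visited
```

With a fixed `i < stop` condition, a loop such as `range(5, 0, -2)` would never execute its body; choosing the comparison from the sign of the step keeps the while-loop form equivalent to the original for-range statement.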

support roi_align & affine_channel for kunlun (#29561)

79a41a9

* support roi_align & affine_channel for kunlun * minor

Fix compile problem when cuda_arch < 6000 (#29576)

c016383

* fix compile problem when cuda_arch < 6000 * refine code * refine code

add service (#29560)

0034273

* add service, remove ut on mac * fix heter_profiler & add heter stop method * fix code style

fix unittest unstable issue on ci machine (#29588)

d72604c

* fix unittest unstable issue on ci machine

gen nccl id use socket (#29431)

467c716

add some feature for paddle.flops (#29572)

ee1a7d0

add float16 into adaptive_avg_pool2d check list. (#29547)

2cb6f94

update, test=develop (#29559)

ff6a145

Added verbose oneDNN lib version (#29378)

62d4483

elementwise_add_grad Op optimization (#29575)

ac4bae8

disable test_parallel_executor_profiler in cuda 10.1 (#29581)

81acc32

* disable test_parallel_executor_profiler in cuda 10.1 * update set_tests_properties

Add clip double grad (#29590)

8d549fc

Simplify the prompt of const_cast check. (#29548)

a908208

Only check additions; do not check deletions.

Fix bug of matmul_v2 for broadcast case (#29599)

1efef8b

* fix bug of matmul_v2 for broadcast

Fix the download bug in the case of multiple machines (#29551)

fb6697b

* fix the download bug * add sort for ips

add alias for fluid.contrib.mixed_precision (#29562)

c05170d

* add alias for fluid.contrib.mixed_precision

fix cache pip error (#29618)

18f9df0

fix non-contiguous bug for python api. (#29615)

78dad78

zhangting2020 and others added 21 commits December 16, 2020 16:50

improve dropout grad (#29605)

1e9127f

* improve grad perf

add static.amp into setup.py.in (#29621)

b96dada

* add static.amp into setup.py.in * add unittest for api

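Add ROCm platform support code (#29342)

7673850

* Add ROCm platform support code * Fix some issues * Fix some ambiguities and add comments * Fix code formatting * Code changes after resolving conflicts * Modify operators.cmake * Fix formatting * Fix errors * Unify interfaces * Update date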

[Kunlun] PR1:Support one Kunlun card training in parallel executor (#…

f13c3a9

…29337)

add pad and concat double grad (#29549)

cc38715

* add constant pad double grad

[GO] add two cgo api, test=develop (#29659)

7684b91

Add approval monitor for unity_build_rule.cmake (#29701)

bb5a785

* Add approval monitor for unity_build_rule.cmake, test=develop * fix words spell error, test=document_fix

delete the code for fp16 optimization because it is not faster than c…

2e0d1ed

…ommon template code (#29715)

Update EarlyStopping sample code (#29723)

572810e

* update EarlyStopping doc * update EarlyStopping doc, test=document_fix

Added missing format of oneDNN (#29670)

9eff1a6

[Complex] Add real & imag op and api for complex tensor (#29672)

6cfa59d

* add complex real op & api & unittest * add imag op & api & unittest * refactor op impl * revert simplify writing due to compile failed * polish details * polish grad op code

Windows generate pdb and dump, for debug (#29628)

0c59ad2

* Windows generate pdb and dump, for debug * fix code style, test=develop * modify cmakelist

fix ubuntu docker error (#29719)

638ccaa

fleet sync build strategy, test=develop (#29732)

9cbcc6c

[Inference] EnableUseGpu has higher priority than flags (#29697)

b593d58

* enable_use_gpu has higher priority than FLAGS * update.

Update en docs of to_tensor (#29718)

10edfb6

* update to_tensor en docs

update the operator registration for incompatible upgrade, test=devel…

8bd2879

…op (#29720)

Reduce batch size to fix CPU memory, test=develop (#29736)

2e788bd

Unit tests reported insufficient memory on CPU machines. Reduce the batch size again.

thisjiang merged commit ff9053a into thisjiang:develop Dec 18, 2020

thisjiang pushed a commit that referenced this pull request Oct 28, 2021

Merge pull request #1 from Superjomn/fea/add-ir-description

a97a458

add IrNodeTy ostream support

thisjiang pushed a commit that referenced this pull request Dec 1, 2021

Added Eager Dygraph AutoCodeGen dependencies #1 (PaddlePaddle#37574)

fcd44b5

thisjiang pushed a commit that referenced this pull request Dec 1, 2021

Added performance tests for Eager Dygraph #1 (PaddlePaddle#37638)

7df301f

thisjiang pushed a commit that referenced this pull request Mar 10, 2022

infershaped autogen (PR #1), test=develop (PaddlePaddle#39405)

b3e049f

thisjiang pushed a commit that referenced this pull request Mar 10, 2022

Fixed get_tensor method for EagerTensor (PaddlePaddle#39414)

9722994

* Enabled Eager OpTest #1 * Enabled Eager OpTest #1 * Fixed get_tensor method for EagerTensor
