Merge Baidu Changes into github #48

reyoung · 2016-09-08T11:01:51Z

Will fix current gradient machine unittest error

* Start over
* Added basic docs. Added .travis.yml. Added scripts to build documentation on Travis.
* Disable several deploy script commands for testing.
* fixed the deploy_docs.sh script.
* Update travis.yml
* Renamed docs to doc
* update .gitignore.
* Delete .gitignore
* Update .travis.yml
* Update .travis.yml
* Update deploy_docs.sh
* Update .travis.yml
* Develop doc (PaddlePaddle#1)
* Add paddle submodule
* Update the build script.
* Update script
* Use gen_doc_lib instead.
* Move files around
* cache external
* Update submodule
* try to cache batch 1
* add test code
* Update Paddle submodule
* Update submodule
* update script to print more
* update python path
* test
* test
* test
* clean up the code
* Update Script (PaddlePaddle#2)
* add new file
* Develop doc (PaddlePaddle#3)
* Add the rest of the submodules into the system
* Provide first symlink fit a line
* Update the rest of book to symlinks
* add models submodule
* Add link for models
* Update deploy_docs
* update to use lite 2

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. 
* add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30) * Documentation of the common interfaces of c2r and c2c (#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (#32) * clean code * Add numpy-based implementation of spectral ops (#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. 
* Add unittest for fft helper functions (#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (#37) * Unittest of op with FFT C2C, C2R and r2c added (#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (#42) * Add api doc and update unittest. (#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (#45) * use dynload for cufft (#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (#48) 1. 
use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (#55) explicitly specify capture mode for lambdas * fix fft sample (#53) * fix fft sample * update scipy and numpy version for unittests of fft (#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (#57) * Remove cache of cuFFT & Disable ONEMKL (#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
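The commit list above mentions filling the R2C result "with conjugate symmetry" and a `onesided` option. The following is only a minimal sketch of that idea in pure Python, not the Paddle kernel: for a real input of length n, a one-sided transform keeps bins 0 to n//2, and the remaining bins follow from X[n-k] = conj(X[k]). The helper names `dft`, `onesided`, and `fill_conjugate_symmetry` are hypothetical and chosen for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def onesided(spectrum):
    """Keep only the first n//2 + 1 bins, as a one-sided R2C result would."""
    n = len(spectrum)
    return spectrum[: n // 2 + 1]


def fill_conjugate_symmetry(half, n):
    """Rebuild all n bins from the one-sided half via X[n-k] = conj(X[k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full


# Real-valued input of odd length; the same code handles even lengths.
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5]
full = dft(signal)
rebuilt = fill_conjugate_symmetry(onesided(full), len(signal))
max_err = max(abs(a - b) for a, b in zip(full, rebuilt))
```

The reconstruction only holds for real-valued input; for complex input the two halves of the spectrum are independent, which is why C2C transforms have no one-sided form.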

Haonan and others added 10 commits September 8, 2016 18:42

Argument concat for subsequence start positions

1e1a33b

Change-Id: Ia60c008a8c922f66e6b5e2ca3e488fc4625d6506

fix the layer name error when stacking RecurrentGroups

6a873f5

* the last layer in the stack already has all the suffixes

revert CRFLayer, remove wrong gpu support

fbfd24e

Change-Id: I636cf13af5becb1168bc9749266b55580c46f6c9

Update Jumbo package to 0.8.0b0.

721b09e

Change-Id: I0b8608feab8f6be5094e8981fc5f65cb401ed415

Fix ThreadParameterUpdater

7ad55a4

The reference return type causes a segmentation fault at ThreadParameterUpdater.cpp:123 under GCC 5.4.

Change-Id: I7a1c155892722076a7cb48793b83d5ee525747d1

Fix 32-bit gcc compile warnings.

fdd40e5

Change-Id: Ibc39ca1d1a27d0d28569e29f41a5647659f8c764

bug fix for hl_matrix_classification_error

903d5c7

Refine doc of Python Prediction API, replace DataProviderWrapperConverter with DataProviderConverter.

5547a76

fix docker tag mistake

d6d9122

Change-Id: Ia84860cdc25945ececba84fc9807495c9e5f047b

fix unit test of test_RecurrentGradientMachine, and some tiny doc update

dbaabc9

Change-Id: I028e402c964ca4f4431cbf8153bea4379dd4df70

reyoung assigned luotao1, qingqing01, emailweixu and hedaoyuan Sep 8, 2016

reyoung merged commit 3304de7 into PaddlePaddle:master Sep 8, 2016

F0REacH mentioned this pull request Sep 8, 2016

Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

Closed

jiweibo pushed a commit to jiweibo/Paddle that referenced this pull request Jan 6, 2020

Merge pull request PaddlePaddle#48 from jiweibo/lite_engine

c39ca7b

fix build error test=develop

qingqing01 pushed a commit to qingqing01/Paddle that referenced this pull request Apr 30, 2020

Merge pull request PaddlePaddle#48 from xyzhou-puck/master

e0f5c55

refine text.py

ForFishes pushed a commit to ForFishes/Paddle that referenced this pull request Oct 27, 2020

fix dnn dump error (PaddlePaddle#48)

fd717a3

DemoMoon mentioned this pull request Mar 24, 2021

How can oneDNN improve DeepSpeech's speech-processing performance? #31838

Closed

thisjiang added a commit to thisjiang/Paddle that referenced this pull request Sep 16, 2021

add REGISTER_PIANO_OP_WITHOUT_MAKER (PaddlePaddle#48)

a8bc997

lijiaqi0612 pushed a commit to lijiaqi0612/Paddle that referenced this pull request Sep 16, 2021

use dynload for cufft (PaddlePaddle#48)

cc9d3a0

1. use dynload for cufft;
2. fix unittest;
3. temporarily disable ROCm.

thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021

Merge pull request PaddlePaddle#48 from Superjomn/fea/make-lowered_func-ir

18a20ce

fea/make lowered func ir

gglin001 pushed a commit to graphcore/Paddle-fork that referenced this pull request Dec 8, 2021

add ernie_training.py (PaddlePaddle#48)

6a7ac4c

* add ernie_training.py
* add reference

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021

Merge pull request PaddlePaddle#48 from FrostML/pe-print

054c95e

alter pe print

paddle-bot-old bot referenced this pull request Apr 26, 2022

Update __init__.py

6a112a5

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022

Merge pull request PaddlePaddle#48 from jiweibo/2.0_api_for_trt

a0e45d3

update paddle-trt demo with 2.0 api.

Thunderbrook pushed a commit to Thunderbrook/Paddle that referenced this pull request Aug 12, 2022

Lxch curand bug fix (PaddlePaddle#48)

184c631

* pull sparse-ptr asyn
* fix curand bug

Co-authored-by: liaoxiaochao <liaoxiaochao@baidu.com>

zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Nov 2, 2022

Lxch curand bug fix (PaddlePaddle#48)

9a7b0aa

* pull sparse-ptr asyn
* fix curand bug

Co-authored-by: liaoxiaochao <liaoxiaochao@baidu.com>

jack603047588 referenced this pull request in jack603047588/Paddle Nov 9, 2022

Merge pull request #48 from dongwenxin2046/paddlebox

6466203

model server 3thrd

lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024

fix readme links (PaddlePaddle#48)

78c8e4e

zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Feb 27, 2024

Merge pull request PaddlePaddle#48 from YaoCheng8667/paddlebox-yc

be6d0dd

optimize datafeed by CPU async

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this pull request Feb 29, 2024

Merge pull request PaddlePaddle#48 from mthreads/cpp_lint

9f1c64b

[MTAI-484] feat(build): fix code style for cpp lint

NKNaN pushed a commit to NKNaN/Paddle that referenced this pull request Mar 3, 2024

Merge pull request PaddlePaddle#48 from iassael/compute-capability

14858fe

Added NVIDIA Pascal support

feifei-111 pushed a commit to feifei-111/Paddle that referenced this pull request Mar 10, 2024

Merge pull request PaddlePaddle#48 from tc20042008/xk-cinn-trivalop-fuse

53ba3ed

add group_pattern_util.InferShardableAxesFromSink
