update #15

Merged
205 commits
9f6e5fd
fix path error on windows when precision switch is turned on (#33025)
XieYunshen May 25, 2021
88dfb30
fix hogwild_worker init_place bug (#33078)
danleifeng May 25, 2021
dc72ffa
add the IsLeftDefault definition for pass enhance,test=develop (#33081)
winter-wang May 25, 2021
ac3603b
add async save for sparse table (#33072)
seiriosPlus May 25, 2021
09bc0f5
[Other] SparseShardingMerge Tool (#32887)
seiriosPlus May 25, 2021
c294cca
Disallow conv with strides>1 on older TRT versions (#32997)
b3602sss May 25, 2021
3a7b9ed
add the op def proto, test=develop (#33098)
Shixiaowei02 May 25, 2021
dbc08d6
modify complex template for elementwise ops (#33071)
MingMingShangTian May 25, 2021
1bb73c6
fix utest (#33108)
ForFishes May 25, 2021
f91e0f4
Add Automatic SParsity Utilities (#32995)
mingxu1067 May 25, 2021
accf284
Fix ninja compilation bug and warning on windows (#32987)
zhwesky2010 May 26, 2021
14e8d19
fix cmake error on PR-CI-Coverage, test=develop (#33121)
Avin0323 May 26, 2021
009ff61
fix model_benchmark ci (#33093)
tianshuo78520a May 26, 2021
a2a45d8
Added cast op oneDNN kernel for bf16/fp32 datatypes casting(FWD/BWD) …
jakpiase May 26, 2021
20b9be6
[Tensor Parallelism] split fix bug (#33015)
JZ-LIANG May 26, 2021
c711e91
Add double grad op for sigmoid activation, test=develop (#32971)
jim19930609 May 26, 2021
5c79dbb
Marker op for profiling (#33034)
FeixLiu May 26, 2021
78ecb66
optimize OP's compilation time (#32617)
Avin0323 May 26, 2021
8259d9b
[NPU] refine NpuOpRunner (#32869)
zhiqiu May 26, 2021
6c07cd7
modify matmul Op to complex template types (#33130)
MingMingShangTian May 26, 2021
865f0c1
[NPU] fix compile issue caused by dev changes (#33137)
zhiqiu May 26, 2021
e05a7a4
ut fix (#33102)
seiriosPlus May 26, 2021
b425215
Unify all external API error message mechanism and enhance third-part…
zhwesky2010 May 27, 2021
988b5fe
[PsCore] support ssd (#33031)
Thunderbrook May 27, 2021
9b203ef
Add the time of post_training_quantization unit test (#33146)
juncaipeng May 27, 2021
8c6bbb4
[oneDNN] Accesses to oneDNN cache optimized for conv2d (#33048)
jczaja May 27, 2021
6a5b7e5
[ROCM] add is_compiled_with_rocm api, test=develop (#33043)
qili93 May 27, 2021
6c399d9
Modify Ops from complex64/128 to complex<float/double> types. (#33133)
MingMingShangTian May 27, 2021
481ee79
speed up paddle.add paddle.nn.Linear (#32125)
wanghuancoder May 27, 2021
5756d3e
modify to complex template types in reduce_sum OP and rewrite it's Id…
MingMingShangTian May 28, 2021
2d3cbb4
Add lgamma_op kernel and unittest (#32913)
levi131 May 28, 2021
5363dad
[CustomOP]Set GLIBCXX_USE_CXX11_ABI=1 to fix potential GCC ABI probl…
Aurelius84 May 28, 2021
cf08bab
Put *.so and proto in the build directory into a tar package (#32993)
tianshuo78520a May 28, 2021
1187c61
modify to complex template types for fill_constant op (#33179)
MingMingShangTian May 28, 2021
5b910f9
fix ninja compile bug of warpctc and mkldnn (#33155)
zhwesky2010 May 28, 2021
e90f300
Strengthen the non-TRT conv check (#33150)
b3602sss May 28, 2021
02202d0
add `uint8` when check_dtype in assign (#33157)
yghstill May 29, 2021
cf9a4bd
fix compilation error if WITH_DISTRIBUTE=ON, test=develop (#33192)
Avin0323 May 31, 2021
2a771c0
support params groups, test=develop (#32830)
jerrywgz May 31, 2021
9066b57
update get_pr_ut.py (#33037)
lelelelelez May 31, 2021
e587853
[CustomOp]Specify -std=c++14 cflags by default (#33213)
Aurelius84 May 31, 2021
5c6153a
fix bug;test=document_fix (#33221)
lelelelelez May 31, 2021
4540456
Add the op def for conv2d, hard_swish, leaky_relu, relu and swish (#3…
juncaipeng May 31, 2021
387f227
[NPU] refine npu data_device_transform (#33224)
zhiqiu May 31, 2021
0a9937d
improve group norm cpu precision and performance (#33176)
jeff41404 May 31, 2021
f61e6ee
Fix cuda kernel launch of grid sampler (#33100)
wanghaoshuang May 31, 2021
c4dbeca
enhance error message for conv (#33119)
jerrywgz May 31, 2021
dfce571
add files need exec all cases (#33226)
lelelelelez May 31, 2021
06c63ca
replace and remove complex64/128 types in custom OP and other files (…
MingMingShangTian Jun 1, 2021
519cc7b
split conv2d_op unittest (#33231)
jerrywgz Jun 1, 2021
b626791
[cmake] download_verify (#33217)
Wangzheee Jun 1, 2021
0192b82
Align download_filename with cached_filename (#33214)
lyuwenyu Jun 1, 2021
4878f0e
remove ut from parallel_ut_rule (#33143)
XieYunshen Jun 1, 2021
b751a80
fix benchmark time count use hapi (#33225)
LielinJiang Jun 1, 2021
17c6d39
Fix syncbn (#32989)
ceci3 Jun 1, 2021
e939236
Fix duplicate download when incremental compilation (#33230)
zhwesky2010 Jun 1, 2021
44dd918
remove complex64 file (#33237)
MingMingShangTian Jun 1, 2021
cbe45ab
Fix spawn default nprocs get error (#33215)
chenwhql Jun 1, 2021
a986929
add trt convert op: reshape (#33188)
Wangzheee Jun 1, 2021
0f78ddb
Fix path error on windows (#33122)
XieYunshen Jun 1, 2021
e8d6ff5
fix reuse_so_cache (#33234)
tianshuo78520a Jun 1, 2021
0f15496
Reimplement the comparison binary ops using the new optimized CUDA f…
JamesLim-sy Jun 2, 2021
5981bee
conv2d support bfloat16 (#32221)
Avin0323 Jun 2, 2021
e754120
[ROCM] update paddle inference cmake, test=develop (#33260)
qili93 Jun 2, 2021
d1e89ea
optimize OP's compilation time implemented by Eigen, test=develop (#3…
Avin0323 Jun 2, 2021
1b10ccd
fix iScan C++ problems, test=develop (#33274)
Avin0323 Jun 2, 2021
47774d9
remove complex128.h file (#33247)
MingMingShangTian Jun 2, 2021
9d4722c
fix masked_select infer shape (#33167)
ZzSean Jun 2, 2021
09eb82c
fix (#33264)
Wangzheee Jun 2, 2021
635306d
fix test_fused_elemwise_activation_op time out error (#33271)
DannyIsFunny Jun 2, 2021
29dc439
fix jetson arch when compiling with single arch (#33269)
cryoco Jun 2, 2021
fdbdef0
fix conv2d_transpose trt bugs (#33242)
cryoco Jun 2, 2021
d5cc7bf
update mp (#33194)
youth123 Jun 2, 2021
44054ba
fix compilation error on Ampere GPU, test=develop (#33285)
Avin0323 Jun 2, 2021
ae93d9c
change '/' method from scale Op to elementwise_div Op (#33279)
MingMingShangTian Jun 2, 2021
3f366fe
[ROCM] fix fused_fc_elementwise_layernorm, test=develop (#33281)
qili93 Jun 2, 2021
b30a7e3
Modify the judgment method for parallel ut (#33273)
XieYunshen Jun 2, 2021
9c52ade
[slice getitem] Support getitem idx is Tensor or List (#33000)
liym27 Jun 2, 2021
b432d02
Support Add Sub Mul Max Min Pow binary functors in elementwise system…
JamesLim-sy Jun 2, 2021
3bbf2d7
linear use matmul bug not matmul_v2 (#33286)
wanghuancoder Jun 3, 2021
23b9ed3
add an assertion to ensure that the size of each dim of the parameter…
Jun 3, 2021
200d57c
[getitem] Support index is None for getitem in static mode (#33001)
liym27 Jun 3, 2021
fc5b3a9
add the fc fuse example for pass enhance, test=develop (#33250)
winter-wang Jun 3, 2021
4d805e6
multi precision for lars op and lars optimizer (#33280)
FeixLiu Jun 3, 2021
273f385
add cross stack profiler to profile super ernie (#33112)
lw921014 Jun 3, 2021
c70f1ca
Add progressbar for datasets downloading (#33302)
LielinJiang Jun 3, 2021
8752c91
Dygraph Recompute: support amp (#33251)
JZ-LIANG Jun 3, 2021
941308c
Reimplement logical functors with the new optimized elementwise funct…
JamesLim-sy Jun 4, 2021
2c9ea3d
cpu and gpu separation (#33326)
tianshuo78520a Jun 4, 2021
7528b1e
add seq_conv pbtxt (#33283)
tink2123 Jun 4, 2021
53d3f5e
add sample code for summary (#33337)
LielinJiang Jun 4, 2021
d523dff
[NPU] avoid tensor copy in check_finite_and_scale (#33244)
zhiqiu Jun 4, 2021
1e9299a
Fix hang of hybrid parallel in new_group (#33141)
ForFishes Jun 4, 2021
82630f3
[Dy2stat] Add Support for paddle.grad (#33110)
zhhsplendid Jun 4, 2021
34aebbc
add precision unittest for executor all reduce (#33339)
FeixLiu Jun 4, 2021
57bdf32
add some pbtxts, test=develop (#33342)
Shixiaowei02 Jun 4, 2021
6877b13
make paddle.to_tensor() copy if data is varbase (#33335)
zhiqiu Jun 4, 2021
dd18123
fix inference prepare data bug (#33305)
b3602sss Jun 4, 2021
d194bd3
[Paddle-TRT] Add gather_nd and reduce_sum trt op. (#33324)
jiweibo Jun 5, 2021
1315e3a
Revert "optimize softmax with cross entropy hard label (#32290)" (#33…
Xreki Jun 5, 2021
d46b408
fix undefined_all_variable (#32611)
lelelelelez Jun 7, 2021
2c10ca6
Add op def for quant ops (#33351)
juncaipeng Jun 7, 2021
902c6f9
[HybridParallel]Fix c_split op for TensorParallel (#33207)
ForFishes Jun 7, 2021
a01e513
Add the op def for elementwise_div elementwise_pow etc (#33288)
dyning Jun 7, 2021
d19bceb
pack the @op_name@.pbtxt into library. test=develop (#33322)
winter-wang Jun 7, 2021
7101af3
Add the op def for batch_norm, conv2d_transpose (#33360)
Wangzheee Jun 7, 2021
4da15e6
Fixed a bug of log_softmax: op input was modified to 'nan' (#32937)
AshburnLee Jun 7, 2021
cb12282
[sharding] bugfix for group init hang (#33327)
JZ-LIANG Jun 7, 2021
599e9e4
fix too-many-format-args (#33353)
lelelelelez Jun 7, 2021
443cf71
fix undefined-variable (#33355)
lelelelelez Jun 7, 2021
c5c3732
add op_def for gru, lstm and layer_norm (#33317)
jiweibo Jun 7, 2021
59b8912
[NPU] add private api for memcpy_op (#33258)
zhiqiu Jun 7, 2021
73f2ffa
OP:strided_slice_op supports bool type inputs (#33373)
TeslaZhao Jun 7, 2021
205bcc1
add proto txt info for affine_channel op (#33376)
DannyIsFunny Jun 7, 2021
fb80e95
polish Windows CI (#33392)
zhwesky2010 Jun 7, 2021
94e8360
bump up to oneDNN v2.3 (#33229)
lidanqing-intel Jun 7, 2021
6466653
fix code style (#33395)
TCChenlong Jun 8, 2021
43f6c70
Add 'self' parameters to function Cluster::update_pods, use variable …
Jiangxinz Jun 8, 2021
366d346
fix API: normalize_program (#33384)
T8T9 Jun 8, 2021
260f92d
fix the bug in repeated_fc_relu_fuse_pass.test=develop (#33386)
winter-wang Jun 8, 2021
45d1ae2
add dynamic layer_norm plugin (#33293)
shangzhizhou Jun 8, 2021
37385f6
replace 'InnerSetOverridedStopGradient' with 'SetOverridedStopGradien…
hbwx24 Jun 8, 2021
27f4ced
add flatten2 pbtxt (#33287)
MissPenguin Jun 8, 2021
0820eea
Op pass pool (#33316)
LDOUBLEV Jun 8, 2021
a4dd4c4
add transpose transpose opdef (#33352)
LDOUBLEV Jun 8, 2021
6550c20
fix too-many-function-args-1 (#33398)
lelelelelez Jun 8, 2021
e69c14f
fix no-self-argument (#33356)
lelelelelez Jun 8, 2021
7cadd95
add squeeze2 and unsqueeze2 pbtxt (#33343)
cuicheng01 Jun 8, 2021
b135544
fix dp (#33297)
youth123 Jun 8, 2021
64914ea
update xpu cmake for kunlun (#33328)
tangzhiyi11 Jun 8, 2021
4670a0a
fix test_fc_op (#33417)
zhiqiu Jun 8, 2021
c078957
Add comments to ColorJitter parameters;test=document_fix (#33301)
WenmuZhou Jun 8, 2021
93446be
[Dy2Stat]move data to CUDAPlace in advance (#33345)
Aurelius84 Jun 8, 2021
2af2354
Optimizing prec process on windows (#33256)
XieYunshen Jun 8, 2021
92081e1
fix undefined variable in optimizer (#33416)
zhiqiu Jun 9, 2021
ddc95a0
[quant] Add quant wrap for functional api and refine the qat (#33162)
juncaipeng Jun 9, 2021
b154470
add two attributes for yolo box (#33400)
wangxinxin08 Jun 9, 2021
529245b
add capi tar lib in linux (#33412)
OliverLPH Jun 9, 2021
e1aa4de
add win_capi_tar in paddle_build.bat (#33414)
OliverLPH Jun 9, 2021
a6b3328
fix output_padding in conv (#33428)
jerrywgz Jun 9, 2021
626c1ed
fix the bug of yolo_box which can't run on nano and tx2 (#33422)
fengxiaoshuai Jun 9, 2021
b4954ce
cache core.globals() to speed up dynamic graph (#32098)
wanghuancoder Jun 9, 2021
98f0817
add op pbtxt of matmul/matmul_v2/scale/softmax (#33424)
cryoco Jun 9, 2021
cda893f
[Dy2Stat]Modify into core.ops.run_program (#33246)
Aurelius84 Jun 9, 2021
cdd6437
paddle.save support object save to memory. (#32999)
hbwx24 Jun 9, 2021
5200791
[HybridParallel] Add ParallelCrossEntropy for TensorParallel (#33401)
ForFishes Jun 9, 2021
291fc0f
add random state generate in DataLoader worker (#33310)
heavengate Jun 9, 2021
e08fdd1
Add option "verbose" for predict api (#33405)
LielinJiang Jun 9, 2021
741811e
add bool type for tril api (#33402)
vslyu Jun 9, 2021
a039fd7
[Static getitem] Support static Variable getitem for Ellipsis index (…
liym27 Jun 9, 2021
4cf0146
Polish code for slice and set_value op (#32947)
liym27 Jun 9, 2021
05b9ea5
Use separate uniquename in op_function_impl.h, test=develop (#33189)
wanghuancoder Jun 9, 2021
2329092
Check the installed openblas version in cmake (#33440)
zhiqiu Jun 9, 2021
555c346
[Dy2Stat] fix unittest failed (#33438)
Aurelius84 Jun 9, 2021
32ef95d
Add diagflat op, test=develop (#33334)
limin2021 Jun 9, 2021
1382cd2
[oneDNN] First fix to #33021 (#33174)
jczaja Jun 9, 2021
992d0d9
[Dy2Stat & Quantization]Support append customize attributes into op_d…
Aurelius84 Jun 9, 2021
f249a5f
bugfix: param init with fill constant str_value (#33381)
JZ-LIANG Jun 9, 2021
2b56b1b
fix the bug in the creation of pp groups to avoid hang (#32890)
Jun 9, 2021
9cda9ec
Add API paddle.neg() and paddle.lgamma(), along with some unittests f…
levi131 Jun 9, 2021
42c1297
[HybridParallel] update collective split to use c_embedding and mp_al…
wangxicoding Jun 9, 2021
a526b3e
fuse L2Decay and momentum when param.regularizer is set (#32845)
zhangting2020 Jun 10, 2021
dffc331
make the compatible pass only check op has pbtxt, test=develop (#33397)
winter-wang Jun 10, 2021
e19736d
fix aligned in roi_align (#33444)
jerrywgz Jun 10, 2021
dec63f1
Support diff dataset tensor place in single process dataloader (#33470)
chenwhql Jun 10, 2021
11b5776
[Dy2stat] Change Some Fluid API to 2.0 API (#33460)
zhhsplendid Jun 10, 2021
a225636
[static getitem]Support index is list bool for getitem in static mode…
liym27 Jun 10, 2021
8061442
Automatic SParsity Helper (#33132)
mingxu1067 Jun 10, 2021
6c11034
fix cifar label dimension. test=develop (#33475)
heavengate Jun 10, 2021
60c9f97
Get exact value of dim in advance for slice op (#33300)
liym27 Jun 10, 2021
df4a978
[Debug] Add nan& inf check FLAG for dygraph (#32635)
chenwhql Jun 10, 2021
6ad1880
fix geo ut (#33441)
seiriosPlus Jun 10, 2021
1410d72
bug fix, test=develop (#33476)
Jun 10, 2021
003b461
dp c_allreduce_sum_fusion op (#33169)
Baibaifan Jun 10, 2021
945e084
enhance compatible condition for fc fuse pass. test=develop (#33452)
winter-wang Jun 10, 2021
f89a7b5
add wget option in download (#33379)
lyuwenyu Jun 10, 2021
afa4bf5
fix the bug that `print_signature.py` cannot get all the public apis …
wadefelix Jun 10, 2021
ab41a9e
fix unittest failure due to the path is too long (#33447)
zhwesky2010 Jun 10, 2021
b2afc8d
Fix some Bugs of Undefined Variable (#33488)
Jiangxinz Jun 11, 2021
3a213d9
fix image batch bug (#33498)
kuizhiqing Jun 11, 2021
71f8707
fix undefined-variable-1 (#33425)
lelelelelez Jun 11, 2021
aa50868
[Dy2stat] Add Support for a, b = static_variable Grammar (#33499)
zhhsplendid Jun 11, 2021
ebe24e6
polish unittest test_multiprocess_reader_exception (#33504)
chenwhql Jun 11, 2021
6760d73
miss if (#33513)
b3602sss Jun 11, 2021
2de737e
update 2.0 public api in vision (#33308)
zhiboniu Jun 11, 2021
022198c
update 2.0 public api in all left files (#33313)
zhiboniu Jun 11, 2021
08e8147
use PYTHON_C_API in dygraph (#32524)
wanghuancoder Jun 11, 2021
681778d
Update spawn doc for xpu (#33497)
chenwhql Jun 11, 2021
3c49f08
[oneDNN] Second fix to #33021 (#33471)
jczaja Jun 11, 2021
5cca9e4
add expm1_op (#33066)
ronny1996 Jun 11, 2021
9d8d531
fc_elementwise_layer_fuse_pass (#33467)
Jun 11, 2021
abc17ef
Fix gather infer shape using axis (#33413)
ForFishes Jun 11, 2021
cd95ea8
Small fixes related to BF16 fusion_gru and fusion_lstm (#33295)
wozna Jun 11, 2021
fcd93b3
Support Div and FloorDiv functor in elementwise system (#33053)
JamesLim-sy Jun 12, 2021
24bde98
[Paddle-TRT] add support for trt dynamic shape flatten op (#33394)
cryoco Jun 12, 2021
fe94db6
Fix LayerNorm Problem (#33420)
zhiboniu Jun 12, 2021
308467c
Add warning for dataloader incompatible upgrade (#32967)
heavengate Jun 14, 2021
18e71bd
Revert "Fix some Bugs of Undefined Variable (#33488)" (#33538)
Jiangxinz Jun 15, 2021
02a6d49
Add digamma_op and unittest (#33278)
zyfncg Jun 15, 2021
606939d
Support reduce_sum_op float16 (#32966)
thisjiang Jun 15, 2021
1f8de08
add the support for the bool in compare ops
wawltor Jun 15, 2021
c5a6ae4
1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape…
shangzhizhou Jun 15, 2021
3a2230d
add conv3d prototxt (#33501)
bjjwwang Jun 15, 2021
009a163
fix the op attrs error in conv2d pbtxt,test=develop (#33532)
winter-wang Jun 15, 2021
28521e0
Save all the information of 'ParamBase' in 'Layer'. (#33500)
hbwx24 Jun 15, 2021
54 changes: 28 additions & 26 deletions CMakeLists.txt
@@ -20,6 +20,13 @@ set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})

  include(system)

+ # Note(zhouwei): Ninja Generator will set CMAKE_BUILD_TYPE to Debug
+ if(NOT CMAKE_BUILD_TYPE)
+   set(CMAKE_BUILD_TYPE "Release" CACHE STRING
+       "Choose the type of build, options are: Debug Release RelWithDebInfo MinSizeRel"
+       FORCE)
+ endif()
+
  project(paddle CXX C)

  # enable language CUDA
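
The hunk above moves the default build type ahead of `project()`; its note says the Ninja generator would otherwise set `CMAKE_BUILD_TYPE` to Debug. A minimal sketch of that ordering in isolation might look like the following. The project name `build_type_demo` and the `LANGUAGES NONE` setting are assumptions made so the sketch configures without a compiler; they are not part of Paddle's CMakeLists.txt.

```cmake
cmake_minimum_required(VERSION 3.10)

# Choose a default only when the user has not passed -DCMAKE_BUILD_TYPE=...
# Placed before project(), mirroring the ordering in the hunk.
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE "Release" CACHE STRING
      "Choose the type of build, options are: Debug Release RelWithDebInfo MinSizeRel"
      FORCE)
endif()

# Hypothetical project; LANGUAGES NONE avoids needing a toolchain.
project(build_type_demo LANGUAGES NONE)

message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
```

Configuring this with `-DCMAKE_BUILD_TYPE=Debug` leaves the user's choice untouched, because the guard only fires when the variable is empty.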
@@ -213,12 +220,6 @@ if(NOT PY_VERSION)
  endif()
  set(PYBIND11_PYTHON_VERSION ${PY_VERSION})

- # CMAKE_BUILD_TYPE
- if(NOT CMAKE_BUILD_TYPE)
-   set(CMAKE_BUILD_TYPE "Release" CACHE STRING
-       "Choose the type of build, options are: Debug Release RelWithDebInfo MinSizeRel"
-       FORCE)
- endif()

  # the type of sanitizer, options are: Address, Leak, Memory, Thread, Undefined. Default: OFF
  if(SANITIZER_TYPE AND NOT "${SANITIZER_TYPE}" MATCHES "^(Address|Leak|Memory|Thread|Undefined)$")
@@ -283,6 +284,27 @@ if(WITH_GPU)
    endif()
  endif()

+ if(WITH_ROCM)
+   include(hip)
+   include(miopen) # set miopen libraries, must before configure
+ endif(WITH_ROCM)
+
+ if (NOT WITH_ROCM AND WITH_RCCL)
+   MESSAGE(WARNING
+     "Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
+   set(WITH_RCCL OFF CACHE STRING
+     "Disable RCCL when compiling without ROCM" FORCE)
+ endif()
+
+ if(WITH_RCCL)
+   add_definitions("-DPADDLE_WITH_RCCL")
+   include(rccl)
+ else()
+   if(WITH_ROCM)
+     MESSAGE(WARNING "If the environment is multi-card, the WITH_RCCL option needs to be turned on, otherwise only a single card can be used.")
+   endif()
+ endif()
+
  include(third_party) # download, build, install third_party, Contains about 20+ dependencies

  include(flags) # set paddle compile flags
@@ -307,26 +329,6 @@ include(configure) # add paddle env configuration

  include_directories("${PADDLE_SOURCE_DIR}")

- if(WITH_ROCM)
-   include(hip)
- endif(WITH_ROCM)
-
- if (NOT WITH_ROCM AND WITH_RCCL)
-   MESSAGE(WARNING
-     "Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
-   set(WITH_RCCL OFF CACHE STRING
-     "Disable RCCL when compiling without ROCM" FORCE)
- endif()
-
- if(WITH_RCCL)
-   add_definitions("-DPADDLE_WITH_RCCL")
-   include(rccl)
- else()
-   if(WITH_ROCM)
-     MESSAGE(WARNING "If the environment is multi-card, the WITH_RCCL option needs to be turned on, otherwise only a single card can be used.")
-   endif()
- endif()
-
  if(WITH_NV_JETSON)
    set(WITH_ARM ON CACHE STRING "Set WITH_ARM=ON when compiling WITH_NV_JETSON=ON." FORCE)
  endif()
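
The ROCm block relocated by this file's diff forces `WITH_RCCL` off whenever `WITH_ROCM` is off. A stand-alone sketch of that guard is shown below, assuming both options are declared with `option()`; the project name and the default values are illustrative only and are not taken from Paddle.

```cmake
cmake_minimum_required(VERSION 3.10)
project(option_guard_demo LANGUAGES NONE)  # hypothetical project name

# Illustrative defaults chosen so the guard below triggers.
option(WITH_ROCM "Compile with ROCm" OFF)
option(WITH_RCCL "Compile with RCCL" ON)

# Same pattern as the diff: override the cached value when the
# prerequisite option is disabled.
if(NOT WITH_ROCM AND WITH_RCCL)
  message(WARNING "Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
  set(WITH_RCCL OFF CACHE STRING "Disable RCCL when compiling without ROCM" FORCE)
endif()

message(STATUS "WITH_ROCM=${WITH_ROCM} WITH_RCCL=${WITH_RCCL}")
```

CMake's `CMakeDependentOption` module offers `cmake_dependent_option()` for a similar effect; the diff does not use it, so it is mentioned here only as a possible alternative.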
24 changes: 15 additions & 9 deletions cmake/cblas.cmake
@@ -69,15 +69,21 @@ if(NOT DEFINED CBLAS_PROVIDER)
      PATHS ${OPENBLAS_LIB_SEARCH_PATHS})

  if(OPENBLAS_LAPACKE_INC_DIR AND OPENBLAS_INC_DIR AND OPENBLAS_LIB)
-   set(CBLAS_PROVIDER OPENBLAS)
-   set(CBLAS_INC_DIR ${OPENBLAS_INC_DIR} ${OPENBLAS_LAPACKE_INC_DIR})
-   set(CBLAS_LIBRARIES ${OPENBLAS_LIB})
-
-   add_definitions(-DPADDLE_USE_OPENBLAS)
-   add_definitions(-DLAPACK_FOUND)
-
-   message(STATUS "Found OpenBLAS (include: ${OPENBLAS_INC_DIR}, library: ${CBLAS_LIBRARIES})")
-   message(STATUS "Found lapack in OpenBLAS (include: ${OPENBLAS_LAPACKE_INC_DIR})")
+   file(READ "${OPENBLAS_INC_DIR}/openblas_config.h" config_file)
+   string(REGEX MATCH "OpenBLAS ([0-9]+\.[0-9]+\.[0-9]+)" tmp ${config_file})
+   string(REGEX MATCH "([0-9]+\.[0-9]+\.[0-9]+)" ver ${tmp})
+
+   if (${ver} VERSION_EQUAL "0.3.7")
+     set(CBLAS_PROVIDER OPENBLAS)
+     set(CBLAS_INC_DIR ${OPENBLAS_INC_DIR} ${OPENBLAS_LAPACKE_INC_DIR})
+     set(CBLAS_LIBRARIES ${OPENBLAS_LIB})
+
+     add_definitions(-DPADDLE_USE_OPENBLAS)
+     add_definitions(-DLAPACK_FOUND)
+
+     message(STATUS "Found OpenBLAS (include: ${OPENBLAS_INC_DIR}, library: ${CBLAS_LIBRARIES})")
+     message(STATUS "Found lapack in OpenBLAS (include: ${OPENBLAS_LAPACKE_INC_DIR})")
+   endif()
  endif()
  endif()
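
The hunk above reads the OpenBLAS version out of `openblas_config.h` and accepts the system library only when it equals 0.3.7. The following is a hedged sketch of the same idea as a stand-alone script. It is not the PR's code: it uses a single regex with `CMAKE_MATCH_1`, escapes the dots as `\\.`, and guards against an empty match. The header text is an assumed stand-in, and the file name `parse_openblas_version.cmake` is hypothetical.

```cmake
# Run with: cmake -P parse_openblas_version.cmake
cmake_minimum_required(VERSION 3.10)

# Stand-in for the contents of openblas_config.h (assumed format).
set(config_file "#define OPENBLAS_VERSION \" OpenBLAS 0.3.7 \"\n")

# "\\." reaches the regex engine as "\.", which matches a literal dot.
string(REGEX MATCH "OpenBLAS ([0-9]+\\.[0-9]+\\.[0-9]+)" matched "${config_file}")
set(ver "${CMAKE_MATCH_1}")  # first capture group; empty when nothing matched

# Checking that ver is non-empty first keeps the comparison well-formed.
if(ver AND ver VERSION_EQUAL "0.3.7")
  message(STATUS "OpenBLAS ${ver} accepted")
else()
  message(STATUS "OpenBLAS version '${ver}' not accepted")
endif()
```

Quoting `"${config_file}"` keeps the header text as one argument even if it contains semicolons, which an unquoted expansion would split into a list.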

8 changes: 8 additions & 0 deletions cmake/configure.cmake
@@ -143,6 +143,14 @@ elseif(WITH_ROCM)
    add_definitions(-DPADDLE_WITH_HIP)
    add_definitions(-DEIGEN_USE_GPU)
    add_definitions(-DEIGEN_USE_HIP)
+
+   if(NOT MIOPEN_FOUND)
+     message(FATAL_ERROR "Paddle needs MIOpen to compile")
+   endif()
+
+   if(${MIOPEN_VERSION} VERSION_LESS 2090)
+     message(FATAL_ERROR "Paddle needs MIOPEN >= 2.9 to compile")
+   endif()
  else()
    add_definitions(-DHPPL_STUB_FUNC)
    list(APPEND CMAKE_CXX_SOURCE_FILE_EXTENSIONS cu)
45 changes: 25 additions & 20 deletions cmake/cuda.cmake
@@ -95,11 +95,23 @@ function(select_nvcc_arch_flags out_variable)
  if(${CUDA_ARCH_NAME} STREQUAL "Kepler")
    set(cuda_arch_bin "30 35")
  elseif(${CUDA_ARCH_NAME} STREQUAL "Maxwell")
-   set(cuda_arch_bin "50")
+   if (WITH_NV_JETSON)
+     set(cuda_arch_bin "53")
+   else()
+     set(cuda_arch_bin "50")
+   endif()
  elseif(${CUDA_ARCH_NAME} STREQUAL "Pascal")
-   set(cuda_arch_bin "60 61")
+   if (WITH_NV_JETSON)
+     set(cuda_arch_bin "62")
+   else()
+     set(cuda_arch_bin "60 61")
+   endif()
  elseif(${CUDA_ARCH_NAME} STREQUAL "Volta")
-   set(cuda_arch_bin "70")
+   if (WITH_NV_JETSON)
+     set(cuda_arch_bin "72")
+   else()
+     set(cuda_arch_bin "70")
+   endif()
  elseif(${CUDA_ARCH_NAME} STREQUAL "Turing")
    set(cuda_arch_bin "75")
  elseif(${CUDA_ARCH_NAME} STREQUAL "Ampere")
@@ -205,23 +217,16 @@ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-extended-lambda")
  if(WIN32)
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler \"/wd4244 /wd4267 /wd4819 \"")
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler /bigobj")
-   if(CMAKE_BUILD_TYPE STREQUAL "Debug")
-     # match the cl's _ITERATOR_DEBUG_LEVEL
-     set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler \"-g -G -D_DEBUG\"")
-     if(MSVC_STATIC_CRT)
-       set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler /MTd")
-     else()
-       set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler /MDd")
-     endif()
-   elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
-     set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler \"-DNDEBUG\"")
-     if(MSVC_STATIC_CRT)
-       set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler /MT")
-     else()
-       set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler /MD")
-     endif()
-   else()
-     message(FATAL "Windows only support Release or Debug build now. Please set visual studio build type to Release/Debug, x64 build.")
+   if(MSVC_STATIC_CRT)
+     set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -Xcompiler /MTd")
+     set(CMAKE_CUDA_FLAGS_RELEASE "${CMAKE_CUDA_FLAGS_RELEASE} -Xcompiler /MT")
+     foreach(flag_var
+         CMAKE_CUDA_FLAGS CMAKE_CUDA_FLAGS_DEBUG CMAKE_CUDA_FLAGS_RELEASE
+         CMAKE_CUDA_FLAGS_MINSIZEREL CMAKE_CUDA_FLAGS_RELWITHDEBINFO)
+       if(${flag_var} MATCHES "-MD")
+         string(REGEX REPLACE "-MD" "-MT" ${flag_var} "${${flag_var}}")
+       endif()
+     endforeach(flag_var)
    endif()
  endif()
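
The first cuda.cmake hunk picks a different compute capability for Maxwell, Pascal and Volta when `WITH_NV_JETSON` is set. A compact sketch of that mapping as a stand-alone script follows. The helper name `select_arch_bin`, its parameters, and the script file name are hypothetical; only the capability values are taken from the hunk.

```cmake
# Run with: cmake -P select_arch_bin.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: map an architecture name to its arch-bin string,
# switching to the Jetson value when is_jetson is true.
function(select_arch_bin arch_name is_jetson out_variable)
  if(arch_name STREQUAL "Maxwell")
    if(is_jetson)
      set(arch_bin "53")
    else()
      set(arch_bin "50")
    endif()
  elseif(arch_name STREQUAL "Pascal")
    if(is_jetson)
      set(arch_bin "62")
    else()
      set(arch_bin "60 61")
    endif()
  elseif(arch_name STREQUAL "Volta")
    if(is_jetson)
      set(arch_bin "72")
    else()
      set(arch_bin "70")
    endif()
  else()
    message(FATAL_ERROR "Unhandled architecture name in this sketch: ${arch_name}")
  endif()
  # Return the result to the caller's scope.
  set(${out_variable} "${arch_bin}" PARENT_SCOPE)
endfunction()

select_arch_bin("Volta" ON jetson_volta)
select_arch_bin("Volta" OFF desktop_volta)
message(STATUS "Jetson Volta: ${jetson_volta}; desktop Volta: ${desktop_volta}")
```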

1 change: 1 addition & 0 deletions cmake/external/boost.cmake
@@ -46,6 +46,7 @@ ExternalProject_Add(
    ${BOOST_PROJECT}
    ${EXTERNAL_PROJECT_LOG_ARGS}
    "${BOOST_DOWNLOAD_CMD}"
+   URL_MD5 f891e8c2c9424f0565f0129ad9ab4aff
    PREFIX ${BOOST_PREFIX_DIR}
    DOWNLOAD_DIR ${BOOST_SOURCE_DIR}
    SOURCE_DIR ${BOOST_SOURCE_DIR}
8 changes: 4 additions & 4 deletions cmake/external/mkldnn.cmake
@@ -20,7 +20,7 @@ SET(MKLDNN_SOURCE_DIR ${THIRD_PARTY_PATH}/mkldnn/src/extern_mkldnn)
  SET(MKLDNN_INSTALL_DIR ${THIRD_PARTY_PATH}/install/mkldnn)
  SET(MKLDNN_INC_DIR "${MKLDNN_INSTALL_DIR}/include" CACHE PATH "mkldnn include directory." FORCE)
  SET(MKLDNN_REPOSITORY ${GIT_URL}/oneapi-src/oneDNN.git)
- SET(MKLDNN_TAG f3999b71d8e4415c1985a0dfb812a3ed77ee21fa)
+ SET(MKLDNN_TAG 748528a2d3204b5f401c14a9aacdec16accd5ead)


  # Introduce variables:
@@ -60,8 +60,8 @@ ExternalProject_Add(
    DEPENDS ${MKLDNN_DEPENDS}
    PREFIX ${MKLDNN_PREFIX_DIR}
    SOURCE_DIR ${MKLDNN_SOURCE_DIR}
-   BUILD_ALWAYS 1
-   # UPDATE_COMMAND ""
+   UPDATE_COMMAND ""
+   #BUILD_ALWAYS 1
    CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
    -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
    -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
@@ -110,7 +110,7 @@ if(WIN32)
  add_custom_command(TARGET ${MKLDNN_PROJECT} POST_BUILD VERBATIM
    COMMAND echo EXPORTS >> ${MKLDNN_INSTALL_DIR}/bin/mkldnn.def)
  add_custom_command(TARGET ${MKLDNN_PROJECT} POST_BUILD VERBATIM
-   COMMAND for /f "skip=19 tokens=4" %A in (${MKLDNN_INSTALL_DIR}/bin/exports.txt) do echo %A >> ${MKLDNN_INSTALL_DIR}/bin/mkldnn.def)
+   COMMAND echo off && (for /f "skip=19 tokens=4" %A in (${MKLDNN_INSTALL_DIR}/bin/exports.txt) do echo %A >> ${MKLDNN_INSTALL_DIR}/bin/mkldnn.def) && echo on)
  add_custom_command(TARGET ${MKLDNN_PROJECT} POST_BUILD VERBATIM
    COMMAND lib /def:${MKLDNN_INSTALL_DIR}/bin/mkldnn.def /out:${MKLDNN_INSTALL_DIR}/bin/mkldnn.lib /machine:x64)
  else(WIN32)
3 changes: 3 additions & 0 deletions cmake/external/mklml.cmake
@@ -24,6 +24,7 @@ SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}" "${MKLML_ROOT}/lib")
  IF(WIN32)
    SET(MKLML_VER "mklml_win_2019.0.5.20190502" CACHE STRING "" FORCE)
    SET(MKLML_URL "https://paddlepaddledeps.bj.bcebos.com/${MKLML_VER}.zip" CACHE STRING "" FORCE)
+   SET(MKLML_URL_MD5 ff8c5237570f03eea37377ccfc95a08a)
    SET(MKLML_LIB ${MKLML_LIB_DIR}/mklml.lib)
    SET(MKLML_IOMP_LIB ${MKLML_LIB_DIR}/libiomp5md.lib)
    SET(MKLML_SHARED_LIB ${MKLML_LIB_DIR}/mklml.dll)
@@ -33,6 +34,7 @@ ELSE()
    # Now enable csrmm function in mklml library temporarily, it will be updated as offical version later.
    SET(MKLML_VER "csrmm_mklml_lnx_2019.0.5" CACHE STRING "" FORCE)
    SET(MKLML_URL "http://paddlepaddledeps.bj.bcebos.com/${MKLML_VER}.tgz" CACHE STRING "" FORCE)
+   SET(MKLML_URL_MD5 bc6a7faea6a2a9ad31752386f3ae87da)
    SET(MKLML_LIB ${MKLML_LIB_DIR}/libmklml_intel.so)
    SET(MKLML_IOMP_LIB ${MKLML_LIB_DIR}/libiomp5.so)
    SET(MKLML_SHARED_LIB ${MKLML_LIB_DIR}/libmklml_intel.so)
@@ -52,6 +54,7 @@ ExternalProject_Add(
    ${MKLML_PROJECT}
    ${EXTERNAL_PROJECT_LOG_ARGS}
    "${MKLML_DOWNLOAD_CMD}"
+   URL_MD5 ${MKLML_URL_MD5}
    PREFIX ${MKLML_PREFIX_DIR}
    DOWNLOAD_DIR ${MKLML_SOURCE_DIR}
    SOURCE_DIR ${MKLML_SOURCE_DIR}
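
The `URL_MD5` lines added for Boost and MKLML pin a checksum for each downloaded archive. The sketch below shows the underlying check as a stand-alone script, using CMake's `file(MD5 ...)` and `string(MD5 ...)` commands. The archive path, the payload, and the script file name are stand-ins; nothing here is taken from Paddle's download logic.

```cmake
# Run with: cmake -P verify_md5.cmake
cmake_minimum_required(VERSION 3.10)

# Stand-in for a downloaded archive (hypothetical path and payload).
set(archive "${CMAKE_CURRENT_LIST_DIR}/demo_archive.txt")
file(WRITE "${archive}" "demo payload")

# In a real build the expected value would be a pinned literal, like the
# URL_MD5 entries in the hunks. Here it is derived from the same payload
# so that the sketch stays self-contained.
string(MD5 expected_md5 "demo payload")

# Hash the file on disk and compare against the expected digest.
file(MD5 "${archive}" actual_md5)
if(NOT actual_md5 STREQUAL expected_md5)
  message(FATAL_ERROR "MD5 mismatch: expected ${expected_md5}, got ${actual_md5}")
endif()
message(STATUS "MD5 verified: ${actual_md5}")
```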
51 changes: 51 additions & 0 deletions cmake/external/rocksdb.cmake
@@ -0,0 +1,51 @@
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

INCLUDE(ExternalProject)

SET(ROCKSDB_SOURCES_DIR ${THIRD_PARTY_PATH}/rocksdb)
SET(ROCKSDB_INSTALL_DIR ${THIRD_PARTY_PATH}/install/rocksdb)
SET(ROCKSDB_INCLUDE_DIR "${ROCKSDB_INSTALL_DIR}/include" CACHE PATH "rocksdb include directory." FORCE)
SET(ROCKSDB_LIBRARIES "${ROCKSDB_INSTALL_DIR}/lib/librocksdb.a" CACHE FILEPATH "rocksdb library." FORCE)
SET(ROCKSDB_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
INCLUDE_DIRECTORIES(${ROCKSDB_INCLUDE_DIR})

ExternalProject_Add(
extern_rocksdb
${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${ROCKSDB_SOURCES_DIR}
GIT_REPOSITORY "https://github.com/facebook/rocksdb"
GIT_TAG v6.10.1
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
-DWITH_BZ2=OFF
-DWITH_GFLAGS=OFF
-DCMAKE_CXX_FLAGS=${ROCKSDB_CMAKE_CXX_FLAGS}
-DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
# BUILD_BYPRODUCTS ${ROCKSDB_SOURCES_DIR}/src/extern_rocksdb/librocksdb.a
INSTALL_COMMAND mkdir -p ${ROCKSDB_INSTALL_DIR}/lib/
&& cp ${ROCKSDB_SOURCES_DIR}/src/extern_rocksdb/librocksdb.a ${ROCKSDB_LIBRARIES}
&& cp -r ${ROCKSDB_SOURCES_DIR}/src/extern_rocksdb/include ${ROCKSDB_INSTALL_DIR}/
BUILD_IN_SOURCE 1
)

ADD_DEPENDENCIES(extern_rocksdb snappy)

ADD_LIBRARY(rocksdb STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET rocksdb PROPERTY IMPORTED_LOCATION ${ROCKSDB_LIBRARIES})
ADD_DEPENDENCIES(rocksdb extern_rocksdb)

LIST(APPEND external_project_dependencies rocksdb)

6 changes: 3 additions & 3 deletions cmake/external/warpctc.cmake
@@ -24,7 +24,7 @@ SET(WARPCTC_INSTALL_DIR ${THIRD_PARTY_PATH}/install/warpctc)
  # in case of low internet speed
  #set(WARPCTC_REPOSITORY https://gitee.com/tianjianhe/warp-ctc.git)
  set(WARPCTC_REPOSITORY ${GIT_URL}/baidu-research/warp-ctc.git)
- set(WARPCTC_TAG c690fc5755abbdbdc98ef78d51ec10a6748a8cd1)
+ set(WARPCTC_TAG 37ece0e1bbe8a0019a63ac7e6462c36591c66a5b)

  SET(WARPCTC_INCLUDE_DIR "${WARPCTC_INSTALL_DIR}/include"
      CACHE PATH "Warp-ctc Directory" FORCE)
@@ -100,9 +100,9 @@ else()
    "${WARPCTC_DOWNLOAD_CMD}"
    PREFIX ${WARPCTC_PREFIX_DIR}
    SOURCE_DIR ${WARPCTC_SOURCE_DIR}
-   #UPDATE_COMMAND ""
+   UPDATE_COMMAND ""
    PATCH_COMMAND ""
-   BUILD_ALWAYS 1
+   #BUILD_ALWAYS 1
    CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
    -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
    -DCMAKE_C_FLAGS=${WARPCTC_C_FLAGS}
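
Both the mkldnn and warpctc hunks make the same swap: `UPDATE_COMMAND ""` is enabled and `BUILD_ALWAYS 1` is commented out, which fits the listed commit "Fix duplicate download when incremental compilation". A reduced sketch of an `ExternalProject_Add` call using those two settings follows. The target name, repository URL and commit hash are placeholders, so the target would fail at the clone step if built; the sketch only illustrates where the options go.

```cmake
cmake_minimum_required(VERSION 3.10)
project(external_pin_demo LANGUAGES NONE)  # hypothetical project name
include(ExternalProject)

ExternalProject_Add(
  extern_example                                # hypothetical target name
  GIT_REPOSITORY https://example.com/example/repo.git         # placeholder URL
  GIT_TAG        0123456789abcdef0123456789abcdef01234567     # placeholder commit
  PREFIX         ${CMAKE_BINARY_DIR}/third_party/example
  UPDATE_COMMAND ""      # empty string disables the update step
  # BUILD_ALWAYS 1       # left off so an unchanged checkout is not rebuilt
  CONFIGURE_COMMAND ""   # the remaining steps are stubbed out in this sketch
  BUILD_COMMAND ""
  INSTALL_COMMAND ""
)
```

Pinning `GIT_TAG` to a full commit hash, as the diff does for `MKLDNN_TAG` and `WARPCTC_TAG`, keeps the checked-out revision fixed when the update step is disabled.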