
merge develop #3

Merged: 136 commits, Apr 25, 2021

Changes from all commits:
4b5cb22
[Rocm] fix python test of multinomial (#32158)
Ray2020BD Apr 12, 2021
0624ea5
polish custom api content for performence (#32209)
chenwhql Apr 12, 2021
4a09c1a
run the sample codes added by `add_sample_code` in ops.py (#31863)
wadefelix Apr 13, 2021
fdf63b4
optimize check_finite_and_unscale_op by fused kernel, test=develop (#…
thisjiang Apr 13, 2021
693c762
[ROCM] fix depth conv2d in rocm, test=develop (#32170)
qili93 Apr 13, 2021
6e946e9
add layer.to api (#32040)
MingMingShangTian Apr 13, 2021
7ab47e8
Fix prec on windows for long args (#32218)
XieYunshen Apr 13, 2021
1d5d3e4
add statistics_UT_resource.sh for imporving UT parallel level (#32220)
zhwesky2010 Apr 13, 2021
b9e543f
upgrade to oneDNN2.2.1 (fix when prim descriptor or attr contain NaN)…
lidanqing-intel Apr 13, 2021
cb81826
extend multiclass_nms unittest timeout threshold (#32214)
cryoco Apr 13, 2021
4281eb4
add new post-quant methods (#32208)
XGZhang11 Apr 14, 2021
f4b2ce4
fix expand op lack of float16 (#32238)
HexToString Apr 14, 2021
95939b5
add common dtypes as paddle's dtypes (#32012)
Apr 14, 2021
279b653
Add model benchmark ci (#32247)
xiegegege Apr 14, 2021
995b5f2
fix matrix_inverse_op with rocm (#32128)
Ray2020BD Apr 14, 2021
22ea4c3
Delete grpc.cmake/distribeted/distributed_ops (#32166)
tianshuo78520a Apr 14, 2021
f3e49c4
Fix rocm cmake (#32230)
qili93 Apr 14, 2021
7ba85ac
Add inner register backward hook method for Tensor (#32171)
chenwhql Apr 14, 2021
8552a18
[Paddle-TRT] Add check for TRT runtime dynamic shape (#32155)
cryoco Apr 14, 2021
63abd50
softmax reconstruction and optimization (#31821)
xingfeng01 Apr 14, 2021
7b9fcac
add marco cond for multi function (#32239)
chenwhql Apr 14, 2021
3ac6c18
adds new CPU kernel for SGD op supporting BF16 data type (#32162)
arogowie-intel Apr 14, 2021
3a804a0
Added oneDNN reduce_op FWD kernel (#31816)
jakpiase Apr 14, 2021
7da4455
support the bool tensor and scalar (#32272)
wawltor Apr 14, 2021
5dc0a6e
Optimize of backward of log_softmax when axis is -1 and dim_size <= 1…
AshburnLee Apr 14, 2021
69d8027
Optimize the bec_loss op to avoid copy input back to CPU. (#32265)
Xreki Apr 14, 2021
e6bc358
【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197)
frankwhzhang Apr 15, 2021
0c037d2
fix test sync_with_cpp (#32212)
fangshuixun007 Apr 15, 2021
29f6522
Customizable Python Layer in Dygraph (#32130)
hbwx24 Apr 15, 2021
f946ba6
Fix some error message (#32169)
Kqnonrime Apr 15, 2021
cfdde0e
【Deepmd Support】add IsInitialized and tanh double grad (#32188)
JiabinYang Apr 15, 2021
668a0d3
support int for nearest_interp, test=develop (#32270)
tink2123 Apr 15, 2021
9f8c8f9
heterps support pscore (#32093)
Thunderbrook Apr 15, 2021
90133d2
[ROCM] bugfix for unit tests (#32258)
windstamp Apr 15, 2021
825d495
Correct typos (#32288)
AshburnLee Apr 15, 2021
a8c3a90
tree-based-model (#31696)
123malin Apr 15, 2021
fabdb43
Update hapi to support AMP (#31417)
LiuChiachi Apr 15, 2021
6da043e
support ernie trt-int8 for inference (#32232)
ceci3 Apr 16, 2021
03c9ecd
test=develop, fix index_wrapper's cmake depends(#32314)
123malin Apr 16, 2021
66d4622
[Hybrid Parallel] Add model parallel support in dygraph (#32248)
ForFishes Apr 16, 2021
2c18258
Unify the implementation of elementwise operation of same dimensions …
ZzSean Apr 18, 2021
21dc044
update `get_api_md5`, using the real api name as the map's key (#32224)
wadefelix Apr 19, 2021
76cb83e
Add BF16 Constant Initializer and support for other initializer (#31…
wozna Apr 19, 2021
4d69eea
Fix sublayer (#31824)
JiabinYang Apr 19, 2021
ffd4086
[Hybrid Parallel] Support dp & mp in dygraph (#32323)
ForFishes Apr 19, 2021
cbe5c9f
[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc …
zhiqiu Apr 19, 2021
1e3a94b
add npu check nan and inf (#32340)
Baibaifan Apr 19, 2021
f0cc188
add log to analyse mkldnn models (#32342)
juncaipeng Apr 20, 2021
43926c8
support `numpy.array/asarray(tensor) -> ndarray`, test=develop (#32300)
lyuwenyu Apr 20, 2021
0dd28b8
fix the bug that the error message is not displayed on mac ci (#32367)
XieYunshen Apr 20, 2021
c09d645
[heterps] optimize build task (#32358)
Thunderbrook Apr 20, 2021
f6f59e5
move REGISTER_OP_CUDA_KERNEL into cpp with eigen, test=develop (#32114)
Avin0323 Apr 20, 2021
e0a52fd
save/load program (#32336)
hbwx24 Apr 20, 2021
e348901
[Sharding]: update config DOC (#32299)
JZ-LIANG Apr 20, 2021
186682f
add paddle.nn.unfold #32297 (#32298)
lzzyzlbb Apr 20, 2021
5e7e7c9
[Optimize]SparseKV speedup and memory save (#32048)
seiriosPlus Apr 20, 2021
1593ee2
remove fluid for auto_checkpoint. (#32157)
xiemoyuan Apr 21, 2021
ead8342
Added oneDNN reduce_op GRAD kernel (#32280)
jakpiase Apr 21, 2021
229f930
add retry on gcda_clean.py (#32318)
XieYunshen Apr 21, 2021
a2cbbe8
Modify the exit code of mac CI approval error (#32389)
iducn Apr 21, 2021
4898c38
add test=develop (#32380)
gongweibao Apr 21, 2021
5d19f8d
Added bilinear and nearest interp v2 oneDNN FP32 kernels (#32312)
jakpiase Apr 21, 2021
9ff8556
flush denormal in the tracer op, test=develop (#32350)
Shixiaowei02 Apr 21, 2021
2194ad1
[Kunlun]add collective ops for multi XPU cards training and add Kunlu…
vslyu Apr 21, 2021
ab6f874
remove thrust include files (#32395)
Avin0323 Apr 21, 2021
8e4c193
[NPU] register npu finalize on exit (#32390)
zhiqiu Apr 21, 2021
2b68d20
optimize get-feat function of graph engine (#32261)
seemingwang Apr 21, 2021
37bb334
add get_loss_scaling to fleet (#32401)
FeixLiu Apr 21, 2021
3da2c7f
Update the error info for quantizaion (#32273)
juncaipeng Apr 21, 2021
661a1f6
[CustomOP]Support find include/c++/v1 include dirs automatically (#3…
Aurelius84 Apr 21, 2021
7bae5e9
[CustomOp]Fix MAC3-CI random failed with XXX_setup.py(#32369)
Aurelius84 Apr 21, 2021
4be3b05
fix bug in amp O2 (#32343)
huangxu96 Apr 21, 2021
bc90916
Do not define and save reserve_space for inference. (#32375)
Xreki Apr 21, 2021
c315852
【NPU】Merge NPU ccl code (#32381)
frankwhzhang Apr 21, 2021
b47dd15
[HotFix] Add support for optimizer with varbase input (#32362)
chenwhql Apr 21, 2021
bf0ec9b
Add Bfloat16 support on Ampere GPU with CUDA 11 (#32132)
AshburnLee Apr 21, 2021
e58c705
Delete WITH_GRPC flag and Distributed old code (#32383)
tianshuo78520a Apr 22, 2021
f4d9adc
support save/load binary format tensor. (#32211)
hbwx24 Apr 22, 2021
73d0b0e
fix count problem (#32415)
seemingwang Apr 22, 2021
e727820
strip after compilation (#32145)
Avin0323 Apr 22, 2021
b2ee838
add glu in nn.functional (#32096)
Apr 22, 2021
7ea999f
[HybridParallel] Add ClipGradByGlobalNorm & check_finite_and_unscale …
ForFishes Apr 22, 2021
bec4b16
fix type(x)=paddle.VarBase to paddle.Tensor (#32364)
zhiboniu Apr 22, 2021
1064f2b
modify conv2d_transpose docs (#32410)
wangxinxin08 Apr 22, 2021
890d6bc
Modify some contents for elementwise op impl (#32414)
ZzSean Apr 22, 2021
f12c943
import sequence_* API to new namespace (#32089)
Apr 22, 2021
d03b0b1
Add fleet get_loss_scaling doc and update alert message (#32419)
FeixLiu Apr 22, 2021
c481570
fix doc for adamw (#32438)
hutuxian Apr 22, 2021
a1a527f
[NPU] remove ascend_parser for WITH_ASCEND_CL (#32451)
zhiqiu Apr 22, 2021
c332828
support int32 and int64 kernel for clip operator (#32373)
wuyefeilin Apr 22, 2021
f8ca5a9
Add `paddle.set_grad_enabled` (#31794)
willthefrog Apr 22, 2021
203ac4f
Fix seven error message (#32397)
Kqnonrime Apr 23, 2021
49773f3
[NPU] Fix bug that epsilon become 0 using power (#32469)
zhiqiu Apr 23, 2021
7879477
[ROCM] add cuda kenrel for batch_norm_op (#32393)
ronny1996 Apr 23, 2021
1dc8393
disable utest (#32474)
ForFishes Apr 23, 2021
51bcd97
add WITH_STRIP=ON in paddle_build.sh, test=develop (#32450)
Avin0323 Apr 23, 2021
b6f8ccd
add lstm support on xpu test=kunlun (#32436)
shanliang1992 Apr 23, 2021
2b108a0
add c_concat and c_split ops (#32486)
Apr 23, 2021
0e74eea
solve hccl communicate conflict (#32447)
Baibaifan Apr 23, 2021
7a681f0
fix Windows CI MP compile and environment install script and openblas…
zhwesky2010 Apr 23, 2021
1b83de2
update 2.0 public api in optimizer (#31944)
zhiboniu Apr 23, 2021
7c38114
move semantic checks to op_teller (#32279)
b3602sss Apr 23, 2021
a01b510
ernie int8 support trt6 (#32424)
ceci3 Apr 23, 2021
39a59dc
[NPU] refactor check_finite_and_scale npu kernel (#32407)
zhiqiu Apr 23, 2021
faa8c70
Polish ParallelExectuor constructor into small functions (#32191)
Aurelius84 Apr 23, 2021
de94743
Ut test conv3d op timeout (#32216)
XieYunshen Apr 23, 2021
8fa8a37
add the c_identity op (#32485)
Apr 23, 2021
7d4998a
[CustomOp] Remove useless extension headers for old custom op (#32463)
chenwhql Apr 23, 2021
8beb170
add tensor.tolist() support (#32366)
zhiboniu Apr 24, 2021
9bf9092
Fix test_yolov3 Random Failure (#32496)
zhhsplendid Apr 24, 2021
18d3e2c
refator paddle inference c api.test=develop (#32225)
winter-wang Apr 24, 2021
f8caa58
clear CUDA compile environment on windows (#32498)
zhwesky2010 Apr 24, 2021
ef8671e
print the real name for Functions instead of the ArgSpec (#32379)
wadefelix Apr 24, 2021
feb2e47
Nne integration (#32255)
denglin-github Apr 25, 2021
83580ee
use 'paddle.framework.set_grad_enabled' in pylayer (#32355)
hbwx24 Apr 25, 2021
136ef09
add detail for gpu_id, document_fix (#32444)
zhangting2020 Apr 25, 2021
fb7590d
[NPU] refine lookup_table_v2_grad npu_kernel (#32497)
zhiqiu Apr 25, 2021
4db2cc9
fix reader_blocking_queue_test (#32505)
chenwhql Apr 25, 2021
3b61d06
fix tensor to_string when shape contains zero (#32501)
zhiqiu Apr 25, 2021
06276f4
let paddle.utils.install_check support CPU package with GPU device (#…
pangyoki Apr 25, 2021
f272e59
fix tc trt shape (#32458)
shangzhizhou Apr 25, 2021
78eff52
[BUG FIX] when x.dim < y.dim, the result of compare_op is inverse (#…
wawltor Apr 25, 2021
976fe6f
Fix the bug in mp (#31996)
Apr 25, 2021
7ef1de6
[HybridParallel] Add pipeline layer in dygraph (#32449)
ForFishes Apr 25, 2021
2f351ed
add silu op, test=develop (#32384)
minghaoBD Apr 25, 2021
3b4dcad
[ROCM] update PADDLE_WITH_ROCM to PADDLE_WITH_HIP, test=develop (#32487)
qili93 Apr 25, 2021
7a4cbb3
update 2.0 public api in io&reader (#32022)
zhiboniu Apr 25, 2021
486946a
support python3.9 in paddle_build (#32503)
pangyoki Apr 25, 2021
92dc9b2
update lite subgraph api. (#32513)
jiweibo Apr 25, 2021
4e460d7
Add hub Module for easy to use pre-trained models. (#31873)
lyuwenyu Apr 25, 2021
74824fd
add clearGradient for amp sample code (#32517)
zhiqiu Apr 25, 2021
727b28d
paddle.save/load support nested structure and layer (#32446)
hbwx24 Apr 25, 2021
1896c77
fix gradient(nan) when two inputs are equal (#32448)
zhangting2020 Apr 25, 2021
541d702
add trt verbose logs (#32459)
cryoco Apr 25, 2021
b055676
[Paddle-TRT] Add trt runtime version check (#32443)
cryoco Apr 25, 2021
5943ff7
add copy_cross_scope (#32432)
Baibaifan Apr 25, 2021
50 changes: 32 additions & 18 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License

cmake_minimum_required(VERSION 3.15)
cmake_minimum_required(VERSION 3.10)
cmake_policy(VERSION 3.10)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
@@ -22,9 +22,6 @@ include(system)

project(paddle CXX C)

include(init)
include(generic) # simplify cmake module

# enable language CUDA
# TODO(Shibo Tao): remove find_package(CUDA) completely.
find_package(CUDA QUIET)
@@ -33,16 +30,24 @@ option(WITH_TENSORRT "Compile PaddlePaddle with NVIDIA TensorRT" OFF)
option(WITH_XPU "Compile PaddlePaddle with BAIDU KUNLUN XPU" OFF)
option(WITH_WIN_DUMP_DBG "Compile with windows core dump debug mode" OFF)
option(WITH_ASCEND "Compile PaddlePaddle with ASCEND" OFF)
# NOTE(zhiqiu): WITH_ASCEND_CL can be compile on x86_64, so we can set WITH_ASCEND=OFF and WITH_ASCEND_CL=ON
option(WITH_ROCM "Compile PaddlePaddle with ROCM platform" OFF)
# NOTE(zhiqiu): WITH_ASCEND_CL can be compile on x86_64, so we can set WITH_ASCEND=OFF and WITH_ASCEND_CL=ON
# to develop some acl related functionality on x86
option(WITH_ASCEND_CL "Compile PaddlePaddle with ASCEND CL" ${WITH_ASCEND})
option(WITH_ASCEND_CXX11 "Compile PaddlePaddle with ASCEND and CXX11 ABI" OFF)
# Note(zhouwei): It use option above, so put here
include(init)
include(generic) # simplify cmake module

if (WITH_GPU AND WITH_XPU)
message(FATAL_ERROR "Error when compile GPU and XPU at the same time")
endif()
if (WITH_GPU AND WITH_ASCEND)
message(FATAL_ERROR "Error when compile GPU and ASCEND at the same time")
endif()
if (WITH_GPU AND WITH_ROCM)
message(FATAL_ERROR "Error when compile CUDA and ROCM at the same time")
endif()
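The three guards above share one pattern: fail fast at configure time when two backends that cannot coexist are both enabled. A minimal sketch of that pattern, with hypothetical option names (FOO/BAR are illustrative, not from this PR):

```cmake
# Hypothetical illustration: two build options that must not be enabled together.
option(WITH_FOO "Compile with the FOO backend" OFF)
option(WITH_BAR "Compile with the BAR backend" OFF)

if(WITH_FOO AND WITH_BAR)
  # Abort configuration immediately with a clear message.
  message(FATAL_ERROR "Error when compiling FOO and BAR at the same time")
endif()
```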

if(WITH_GPU AND NOT APPLE)
enable_language(CUDA)
@@ -61,7 +66,7 @@ if(WITH_MUSL)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations -Wno-deprecated-declarations -Wno-error=pessimizing-move -Wno-error=deprecated-copy")
endif()

if(WITH_ASCEND AND NOT WITH_ASCEND_CXX11)
if(WITH_ASCEND_CL AND NOT WITH_ASCEND_CXX11)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()

@@ -99,9 +104,11 @@ if(WIN32)
endif()
endforeach(flag_var)
endif()

# NOTE(Avin0323): Less parallel count result in faster compilation.
math(EXPR PROCESS_MAX "${CPU_CORES} * 2 / 3")

# NOTE(zhouwei25): temporarily change MP to 1 for reducing CPU & memory utilization
set(PROCESS_MAX 1)
#math(EXPR PROCESS_MAX "${CPU_CORES} * 1 / 2")
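The commented-out line derives a parallel-compile job count from the core count with CMake's `math(EXPR ...)`. A hedged sketch of how such a computation is typically wired up — `ProcessorCount` is a standard CMake module, but the surrounding variable names are assumptions, not this project's exact code:

```cmake
# Hypothetical sketch: compute a parallel job count as half the CPU cores,
# with a floor of 1 so the build never requests zero parallel jobs.
include(ProcessorCount)
ProcessorCount(CPU_CORES)        # may yield 0 if detection fails
if(CPU_CORES EQUAL 0)
  set(CPU_CORES 1)
endif()
math(EXPR PROCESS_MAX "${CPU_CORES} * 1 / 2")
if(PROCESS_MAX LESS 1)
  set(PROCESS_MAX 1)
endif()
message(STATUS "Parallel compile jobs: ${PROCESS_MAX}")
```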

# windows build turn off warnings, use parallel compiling.
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
@@ -129,6 +136,9 @@ if(WIN32)

foreach(flag_var CMAKE_SHARED_LINKER_FLAGS CMAKE_STATIC_LINKER_FLAGS CMAKE_EXE_LINKER_FLAGS CMAKE_LINKER_FLAGS)
set(${flag_var} "${${flag_var}} /ignore:4049 /ignore:4217 /ignore:4006 /ignore:4221")
if(MSVC_STATIC_CRT)
set(${flag_var} "${${flag_var}} /NODEFAULTLIB:MSVCRT.LIB")
endif()
endforeach(flag_var)

if (WITH_WIN_DUMP_DBG)
@@ -168,8 +178,6 @@ option(WITH_DISTRIBUTE "Compile with distributed support" OFF)
option(WITH_BRPC_RDMA "Use brpc rdma as the rpc protocal" OFF)
option(ON_INFER "Turn on inference optimization and inference-lib generation" OFF)
################################ Internal Configurations #######################################
option(WITH_ROCM "Compile PaddlePaddle with ROCM platform" OFF)
option(WITH_RCCL "Compile PaddlePaddle with RCCL support" OFF)
option(WITH_NV_JETSON "Compile PaddlePaddle with NV JETSON" OFF)
option(WITH_PROFILER "Compile PaddlePaddle with GPU profiler and gperftools" OFF)
option(WITH_COVERAGE "Compile PaddlePaddle with code coverage" OFF)
@@ -180,21 +188,23 @@ option(WITH_PSLIB "Compile with pslib support" OFF)
option(WITH_BOX_PS "Compile with box_ps support" OFF)
option(WITH_XBYAK "Compile with xbyak support" ON)
option(WITH_CONTRIB "Compile the third-party contributation" OFF)
option(WITH_GRPC "Use grpc as the default rpc framework" ${WITH_DISTRIBUTE})
option(WITH_PSCORE "Compile with parameter server support" ${WITH_DISTRIBUTE})
option(WITH_HETERPS "Compile with heterps" OFF)
option(WITH_INFERENCE_API_TEST "Test fluid inference C++ high-level api interface" OFF)
option(PY_VERSION "Compile PaddlePaddle with python3 support" ${PY_VERSION})
option(WITH_DGC "Use DGC(Deep Gradient Compression) or not" ${WITH_DISTRIBUTE})
option(SANITIZER_TYPE "Choose the type of sanitizer, options are: Address, Leak, Memory, Thread, Undefined" OFF)
option(WITH_LITE "Compile Paddle Fluid with Lite Engine" OFF)
option(WITH_NCCL "Compile PaddlePaddle with NCCL support" ON)
option(WITH_RCCL "Compile PaddlePaddle with RCCL support" ON)
option(WITH_XPU_BKCL "Compile PaddlePaddle with BAIDU KUNLUN XPU BKCL" OFF)
option(WITH_CRYPTO "Compile PaddlePaddle with crypto support" ON)
option(WITH_ARM "Compile PaddlePaddle with arm support" OFF)
option(WITH_SW "Compile PaddlePaddle with sw support" OFF)
option(WITH_MIPS "Compile PaddlePaddle with mips support" OFF)
option(WITH_MUSL "Compile with musl libc instead of glibc" OFF)
option(WITH_UNITY_BUILD "Compile with UnityBuild mode" OFF)
option(WITH_STRIP "Strip so files of Whl packages" OFF)

# PY_VERSION
if(NOT PY_VERSION)
@@ -255,9 +265,6 @@ endif()

if(WITH_BRPC_RDMA)
message(STATUS "Use brpc with rdma.")
if(WITH_GRPC)
message(FATAL_ERROR "Can't use grpc with brpc rdma.")
endif()
if(NOT WITH_DISTRIBUTE)
message(FATAL_ERROR "Can't use brpc rdma in no distribute env.")
endif()
@@ -305,9 +312,9 @@ endif(WITH_ROCM)

if (NOT WITH_ROCM AND WITH_RCCL)
MESSAGE(WARNING
"Disable RCCL when compiling without GPU. Force WITH_RCCL=OFF.")
set(WITH_NCCL OFF CACHE STRING
"Disable RCCL when compiling without GPU" FORCE)
"Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
set(WITH_RCCL OFF CACHE STRING
"Disable RCCL when compiling without ROCM" FORCE)
endif()

if(WITH_RCCL)
@@ -362,6 +369,13 @@ else()
message(WARNING "On inference mode, will take place some specific optimization. Turn on the ON_INFER flag when building inference_lib only.")
endif()

if(WITH_STRIP)
find_program(STRIP_PATH strip)
if(NOT STRIP_PATH OR NOT LINUX)
set(WITH_STRIP OFF CACHE STRING "Command strip is only used on Linux when it exists." FORCE)
endif()
endif()
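The block above demotes `WITH_STRIP` when its prerequisites are missing. The general pattern — validate an option with `find_program` and force the cached entry off on failure — can be sketched as follows (option and tool names mirror the diff; the standalone snippet is illustrative, not a drop-in):

```cmake
# Hypothetical sketch: turn an option off when the required tool is absent.
option(WITH_STRIP "Strip so files of Whl packages" OFF)
if(WITH_STRIP)
  find_program(STRIP_PATH strip)          # search PATH for the strip binary
  if(NOT STRIP_PATH OR NOT LINUX)
    # FORCE overwrites the user's cached choice, with an explanatory docstring.
    set(WITH_STRIP OFF CACHE STRING
        "Command strip is only used on Linux when it exists." FORCE)
  endif()
endif()
```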

add_subdirectory(paddle)
if(WITH_PYTHON)
add_subdirectory(python)
7 changes: 3 additions & 4 deletions cmake/configure.cmake
@@ -173,10 +173,9 @@ if(WITH_PSCORE)
add_definitions(-DPADDLE_WITH_PSCORE)
endif()


if(WITH_GRPC)
add_definitions(-DPADDLE_WITH_GRPC)
endif(WITH_GRPC)
if(WITH_HETERPS)
add_definitions(-DPADDLE_WITH_HETERPS)
endif()

if(WITH_BRPC_RDMA)
add_definitions(-DPADDLE_WITH_BRPC_RDMA)
23 changes: 16 additions & 7 deletions cmake/external/ascend.cmake
@@ -21,7 +21,13 @@ else()
set(ASCEND_DIR /usr/local/Ascend)
endif()

if(WITH_ASCEND)
if(EXISTS ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/include/graph/ascend_string.h)
# It means CANN 20.2 +
add_definitions(-DPADDLE_WITH_ASCEND_STRING)
endif()


if(WITH_ASCEND OR WITH_ASCEND_CL)
set(ASCEND_DRIVER_DIR ${ASCEND_DIR}/driver/lib64)
set(ASCEND_DRIVER_COMMON_DIR ${ASCEND_DIR}/driver/lib64/common)
set(ASCEND_DRIVER_SHARE_DIR ${ASCEND_DIR}/driver/lib64/share)
@@ -43,9 +49,6 @@ if(WITH_ASCEND)
set(atlas_acl_lib ${ATLAS_RUNTIME_DIR}/libascendcl.so)
INCLUDE_DIRECTORIES(${ATLAS_RUNTIME_INC_DIR})

if(EXISTS ${ATLAS_RUNTIME_INC_DIR}/graph/ascend_string.h)
add_definitions(-DPADDLE_WITH_ASCEND_STRING)
endif()

ADD_LIBRARY(ascend_ge SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend_ge PROPERTY IMPORTED_LOCATION ${atlas_ge_runner_lib})
@@ -62,17 +65,23 @@ endif()
if(WITH_ASCEND_CL)
set(ASCEND_CL_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64)

set(ascend_hccl_lib ${ASCEND_CL_DIR}/libhccl.so)
set(ascendcl_lib ${ASCEND_CL_DIR}/libascendcl.so)
set(acl_op_compiler_lib ${ASCEND_CL_DIR}/libacl_op_compiler.so)
set(ASCEND_CL_INC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/include)
set(FWKACLLIB_INC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/include)
set(ACLLIB_INC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/acllib/include)

message(STATUS "ASCEND_CL_INC_DIR ${ASCEND_CL_INC_DIR}")
message(STATUS "FWKACLLIB_INC_DIR ${FWKACLLIB_INC_DIR}")
message(STATUS "ASCEND_CL_DIR ${ASCEND_CL_DIR}")
INCLUDE_DIRECTORIES(${ASCEND_CL_INC_DIR})
INCLUDE_DIRECTORIES(${FWKACLLIB_INC_DIR})
INCLUDE_DIRECTORIES(${ACLLIB_INC_DIR})

ADD_LIBRARY(ascendcl SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascendcl PROPERTY IMPORTED_LOCATION ${ascendcl_lib})

ADD_LIBRARY(ascend_hccl SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend_hccl PROPERTY IMPORTED_LOCATION ${ascend_hccl_lib})

ADD_LIBRARY(acl_op_compiler SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET acl_op_compiler PROPERTY IMPORTED_LOCATION ${acl_op_compiler_lib})
add_custom_target(extern_ascend_cl DEPENDS ascendcl acl_op_compiler)
Expand Down
2 changes: 1 addition & 1 deletion cmake/external/gloo.cmake
@@ -32,7 +32,7 @@ cache_third_party(extern_gloo
TAG ${GLOO_TAG}
DIR GLOO_SOURCE_DIR)

if(WITH_ASCEND)
if(WITH_ASCEND OR WITH_ASCEND_CL)
ExternalProject_Add(
extern_gloo
${EXTERNAL_PROJECT_LOG_ARGS}
77 changes: 0 additions & 77 deletions cmake/external/grpc.cmake

This file was deleted.

2 changes: 1 addition & 1 deletion cmake/external/mkldnn.cmake
@@ -20,7 +20,7 @@ SET(MKLDNN_SOURCE_DIR ${THIRD_PARTY_PATH}/mkldnn/src/extern_mkldnn)
SET(MKLDNN_INSTALL_DIR ${THIRD_PARTY_PATH}/install/mkldnn)
SET(MKLDNN_INC_DIR "${MKLDNN_INSTALL_DIR}/include" CACHE PATH "mkldnn include directory." FORCE)
SET(MKLDNN_REPOSITORY ${GIT_URL}/oneapi-src/oneDNN.git)
SET(MKLDNN_TAG 72efa005effb49595933e033cc732f215ef0445a)
SET(MKLDNN_TAG f58682cd8bd0615f41d879f8afc8f1511ab42d24)

# Introduce variables:
# * CMAKE_INSTALL_LIBDIR
2 changes: 1 addition & 1 deletion cmake/external/protobuf.cmake
@@ -242,7 +242,7 @@ endif()
)
ENDFUNCTION()

if(WITH_ASCEND)
if(WITH_ASCEND OR WITH_ASCEND_CL)
SET(PROTOBUF_VERSION 3.8.0)
else()
SET(PROTOBUF_VERSION 3.1.0)
2 changes: 1 addition & 1 deletion cmake/external/threadpool.cmake
@@ -16,7 +16,7 @@ INCLUDE(ExternalProject)

SET(THREADPOOL_PREFIX_DIR ${THIRD_PARTY_PATH}/threadpool)
SET(THREADPOOL_SOURCE_DIR ${THIRD_PARTY_PATH}/threadpool/src/extern_threadpool)
if(WITH_ASCEND)
if(WITH_ASCEND OR WITH_ASCEND_CL)
SET(THREADPOOL_REPOSITORY https://gitee.com/tianjianhe/ThreadPool.git)
else()
SET(THREADPOOL_REPOSITORY ${GIT_URL}/progschj/ThreadPool.git)
2 changes: 1 addition & 1 deletion cmake/external/warpctc.cmake
@@ -43,7 +43,7 @@ cache_third_party(extern_warpctc
TAG ${WARPCTC_TAG}
DIR WARPCTC_SOURCE_DIR)

if(WITH_ASCEND)
if(WITH_ASCEND OR WITH_ASCEND_CL)
ExternalProject_Add(
extern_warpctc
${EXTERNAL_PROJECT_LOG_ARGS}
19 changes: 15 additions & 4 deletions cmake/generic.cmake
@@ -447,9 +447,20 @@ function(cc_test TARGET_NAME)
cc_test_build(${TARGET_NAME}
SRCS ${cc_test_SRCS}
DEPS ${cc_test_DEPS})
cc_test_run(${TARGET_NAME}
COMMAND ${TARGET_NAME}
ARGS ${cc_test_ARGS})
# we don't test hcom ops, because they need complex configuration
# with more than one machine
if(NOT ("${TARGET_NAME}" STREQUAL "c_broadcast_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "c_allreduce_sum_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "c_allreduce_max_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "c_reducescatter_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "c_allgather_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "send_v2_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "c_reduce_sum_op_npu_test" OR
"${TARGET_NAME}" STREQUAL "recv_v2_op_npu_test"))
cc_test_run(${TARGET_NAME}
COMMAND ${TARGET_NAME}
ARGS ${cc_test_ARGS})
endif()
endif()
endfunction(cc_test)
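The edit to `cc_test` still builds every test binary but skips CTest registration for a hard-coded deny-list of NPU collective tests. A lighter, hypothetical way to express the same idea is a list-membership check (the test names come from the diff; the helper function itself is an illustrative sketch, not code from this PR):

```cmake
# Hypothetical refactor: keep the multi-machine tests in one list and
# register a test with CTest only when its target is not in that list.
set(MULTI_MACHINE_TESTS
    c_broadcast_op_npu_test
    c_allreduce_sum_op_npu_test
    send_v2_op_npu_test
    recv_v2_op_npu_test)

function(register_single_machine_test TARGET_NAME)
  if(NOT TARGET_NAME IN_LIST MULTI_MACHINE_TESTS)
    add_test(NAME ${TARGET_NAME} COMMAND ${TARGET_NAME})
  endif()
endfunction()
```

`IN_LIST` requires CMake 3.3+ (policy CMP0057), which this project's `cmake_minimum_required(VERSION 3.10)` already satisfies.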

@@ -807,7 +818,7 @@ function(py_test TARGET_NAME)
${PYTHON_EXECUTABLE} -u ${py_test_SRCS} ${py_test_ARGS}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif()

if (WIN32)
set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 150)
endif()
4 changes: 2 additions & 2 deletions cmake/inference_lib.cmake
@@ -211,11 +211,11 @@ set(src_dir "${PADDLE_SOURCE_DIR}/paddle/fluid")
if(WIN32)
set(paddle_inference_c_lib $<TARGET_FILE_DIR:paddle_inference_c>/paddle_inference_c.*)
else(WIN32)
set(paddle_inference_c_lib ${PADDLE_BINARY_DIR}/paddle/fluid/inference/capi/libpaddle_inference_c.*)
set(paddle_inference_c_lib ${PADDLE_BINARY_DIR}/paddle/fluid/inference/capi_exp/libpaddle_inference_c.*)
endif(WIN32)

copy(inference_lib_dist
SRCS ${src_dir}/inference/capi/paddle_c_api.h ${paddle_inference_c_lib}
SRCS ${src_dir}/inference/capi_exp/pd_*.h ${paddle_inference_c_lib}
DSTS ${PADDLE_INFERENCE_C_INSTALL_DIR}/paddle/include ${PADDLE_INFERENCE_C_INSTALL_DIR}/paddle/lib)

# fluid library for both train and inference
4 changes: 2 additions & 2 deletions cmake/init.cmake
@@ -18,10 +18,10 @@ if(NOT WIN32)
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O2 -g -DNDEBUG")
set(CMAKE_CXX_FLAGS_MINSIZEREL "-Os -DNDEBUG")
else()
# It has not been used now, it can specify the CUDA compile flag manually,
# It can specify the CUDA compile flag manually,
# its use is to remove /Zi to reduce GPU static library size. But it's dangerous
# because CUDA will be updated by NVIDIA, and then errors will occur.
# Now, it's used in CUDA:[10.0, 10.2]
# Now, it's only used in VS2015 + CUDA:[10.0, 10.2]
set(WIN_PROPS ${CMAKE_SOURCE_DIR}/cmake/paddle_win.props)
endif()
