Fix int32 overflow in cuda kernel loop #38007

zhiqiu · 2021-12-09T09:03:29Z

Bug fixes

OPs

fix int32 overflow in cuda kernel loop

Given

Op(label_smooth): Inputs: X{auto_241_[LoDTensor<float, CUDAPlace(0), (40, 128, 250027)>]}, Outputs: Out{auto_242_[LoDTensor<NOT_INITED>]}

CUDA error 700 (illegal memory access) may be raised.
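A minimal host-side sketch of how a 32-bit loop index could overflow for this shape. It is not the code changed in this PR. The launch configuration (512 threads per block, one thread per element) and the helper names `next_index_int32`, `next_index_int64`, and `check_wraparound` are assumptions made for illustration. The element count itself fits in int32, so the likely problem is the grid-stride increment `i += blockDim.x * gridDim.x`, which pushes the index past INT32_MAX and wraps it to a negative value that still passes the `i < n` check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Element count of the tensor in the report: 40 * 128 * 250027.
constexpr int64_t kNumel = 40LL * 128 * 250027;
static_assert(kNumel == 1280138240LL, "element count of the reported shape");
// The count alone does not overflow int32.
static_assert(kNumel <= std::numeric_limits<int32_t>::max(), "fits in int32");

// Hypothetical launch configuration (an assumption, not taken from the PR).
constexpr int64_t kThreads = 512;                                 // blockDim.x
constexpr int64_t kBlocks = (kNumel + kThreads - 1) / kThreads;   // gridDim.x
constexpr int64_t kStride = kThreads * kBlocks;  // blockDim.x * gridDim.x

// Emulates `i += stride` on a 32-bit signed index. The addition is done in
// uint32_t because signed overflow is undefined behavior in C++; the cast
// back reproduces two's-complement wraparound on common compilers.
inline int32_t next_index_int32(int32_t i, int64_t stride) {
  const uint32_t wrapped =
      static_cast<uint32_t>(i) + static_cast<uint32_t>(stride);
  return static_cast<int32_t>(wrapped);
}

// The same step with a 64-bit index, which cannot wrap for these sizes.
inline int64_t next_index_int64(int64_t i, int64_t stride) {
  return i + stride;
}

// Checks the step taken by the thread that owns the last element.
inline void check_wraparound() {
  const int64_t last = kNumel - 1;
  const int32_t bad = next_index_int32(static_cast<int32_t>(last), kStride);
  const int64_t good = next_index_int64(last, kStride);
  assert(bad < 0);         // wrapped to a negative index
  assert(bad < kNumel);    // loop condition still true: out-of-bounds access
  assert(good >= kNumel);  // 64-bit index exits the loop as intended
}
```

Under these assumptions, the 32-bit index wraps to a negative value, so the loop body would run once more with an invalid offset, which is consistent with an illegal memory access. Keeping the running index in a 64-bit integer avoids the wraparound.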

paddle-bot-old · 2021-12-09T09:03:58Z

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

pangyoki

LGTM

fix int32 overflow in cuda kernel loop

9ebe930

zhiqiu force-pushed the dev/fix_label_smooth branch from f74f639 to 9ebe930 Compare December 10, 2021 03:27

pangyoki approved these changes Dec 10, 2021


zhiqiu merged commit 37f43eb into PaddlePaddle:develop Dec 10, 2021
