transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure #43680

LukeLIN-web · 2022-06-20T10:40:03Z

Describe the Bug

Code:
https://www.paddlepaddle.org.cn/documentation/docs/zh/practices/reinforcement_learning/actor_critic_method.html

paddle-gpu==2.3.0, CUDA 10.2, cuDNN 7
pip install gym

Training fails with the following error:

Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
Traceback (most recent call last):
  File "Actor-Critic.py", line 133, in <module>
    trainIters(actor, critic, n_iters=201)
  File "Actor-Critic.py", line 74, in trainIters
    action = dist.sample([1])
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/distribution/categorical.py", line 166, in sample
    self._logits_to_probs(logits), num_samples, True)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/tensor/random.py", line 186, in multinomial
    replacement)
SystemError: (Fatal) Operator multinomial raises an thrust::system::system_error exception.
The exception content is
:transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure. (at /paddle/paddle/fluid/imperative/tracer.cc:307)

Additional Supplementary Information

No response

paddle-bot-old · 2022-06-20T10:40:30Z

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You can also look for answers in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

Liu-xiandong · 2022-06-20T13:40:07Z

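Hello, judging from the error message, the input passed to multinomial does not meet its requirements. The code you linked is an official Paddle example; did you modify anything else, such as the data or the parameters? Please take a closer look at the inputs to that part: https://github.com/PaddlePaddle/Paddle/blob/release/2.3/paddle/phi/kernels/gpu/multinomial_kernel.cu#L64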
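The input check suggested above can also be done on the host before sampling. The following is only a minimal sketch in plain Python, assuming the probabilities have already been copied into a list of floats (for example with `tensor.numpy().tolist()`). The helper names `check_probs` and `safe_to_sample` are hypothetical and not part of Paddle; the condition merely mirrors the `in_data[id] >= 0.0` assertion quoted in the error message.

```python
import math


def check_probs(probs):
    """Return the indices of entries that are not valid multinomial inputs.

    An entry is flagged when it is NaN, infinite, or negative.
    (Hypothetical helper for illustration; not a Paddle API.)
    """
    return [i for i, p in enumerate(probs)
            if math.isnan(p) or math.isinf(p) or p < 0.0]


def safe_to_sample(probs):
    """True when every entry is finite and non-negative and the total is positive."""
    return not check_probs(probs) and sum(probs) > 0.0


good = [0.25, 0.75]
bad = [float("nan"), 0.5]

print(check_probs(good))      # indices of invalid entries in `good`
print(check_probs(bad))       # indices of invalid entries in `bad`
print(safe_to_sample(good), safe_to_sample(bad))
```

Calling such a check right before `dist.sample([1])` would at least show at which training iteration the probabilities first become invalid, which the GPU-side assertion alone does not reveal.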

Aganlengzi · 2022-06-21T01:54:51Z

> Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
> Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
> Traceback (most recent call last):

@LukeLIN-web Hello, please note that the error shows the input data contains NaN values, which is why the exception was raised.

LukeLIN-web · 2022-06-21T02:20:45Z

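> Hello, judging from the error message, the input passed to multinomial does not meet its requirements. The code you linked is an official Paddle example; did you modify anything else, such as the data or the parameters? Please take a closer look at the inputs to that part: https://github.com/PaddlePaddle/Paddle/blob/release/2.3/paddle/phi/kernels/gpu/multinomial_kernel.cu#L64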

I did not modify anything. I copied the code again and still get the same error.

LukeLIN-web · 2022-06-21T02:21:14Z

> Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
> Error: /paddle/paddle/phi/kernels/gpu/multinomial_kernel.cu:67 Assertion `in_data[id] >= 0.0` failed. The input of multinomial distribution should be >= 0, but got nan.
> Traceback (most recent call last):
>
> @LukeLIN-web Hello, please note that the error shows the input data contains NaN values, which is why the exception was raised.

The input is the source code from https://www.paddlepaddle.org.cn/documentation/docs/zh/practices/reinforcement_learning/actor_critic_method.html, with no changes at all.

Liu-xiandong · 2022-06-21T07:49:20Z

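Hello, so far I have not been able to reproduce your problem on paddle-gpu==2.3.0, CUDA 10.2, cuDNN 7. Could you provide more information, such as the GPU model, operating system, CPU model, and other hardware details?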

LukeLIN-web · 2022-06-21T08:44:50Z

It still fails when I run it from the image.
I am using the image from Docker Hub.
After startup:
Successfully installed cloudpickle-2.1.0 gym-0.24.1 gym-notices-0.0.7
GPU: Tesla T4
OS: Linux 4.15.0-180-generic #189-Ubuntu SMP Wed May 18 14:13:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Liu-xiandong · 2022-06-21T09:32:06Z

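Hello, I looked at your hardware details, but for now I cannot determine the cause of the error. I suggest trying it on different hardware; if you do not have enough hardware resources, you can use Paddle's AI Studio.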

sunhao · 2023-02-20T16:25:53Z

I ran into the same problem: gpu\multinomial_kernel.cu:56 Assertion in_data[id] >= 0.0 failed. The input of multinomial distribution should be >= 0, but got nan.
get Nvidia's official solution and advice about CUDA Error.] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:259)
CUDA 11.6, paddlepaddle-gpu==2.4.1.post116, RTX 3070

paddle-bot · 2024-02-27T06:32:06Z

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.

LukeLIN-web added status/new-issue 新建 type/bug-report 报bug labels Jun 20, 2022

paddle-bot-old bot assigned Liu-xiandong Jun 20, 2022

Ligoml added status/following-up 跟进中 and removed status/new-issue 新建 labels Jun 21, 2022

paddle-bot bot closed this as completed Feb 27, 2024
