This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

install ROCm has a mistake #107

Open
2017040264 opened this issue Dec 16, 2020 · 12 comments

@2017040264

Ubuntu 20.04 + Radeon RX Vega 10 Graphics.

/opt/rocm/bin/rocminfo reports an error:
ROCk module is loaded
Unable to open /dev/kfd read-write: Bad address
cfl is member of render group
hsa api call failure at: /src/rocminfo/rocminfo.cc:1142
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

How can I fix it?
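
The failing step in the output above is the read-write open of /dev/kfd, which returned "Bad address" (errno EFAULT). A minimal sketch for reproducing just that step outside rocminfo follows. The helper name check_kfd is hypothetical and not part of ROCm, and a successful open only shows that the device node is reachable, not that the driver initialized correctly.

```python
import errno
import os


def check_kfd(path="/dev/kfd"):
    """Try to open the KFD device node read-write and describe the result."""
    if not os.path.exists(path):
        return f"{path} is missing: the kernel driver may not have created it"
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # Translate the errno number into its symbolic name, e.g. EACCES or EFAULT.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"cannot open {path} read-write: {name} ({exc.strerror})"
    os.close(fd)
    return f"{path} opened read-write successfully"
```

Calling print(check_kfd()) as the affected user separates a permissions problem (EACCES) from a failure inside the driver, such as the EFAULT reported here.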

@fxkamd
Contributor

fxkamd commented Dec 16, 2020

This should probably be reported in https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver

Are there any error messages in the dmesg log?

@2017040264
Author

2017040264 commented Dec 17, 2020 via email

@fxkamd
Contributor

fxkamd commented Dec 17, 2020

Something is clearly going wrong during driver initialization at boot time. I cannot give you a diagnosis from a few hand-picked error messages. That usually leads to incorrect conclusions. Please provide a complete kernel log, which will include a lot more context to work with: kernel version, boot parameters, PCI device list, memory map, other errors you may have missed, etc.

Can you also provide the output of "dkms status"?
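
One way to gather what is requested above, as a sketch: the helper below saves the complete, unfiltered output of a command to a file so that nothing is hand-picked. The function name capture and the output file names are assumptions made for illustration; dmesg may need root on systems that restrict access to the kernel log.

```python
import subprocess
from pathlib import Path


def capture(cmd, out_path):
    """Run cmd and write its complete stdout and stderr to out_path."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        text = result.stdout + result.stderr
    except FileNotFoundError:
        # Record the missing tool instead of raising, so the report stays complete.
        text = f"command not found: {cmd[0]}\n"
    Path(out_path).write_text(text)
    return text
```

For example, capture(["dmesg"], "dmesg.log") and capture(["dkms", "status"], "dkms-status.txt") would produce two files that can be attached to the issue.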

@2017040264
Author

2017040264 commented Dec 17, 2020 via email

@xuhuisheng

I suggest running lspci -vt to show the GPU information.
BTW, does "RX Vega10" mean an RX Vega 64 or an APU? ROCm doesn't support APUs yet.

@johnbridgman

Also please attach a kernel log / dmesg output as fxkamd suggested.

@2017040264
Author

2017040264 commented Dec 17, 2020 via email

@2017040264
Author

2017040264 commented Dec 17, 2020 via email

@johnbridgman

> The dmesg log is in the enclosure. Please check it.

Thanks, but I don't see an attachment. Looks like you responded by email rather than the web page - it's possible that attachments via email don't get included. The web page dialog suggests that you have to drag & drop or paste attachments.

Looks like your GPU is the integrated GPU of a Picasso (3700U) so as fxkamd mentioned it's not officially supported under HIP yet. @fxkamd I think Picasso is the first APU where we used GPUVM code paths rather than ATC/IOMMU but I don't know if that helps at all.

@fxkamd
Contributor

fxkamd commented Dec 17, 2020

Picasso is the same as Raven. It uses the IOMMUv2 code path by default. But we recently added fallbacks for systems with disabled IOMMUv2 or broken/missing CRAT tables where we treat it as a dGPU. I'm not sure whether that has made it into ROCm release branches yet.

The error message "/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")" comes from the HIP runtime. It looks like the GPU code was not compiled for the correct ISA for your GPU.
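
To check for the ISA mismatch described above, it helps to know which gfx target name the runtime reports for the GPU. The sketch below pulls names of the form gfxNNN out of any text, for example the saved output of /opt/rocm/bin/rocminfo, assuming rocminfo runs far enough to list its agents. The function name find_gfx_targets and the regular expression are illustrative assumptions, not part of any ROCm tool.

```python
import re


def find_gfx_targets(text):
    """Return the distinct gfx ISA names found in text, in first-seen order."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]{3,4}\b", text):
        if name not in seen:
            seen.append(name)
    return seen
```

The resulting names can then be compared with the targets the HIP application was built for. If the GPU's name is not among them, the hipErrorNoBinaryForGpu failure is the expected outcome.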

@2017040264
Author

dmesg.log

Here is the dmesg log.

Will my GPU be supported in the future?

@2017040264
Author

capture_20201218071612289
