Level-zero zello-world - Abort #377

Closed
jjfumero opened this issue Nov 12, 2020 · 6 comments

@jjfumero

Question migrated from intel/llvm#2756

I am running level-zero from:

intel-llvm/llvm/build/tools/sycl/plugins/level_zero/level_zero/level_zero_loader
I am using GCC 9: gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2) on CentOS 7.8

When running the zello-world example, I get an abort:

Driver initialized.
Found Device::type_t::GPU device...
Driver version: 16795339
API version: Driver::api_version_t::_1_0
Device::properties_t::stype : structure_type_t::DEVICE_PROPERTIES
Device::properties_t::pNext : 0x0
Device::properties_t::type : Device::type_t::GPU
Device::properties_t::vendorId : 32902
Device::properties_t::deviceId : 22811
Device::properties_t::flags : Device::{ PROPERTY_FLAG_INTEGRATED }
Device::properties_t::subdeviceId : 0
Device::properties_t::coreClockRate : 1100
Device::properties_t::maxMemAllocSize : 4294959104
Device::properties_t::maxHardwareContexts : 22831408
Device::properties_t::maxCommandQueuePriority : 0
Device::properties_t::numThreadsPerEU : 7
Device::properties_t::physicalEUSimdWidth : 8
Device::properties_t::numEUsPerSubslice : 8
Device::properties_t::numSubslicesPerSlice : 3
Device::properties_t::numSlices : 1
Device::properties_t::timerResolution : 83
Device::properties_t::timestampValidBits : 36
Device::properties_t::kernelTimestampValidBits : 32
Device::properties_t::uuid : device_uuid_t::id : [ 134, 128, 0, 0, 27, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]

Device::properties_t::name : Intel(R) Gen9

Device::compute_properties_t::stype : structure_type_t::DEVICE_COMPUTE_PROPERTIES
Device::compute_properties_t::pNext : 0x0
Device::compute_properties_t::maxTotalGroupSize : 256
Device::compute_properties_t::maxGroupSizeX : 256
Device::compute_properties_t::maxGroupSizeY : 256
Device::compute_properties_t::maxGroupSizeZ : 256
Device::compute_properties_t::maxGroupCountX : 4294967295
Device::compute_properties_t::maxGroupCountY : 4294967295
Device::compute_properties_t::maxGroupCountZ : 4294967295
Device::compute_properties_t::maxSharedLocalMemory : 65536
Device::compute_properties_t::numSubGroupSizes : 3
Device::compute_properties_t::subGroupSizes : [ 8, 16, 32, 2, 0, 14, 2147483648, 0 ]

Device::memory_properties_t::stype : structure_type_t::DEVICE_MEMORY_PROPERTIES
Device::memory_properties_t::pNext : 0x0
Device::memory_properties_t::flags : Device::{ 0 }
Device::memory_properties_t::maxClockRate : 1100
Device::memory_properties_t::maxBusWidth : 64
Device::memory_properties_t::totalSize : 26604969984
Device::memory_properties_t::name : a

Device::memory_access_properties_t::stype : structure_type_t::DEVICE_MEMORY_ACCESS_PROPERTIES
Device::memory_access_properties_t::pNext : 0x0
Device::memory_access_properties_t::hostAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
Device::memory_access_properties_t::deviceAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
Device::memory_access_properties_t::sharedSingleDeviceAllocCapabilities : Device::{ MEMORY_ACCESS_CAP_FLAG_RW | MEMORY_ACCESS_CAP_FLAG_ATOMIC }
Device::memory_access_properties_t::sharedCrossDeviceAllocCapabilities : Device::{ 0 }
Device::memory_access_properties_t::sharedSystemAllocCapabilities : Device::{ 0 }

Device::cache_properties_t::stype : structure_type_t::DEVICE_CACHE_PROPERTIES
Device::cache_properties_t::pNext : 0x0
Device::cache_properties_t::flags : Device::{ 0 }
Device::cache_properties_t::cacheSize : 0

Device::image_properties_t::stype : structure_type_t::DEVICE_IMAGE_PROPERTIES
Device::image_properties_t::pNext : 0x0
Device::image_properties_t::maxImageDims1D : 16384
Device::image_properties_t::maxImageDims2D : 16384
Device::image_properties_t::maxImageDims3D : 2048
Device::image_properties_t::maxImageBufferSize : 268434944
Device::image_properties_t::maxImageArraySlices : 2048
Device::image_properties_t::maxSamplers : 16
Device::image_properties_t::maxReadImageArgs : 128
Device::image_properties_t::maxWriteImageArgs : 128

Abort was called at 102 line in file:
/builddir/build/BUILD/compute-runtime-20.41.18123/level_zero/core/source/cmdlist/cmdlist_imp.cpp
Aborted (core dumped)
However, if I build level-zero from this repo:
https://github.com/oneapi-src/level-zero/tree/master/samples/zello_world

I don't get any abort.

This is the backtrace I get by using gdb:

$ gdb ./zello_world -c core.file
 
Core was generated by `./zello_world'.
Program terminated with signal 6, Aborted.
#0  0x00007fe64c02b387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fe64c02b387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fe64c02ca78 in __GI_abort () at abort.c:90
#2  0x00007fe64b9ca439 in NEO::abortExecution () at /usr/src/debug/compute-runtime-20.41.18123/shared/source/helpers/abort.cpp:14
#3  0x00007fe64b9ca491 in NEO::abortUnrecoverable (line=line@entry=102, file=file@entry=0x7fe64bc3be88 "/builddir/build/BUILD/compute-runtime-20.41.18123/level_zero/core/source/cmdlist/cmdlist_imp.cpp")
    at /usr/src/debug/compute-runtime-20.41.18123/shared/source/helpers/debug_helpers.cpp:24
#4  0x00007fe64b9f2b03 in L0::CommandList::createImmediate (productFamily=productFamily@entry=19, device=device@entry=0x10ebff0, desc=desc@entry=0x7ffd0b390070, internalUsage=internalUsage@entry=false, 
    engineGroupType=1040, returnValue=returnValue@entry=@0x7ffd0b38ff54: ZE_RESULT_SUCCESS) at /usr/src/debug/compute-runtime-20.41.18123/level_zero/core/source/cmdlist/cmdlist_imp.cpp:102
#5  0x00007fe64b9f3b1b in L0::DeviceImp::createCommandListImmediate (this=0x10ebff0, desc=0x7ffd0b390070, phCommandList=0x7ffd0b38fff0)
    at /usr/src/debug/compute-runtime-20.41.18123/level_zero/core/source/device/device_imp.cpp:100
#6  0x00007fe64ced956a in zeCommandListCreateImmediate () from /home/juan/manchester/SPIRV/intel-llvm/llvm/build/tools/sycl/plugins/level_zero/level_zero/level_zero_loader/build/lib/libze_loader.so.1
#7  0x00007fe64cf3b23f in ze::CommandList::CreateImmediate(ze::Context*, ze::Device*, ze::CommandQueue::desc_t const*) ()
   from /home/juan/manchester/SPIRV/intel-llvm/llvm/build/tools/sycl/plugins/level_zero/level_zero/level_zero_loader/build/lib/libze_loader.so.1
#8  0x0000000000407784 in main ()

Hope this helps.
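
A note for readers decoding the log above: the `Driver version: 16795339` value looks like a packed integer. The sketch below assumes a layout of major version in the top 8 bits, minor version in the next 8, and build number in the low 16. That layout is an assumption, not something stated in this thread, although the decoded build number matches the `18123` in the `compute-runtime-20.41.18123` path printed by the abort message. `DriverVersion` and `decodeDriverVersion` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical helper type; not part of the Level Zero API.
struct DriverVersion {
    uint32_t majorVer;
    uint32_t minorVer;
    uint32_t build;
};

// Assumed layout (not confirmed by the thread):
//   bits 31..24 = major, bits 23..16 = minor, bits 15..0 = build.
constexpr DriverVersion decodeDriverVersion(uint32_t packed) {
    return DriverVersion{(packed >> 24) & 0xFFu,
                         (packed >> 16) & 0xFFu,
                         packed & 0xFFFFu};
}
```

Under that assumption, `decodeDriverVersion(16795339u)` yields major 1, minor 0, build 18123, since 16795339 is 0x010046CB.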

@jandres742

@jjfumero
I'm assuming the error occurs when you run the zello_world example from https://github.com/intel/compute-runtime/blob/master/level_zero/core/test/black_box_tests/zello_world_gpu.cpp. That is a different sample from the one at https://github.com/oneapi-src/level-zero/tree/master/samples/zello_world; the latter doesn't actually execute anything, if I'm correct.

So can you confirm:

Which sample are you using?
How are you compiling it?
Have you installed the level-zero loader, the level-zero-gpu library, and their dependencies? See https://github.com/intel/compute-runtime/releases for details.

@jjfumero

I am running a different zello_world: the one provided by intel/llvm.

Version I am using

This example appears as part of the build process:

$  pwd
/home/juan/intel-llvm/llvm/build/tools/sycl/plugins/level_zero/level_zero/level_zero_loader
$ git remote -v
origin	https://github.com/oneapi-src/level-zero.git (fetch)
origin	https://github.com/oneapi-src/level-zero.git (push)
$ git lg  #last commit 
* fcc7b7a - (HEAD, tag: v1.0) Update file listed in README (4 months ago) <Yates, Brandon>

This is the version used in intel/llvm

https://github.com/oneapi-src/level-zero/blob/fcc7b7aceacf3cbfabaf3c0952ae0cc02d083592/samples/zello_world/zello_world.cpp

So it looks to me as if the level-zero reference in intel/llvm is out of date.

How am I compiling?

I followed the build instructions from level-zero:

mkdir build
cd build
cmake ..
cmake --build . --config Release
cmake --build . --config Release --target package

Compute-runtime version

I am using the NEO driver build for CentOS 7.8, currently version 20.41.18123, installed by following these instructions:
https://github.com/intel/compute-runtime/blob/master/level_zero/doc/DISTRIBUTIONS.md#centos-7-8-red-hat-enterprise-linux-7

https://github.com/intel/compute-runtime/blob/master/opencl/doc/DISTRIBUTIONS.md#centos-7-8-red-hat-enterprise-linux-7

@jandres742

@jjfumero That test appears to be out of date; I will take care of that. For instance, to create command lists and queues, you now need to use queue groups, as described at https://spec.oneapi.com/level-zero/latest/core/PROG.html#command-queue-groups. It also uses the C++ wrappers, which have been deprecated.

Could you please try the one on this repo, https://github.com/intel/compute-runtime/blob/master/level_zero/core/test/black_box_tests/zello_world_gpu.cpp, and see if it executes correctly for you?
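
For readers unfamiliar with queue groups, the selection step can be sketched as follows. This is a minimal illustration, not the code from either sample: `QueueGroupInfo` is a hypothetical stand-in for the two fields of `ze_command_queue_group_properties_t` used here, so that the sketch compiles without the Level Zero headers, and `kComputeFlag` mirrors `ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE` with a value assumed to be bit 0. It needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical stand-in for the fields of
// ze_command_queue_group_properties_t that this sketch reads.
struct QueueGroupInfo {
    uint32_t flags;      // capability bits of the queue group
    uint32_t numQueues;  // number of queues available in the group
};

// Assumed to mirror ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE (bit 0).
constexpr uint32_t kComputeFlag = 1u << 0;

// Sentinel returned when no group qualifies (hypothetical name).
constexpr uint32_t kNoOrdinal = UINT32_MAX;

// Returns the index (ordinal) of the first group that supports compute
// and exposes at least one queue, or kNoOrdinal if there is none.
constexpr uint32_t findComputeOrdinal(const QueueGroupInfo* groups,
                                      uint32_t count) {
    for (uint32_t i = 0; i < count; ++i) {
        if ((groups[i].flags & kComputeFlag) != 0 && groups[i].numQueues > 0) {
            return i;
        }
    }
    return kNoOrdinal;
}
```

With the real API, the array would be filled by `zeDeviceGetCommandQueueGroupProperties`, and the returned index would go into the `ordinal` field of the `ze_command_queue_desc_t` passed to `zeCommandListCreateImmediate` or `zeCommandQueueCreate`.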

@jjfumero

jjfumero commented Nov 13, 2020

Yes, it passes:

$ ./zello_world_gpu 
Device : 
 * name : Intel(R) Gen9
 * type : GPU
 * vendorId : 8086

Zello World Results validation PASSED
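
The two runs print the same vendor ID in different bases: `32902` in the first log is `0x8086`, which this sample prints as `8086`. The `uuid` bytes in the first log also look like the vendor and device IDs stored least-significant byte first; that reading is inferred from the numbers alone and is not stated anywhere in the thread. A small sketch that checks the arithmetic at compile time (`fromLittleEndian16` is a hypothetical helper):

```cpp
#include <cstdint>

// Hypothetical helper: combine two bytes stored least-significant first.
constexpr uint32_t fromLittleEndian16(uint8_t lo, uint8_t hi) {
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 8);
}

// Values copied from the first log, where they are printed in decimal.
constexpr uint32_t kVendorId = 32902;
constexpr uint32_t kDeviceId = 22811;

// 32902 decimal is 0x8086; 22811 decimal is 0x591B.
static_assert(kVendorId == 0x8086, "vendorId in hex");
static_assert(kDeviceId == 0x591B, "deviceId in hex");

// uuid bytes 0..1 are 134, 128 and bytes 4..5 are 27, 89 in the log.
static_assert(fromLittleEndian16(134, 128) == kVendorId, "uuid[0..1]");
static_assert(fromLittleEndian16(27, 89) == kDeviceId, "uuid[4..5]");
```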

BTW, wouldn't it be good to add the source file copy_buffer_to_buffer.cl to the repository as well?

@jandres742

Great! Glad to see it works.

It's already there: https://github.com/intel/compute-runtime/blob/master/shared/source/built_ins/kernels/copy_buffer_to_buffer.builtin_kernel

@jandres742

I don't think there's anything else to do here. @jjfumero, feel free to reopen if you deem it necessary.
