JIT Failures after updating to latest IGC #725

Closed
pvelesko opened this issue Dec 9, 2023 · 5 comments

@pvelesko
Collaborator

pvelesko commented Dec 9, 2023

I use the following script for building and installing the latest intel-compute-runtime:
https://github.com/pvelesko/intel-compute-runtime-build

After installing it:

╰─$ ./samples/0_MatrixMultiply/MatrixMultiply
Device name Intel(R) Arc(TM) A380 Graphics
MatrixMultiply: ./lib/SPIRV/SPIRVToLLVMDbgTran.cpp:1076: llvm::DIFile* SPIRV::SPIRVToLLVMDbgTran::getFile(SPIRV::SPIRVId): Assertion `SourceArgs.size() == OperandCount && "Invalid number of operands"' failed.
CHIP error [TID 7664] [1702111633.094515022] : Program BUILD LOG for device #0:Intel(R) Arc(TM) A380 Graphics:
IGC: Internal Compiler Error: Abnormal termination

CHIP error [TID 7664] [1702111633.094554951] : hipErrorNotInitialized (CL_BUILD_PROGRAM_FAILURE ) in /home/pvelesko/space/chipStar/main/src/backend/OpenCL/CHIPBackendOpenCL.cc:739:compile

CHIP error [TID 7664] [1702111633.094599766] : Caught Error: hipErrorNotInitialized
HIP API error

Going back to the old IGC resolves the issue.

@pvelesko pvelesko added this to the 1.1 milestone Dec 9, 2023
@linehill
Collaborator

... Assertion `SourceArgs.size() == OperandCount && "Invalid number of operands"' failed.

Looks like an llvm-spirv bug that manifests in some of its branches. The buggy assertion can be found, for example, in the LLVM-16 branch, where it expects all OpExtInst ... DebugSource ... instructions to have two operands (OperandCount). This does not seem right with respect to the debug info spec, which states that DebugSource instructions take at least one operand (counted after the OpExtInst's instruction operand). The assertion and the operand count seem to be corrected in other branches, such as llvm_release_150 and llvm_release_170.

It could be that the assertion gets triggered because the SPIR-V is generated with an llvm-spirv version that produces OpExtInst … DebugSource instructions with a single operand. At least the latest llvm-spirv from the llvm_release_170 branch does this.
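To make the mismatch concrete, here is a minimal, hypothetical C++ sketch (not the actual llvm-spirv code) that models only the DebugSource operand list, counted after the extended-instruction operand. It assumes the OpenCL.DebugInfo.100 layout of a File operand followed by an optional Text operand. The names `strictDebugSourceCheck` and `relaxedDebugSourceCheck` are illustrative; the strict check mirrors the `SourceArgs.size() == OperandCount` assertion with `OperandCount` taken to be 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SPIR-V result id.
using SPIRVId = unsigned;

// Assumed DebugSource layout (after the OpExtInst instruction operand):
// File (required) and Text (optional).
constexpr std::size_t kDebugSourceMinOperands = 1; // File only
constexpr std::size_t kDebugSourceMaxOperands = 2; // File + Text

// Mirrors the assertion from this issue: only the two-operand form passes.
bool strictDebugSourceCheck(const std::vector<SPIRVId>& sourceArgs) {
  return sourceArgs.size() == kDebugSourceMaxOperands;
}

// Spec-conforming check: both the one- and two-operand forms are valid.
bool relaxedDebugSourceCheck(const std::vector<SPIRVId>& sourceArgs) {
  return sourceArgs.size() >= kDebugSourceMinOperands &&
         sourceArgs.size() <= kDebugSourceMaxOperands;
}
```

With this model, a single-operand DebugSource (such as the one described above for llvm_release_170) fails `strictDebugSourceCheck` but passes `relaxedDebugSourceCheck`, which is the situation that makes the consumer abort.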

@pvelesko
Collaborator Author

I built using LLVM-14 provided by apt. According to the intel-compute-runtime build instructions, LLVM-14 is the supported version, and the LLVM-SPIRV-Translator version should match it. Perhaps the revision from apt is just a bit behind. So overall, this is not related to IGC.

@linehill
Collaborator

Found out by chance that the -gdwarf-4 option has an effect on the debug info generated for the device. This option is enabled in debug builds of chipStar and appears in the compilation of the HIP samples.

On LLVM-17, -gdwarf-4 generates OpExtInst … DebugSource instructions with a single operand, which triggers the buggy assertion. On the other hand, -g generates OpExtInst … DebugSource instructions with two operands.

So we might dodge the assertion by avoiding the -gdwarf-# option in chipStar.

@pvelesko
Collaborator Author

This option was originally introduced because, without it, there was an issue with using gdb. Perhaps it's no longer necessary.

@pvelesko
Collaborator Author

Ran into another issue in IGC (intel/intel-graphics-compiler#310), but overall, this is resolved for us.
