[BuildBot] Uplift GPU RT version for Linux CI Process #2807

Merged

@bb-sycl bb-sycl commented Nov 21, 2020

Uplift GPU RT version for Linux to 20.46.18421

Signed-off-by: bb-sycl <bb-sycl@intel.com>
@yanfeng3721 yanfeng3721 marked this pull request as ready for review November 24, 2020 06:13
@yanfeng3721

/summary:run

@bader

bader commented Nov 24, 2020

@yanfeng3721, do you know the reason for the buildbot/sycl-ubu-x64-pr failure?

@yanfeng3721

It looks like a test driver issue; the log shows:

llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/llvm/utils/lit/lit/llvm/config.py:347: note: using clang: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/bin/clang
llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py:79: note: Backend (SYCL_BE): PI_LEVEL_ZERO

^Cllvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/llvm/utils/lit/lit/TestingConfig.py:101: fatal: unable to parse config file '/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py', traceback: Traceback (most recent call last):
  File "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/bin/../../llvm.src/llvm/utils/lit/lit/TestingConfig.py", line 88, in load_from_path
    exec(compile(data, path, 'exec'), cfg_globals, None)
  File "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py", line 131, in <module>
    if getDeviceCount("cpu")[0]:
  File "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py", line 89, in getDeviceCount
    (output, err) = process.communicate()
  File "/usr/local/lib/python3.7/subprocess.py", line 926, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

I noticed the related test driver was already changed by #2794, so I will trigger a test run with the new test driver.

Refine comments in dependency.conf
@bader

bader commented Nov 24, 2020

I can't find such output in the logs. It looks like the LIT tests hang while testing the Level Zero back-end.

http://ci.llvm.intel.com:8010/#/builders/2/builds/6305/steps/16/logs/stdio

[54/56] Running the SYCL regression tests for Level Zero
[2020-11-24 10:02:18,593] lit INFO: LIT test no output within 1200 seconds, start back-trace
[2020-11-24 10:02:18,594] lit INFO: ============================= Start back-trace =============================
sys_bbs+ 15624  0.0  0.0 108824 20708 ?        S    09:42   0:00 /localdisk2/sycl_ci/buildbot/sandbox/bin/python3 /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/./bin/llvm-lit -v -sv --param SYCL_BE=PI_LEVEL_ZERO /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/tools/sycl/test
sys_bbs+ 15734  0.0  0.0  34132  3088 ?        S    10:02   0:00 /bin/sh -c ps aux | grep /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/tools/sycl/test
sys_bbs+ 15736  0.0  0.0  35808  2516 ?        R    10:02   0:00 grep /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/tools/sycl/test
[2020-11-24 10:02:18,601] lit WARNING: The PID of hang binary not found
[2020-11-24 10:02:18,601] lit INFO: ============================== End back-trace ==============================
[2020-11-24 10:02:18,601] lit INFO: === stage lit end ===
command timed out: 1200 seconds without output running [b'python3', b'llvm_ci/intel/worker/tools/build.py', b'-n', b'6305', b'-b', b'pull/2807/head', b'-r', b'2807', b'-t', b'check-sycl', b'-p', b'sycl', b'-s', b'lit', b'-P', b'intel/llvm', b'-m', b'sycl-ubu-x64-pr', b'-e', b'c6fa50e6f463f39f5ddb5d1c7511997437737262', b'-U', b'http://ci.llvm.intel.com:8010/#/builders/2/builds/6305'], attempting to kill
program finished with exit code 1
elapsedTime=2400.161946
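
The `command timed out: 1200 seconds without output` line in the log above is the result of a no-output watchdog: the step is killed once it has been silent for too long. As an illustration only, here is a minimal Python sketch of that idea. It assumes nothing about how the buildbot worker actually implements it; `run_with_output_watchdog` and its parameters are hypothetical names.

```python
import subprocess
import sys
import threading
import time


def run_with_output_watchdog(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper, not the buildbot implementation.
    Returns (exit_code, output_lines, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    last_output = [time.monotonic()]  # one-element list so the thread can update it

    def reader():
        # Each received line resets the idle timer.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_timeout:
            timed_out = True
            proc.kill()  # the child has been silent for too long
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=5)
    return proc.returncode, lines, timed_out


if __name__ == "__main__":
    code, out, hung = run_with_output_watchdog(
        [sys.executable, "-c", "print('hello')"], idle_timeout=10
    )
    print(code, out, hung)
```

A watchdog like this only reports that the step went silent; it does not say which child process is stuck, which is consistent with the `The PID of hang binary not found` warning in the log.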

@yanfeng3721

It looks like get_device_count_by_type hangs with the uplifted GPU RT when it is called with the following parameters.

llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py:91: note: get_device_count_by_type_path: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/./bin/get_device_count_by_type
llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py:92: note: device_type: cpu
llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py:93: note: backend: PI_LEVEL_ZERO
llvm-lit: /localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/lit.cfg.py:94: note: Entry 12((output, err) = process.communicate())
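
Because the hang sits in an unbounded `process.communicate()` call, one way to keep the lit configuration from stalling the whole step would be to bound the query with a timeout. The sketch below is an illustration under stated assumptions, not the code in `lit.cfg.py`: `run_device_query` is a hypothetical helper, and both the `<count>:<message>` output format and the argument order `[tool, device_type, backend]` are assumptions about the tool's interface.

```python
import subprocess
import sys


def run_device_query(cmd, timeout_s=60):
    """Run a device-count query command with a hard timeout.

    Hypothetical helper. Assumes the tool prints "<count>:<message>".
    Returns (count, message); count is 0 on timeout or unparsable output.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return 0, "timed out after {}s".format(timeout_s)

    parts = result.stdout.strip().split(":", 1)
    try:
        count = int(parts[0])
    except ValueError:
        return 0, "unexpected output: {!r}".format(result.stdout)
    message = parts[1] if len(parts) > 1 else ""
    return count, message


if __name__ == "__main__":
    # Stand-in command; in the CI case this would be something like
    # [get_device_count_by_type_path, "cpu", "PI_LEVEL_ZERO"] (assumed argument order).
    print(run_device_query([sys.executable, "-c", "print('2:ok')"], timeout_s=10))
```

With a bound like this, a hanging driver call would be reported as zero devices plus a timeout message rather than as a 1200-second silent step.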

@bader

bader commented Nov 24, 2020

Please report this issue to the Level Zero driver team.

@bader

bader commented Nov 28, 2020

To unblock this PR, we can skip Level Zero API calls for the cpu device type in get_device_count_by_type.
FYI, @romanovvlad, @vladimirlaz.
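
The proposed workaround can be sketched as a guard that never reaches the Level Zero query when the requested device type is cpu. This is a Python illustration of the control flow only; the actual change would belong in the get_device_count_by_type tool itself, and `should_query_backend`, `device_count`, and the `query` callback are hypothetical names.

```python
def should_query_backend(device_type, backend):
    """Return False for combinations that should be skipped.

    Hypothetical guard: per the suggestion above, cpu queries are not
    sent to the Level Zero back-end.
    """
    if backend == "PI_LEVEL_ZERO" and device_type == "cpu":
        return False
    return True


def device_count(device_type, backend, query):
    """Return the device count, skipping the query for guarded combinations.

    query is a callable (device_type, backend) -> int that performs the
    real (possibly hanging) driver call; it is an assumption of this sketch.
    """
    if not should_query_backend(device_type, backend):
        return 0  # report no devices without touching the driver
    return query(device_type, backend)


if __name__ == "__main__":
    def fake_query(device_type, backend):
        # Stand-in for the real driver call.
        return 1

    print(device_count("cpu", "PI_LEVEL_ZERO", fake_query))
    print(device_count("gpu", "PI_LEVEL_ZERO", fake_query))
```

The guard avoids the hang for the failing combination but leaves the underlying driver issue in place.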

@yanfeng3721

The issue also blocks PR #2832

@bader

bader commented Dec 1, 2020

Replaced by #2832.

@yanfeng3721

Hi @bader, I would like to reopen this PR since a regression was detected in #2832.
Hi @dm-vodopyanov, could you please also add a3538c5 to this PR?

@bader

bader commented Dec 3, 2020

@yanfeng3721, I suggest you cherry-pick a3538c5 and re-run Jenkins tests after that.

This patch updates Level Zero loader to 1.0.16 to support latest
versions of Level Zero runtime.
@yanfeng3721

/summary:run

@yanfeng3721 yanfeng3721 left a comment

Linux post-commit tests pass.

@bader bader merged commit ccd19d5 into intel:sycl Dec 4, 2020