How to reproduce a crash in Windows pipeline: an example #8

hcho3 · 2019-12-02T03:14:59Z

Consider this example: https://xgboost-ci.net/blue/organizations/jenkins/xgboost-win64/detail/PR-5078/4/pipeline/73

Let's try to reproduce the issue. The very first step is to find which machine image (AMI) was used. In this case, it's test-win64-gpu-cuda10.0, which corresponds to the image Windows2016_GPUTest_CUDA10_Dec2019.

1. Log into the EC2 console and launch a new EC2 instance using the image Windows2016_GPUTest_CUDA10_Dec2019. Use the g4dn.xlarge instance type. Use the password you set in How to build machine image (AMI) to run test pipeline on Windows #7.
2. Locate the artifact that's causing the issue. In this example, it's testxgboost.exe.
3. Go to the S3 console and navigate to the S3 bucket xgboost-ci-jenkins-artifacts. This works because in How to set up a Jenkins master node from scratch. #6 we configured Jenkins to store all artifacts in S3. The prefix for the artifact in this example is xgboost-win64/PR-5078/4/stashes. (In general, the prefix has the form <pipeline name>/<Pull Request ID>/<Build ID>/stashes.) Now we can download xgboost_cpp_tests.tgz, which contains testxgboost.exe.
4. Copy xgboost_cpp_tests.tgz over to the EC2 instance.
5. Install 7-zip and use it to extract testxgboost.exe from the tgz file.
6. Run testxgboost.exe.
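
For anyone who prefers scripting the artifact steps over clicking through the S3 console, here is a rough Python sketch of the same flow. It is only an illustration, not part of the CI setup. The helper names (`stash_prefix`, `stash_key`, `download_stash`, `extract_stash`, `run_test`) are made up for this example, and it assumes that the archive sits directly under the `stashes` prefix, that `boto3` is installed, and that AWS credentials with read access to the bucket are configured. It uses Python's `tarfile` module in place of 7-zip.

```python
import subprocess
import tarfile
from pathlib import Path

# Bucket name taken from the steps above.
BUCKET = "xgboost-ci-jenkins-artifacts"


def stash_prefix(pipeline, pull_request_id, build_id):
    """Build <pipeline name>/<Pull Request ID>/<Build ID>/stashes."""
    return f"{pipeline}/{pull_request_id}/{build_id}/stashes"


def stash_key(pipeline, pull_request_id, build_id, archive_name):
    """Full S3 key, assuming the archive sits directly under the prefix."""
    return f"{stash_prefix(pipeline, pull_request_id, build_id)}/{archive_name}"


def download_stash(key, dest):
    """Download one archive from the bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the other helpers work without it

    boto3.client("s3").download_file(BUCKET, key, str(dest))
    return Path(dest)


def extract_stash(archive, out_dir):
    """Extract a .tgz archive into out_dir and return the member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(out_dir)
    return names


def run_test(exe):
    """Run the extracted test binary and return its exit code."""
    return subprocess.run([str(exe)], check=False).returncode
```

For the build in this example, `stash_key("xgboost-win64", "PR-5078", 4, "xgboost_cpp_tests.tgz")` produces the key to pass to `download_stash`. Where testxgboost.exe ends up inside the extracted directory depends on how the archive was packed, so check the names returned by `extract_stash` before calling `run_test`.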

hcho3 · 2019-12-02T05:25:01Z

@trivialfis FYI, this is a blocking issue

hcho3 · 2019-12-02T08:23:27Z

~~I resolved this particular problem by installing latest driver from http://www.nvidia.com/drivers.~~ It seems like using the latest driver (412.36) is causing an issue. I will try version 411.82 (first to support Tesla T4) instead.

hcho3 · 2019-12-02T09:18:02Z

I have to conclude that CUDA 10.0 can't really work with Tesla T4 (G4 instance), at least on Windows. I'll just use P2 type instead.

hcho3 mentioned this issue Dec 2, 2019

[CI] Jenkins is down dmlc/xgboost#5061

Closed

hcho3 pinned this issue Dec 23, 2019

hcho3 unpinned this issue Dec 23, 2019

hcho3 closed this as completed Sep 29, 2022
