This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[CI][v1.6.x] Fix failing CI pipelines #18597

Merged


@ChaiBapchya (Contributor) commented Jun 20, 2020

Previous PR #18560 cherry-picked PR #18339, which fixed the edge pipeline in 1.7.x.
However, the cherry-pick didn't fix the edge pipeline and ended up breaking the unix-gpu pipeline for 1.6.x.

It got merged because it triggered only the sanity pipeline (and not the other pipelines).

This PR therefore reverts the erroneous additions from that merged commit. It also fixes the edge pipeline, so that the 1.6.x branch becomes all green.

@mxnet-bot

Hey @ChaiBapchya, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-cpu, sanity, website, centos-gpu, unix-gpu, miscellaneous, centos-cpu, edge, windows-cpu, windows-gpu, clang]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
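The `run ci` command syntax described by the bot can be illustrated with a small parser. This is a hypothetical sketch, not mxnet-bot's actual implementation: the function `parse_run_ci`, the regular expression, and the rule that `[all]` expands to every supported job are assumptions; only the job names are taken from the bot comment.

```python
import re
from typing import List

# Job names copied from the "CI supported jobs" list in the bot comment.
SUPPORTED_JOBS = {
    "unix-cpu", "sanity", "website", "centos-gpu", "unix-gpu",
    "miscellaneous", "centos-cpu", "edge", "windows-cpu", "windows-gpu", "clang",
}

# Hypothetical pattern for "@mxnet-bot run ci [all]" and "@mxnet-bot run ci [job1, job2]".
COMMAND_RE = re.compile(r"@mxnet-bot\s+run\s+ci\s+\[([^\]]*)\]")


def parse_run_ci(comment: str) -> List[str]:
    """Return the sorted list of jobs a comment asks to (re)trigger.

    Returns [] if the comment contains no command and raises ValueError
    for job names that are not in SUPPORTED_JOBS.
    """
    match = COMMAND_RE.search(comment)
    if match is None:
        return []
    requested = [name.strip() for name in match.group(1).split(",") if name.strip()]
    if requested == ["all"]:
        return sorted(SUPPORTED_JOBS)
    unknown = [name for name in requested if name not in SUPPORTED_JOBS]
    if unknown:
        raise ValueError(f"unsupported job(s): {', '.join(unknown)}")
    return sorted(set(requested))


if __name__ == "__main__":
    print(parse_run_ci("@mxnet-bot run ci [website]"))        # ['website']
    print(parse_run_ci("@mxnet-bot run ci [unix-gpu, edge]"))  # ['edge', 'unix-gpu']
```

Validating names against the supported list up front means a typo such as `[unix_gpu]` is rejected instead of silently triggering nothing.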

ChaiBapchya and others added 4 commits June 21, 2020 01:52
…st in runtime_functions.sh"

This reverts commit de173b0.
* Fix quantized concat when inputs are mixed int8 and uint8

Change-Id: I4da04bf4502425134a466823fb5f73da2d7a419b

* skip flaky test

* trigger ci
@ChaiBapchya requested a review from szha as a code owner June 21, 2020 08:54
@ChaiBapchya (Contributor Author) commented Jun 21, 2020

@josephevans @leezu
This PR fails the sanity pipeline.
[FAIL] Build 2: http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/job/sanity/job/PR-18597/2/display/redirect
Specifically, the line

make -f R-package/Makefile rcpplint

in the function sanity_check.
It is not present in ci/docker/runtime_functions.sh in this PR branch, yet the pipeline still picks it up. Any idea why?
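One way to check the question above, i.e. whether a given copy of ci/docker/runtime_functions.sh still contains the rcpplint line inside sanity_check, is to extract that shell function's body and search it. A minimal sketch, assuming functions are written as `name() {` with the closing `}` at the start of a line; `function_body`, `mentions`, and the `SAMPLE` script are hypothetical, and the sample contents are made up for illustration.

```python
import re
from typing import List, Optional


def function_body(script: str, name: str) -> Optional[List[str]]:
    """Return the stripped lines between `name() {` and its closing `}`.

    Assumes the closing brace sits at the start of a line.
    Returns None if the function is not defined in `script`.
    """
    lines = script.splitlines()
    header = re.compile(rf"^{re.escape(name)}\s*\(\)\s*\{{\s*$")
    for start, line in enumerate(lines):
        if header.match(line):
            body = []
            for inner in lines[start + 1:]:
                if inner.startswith("}"):
                    return body
                body.append(inner.strip())
            return body  # unterminated function: return what was collected
    return None


def mentions(script: str, func: str, needle: str) -> bool:
    """True if `needle` appears inside the shell function `func`."""
    body = function_body(script, func) or []
    return any(needle in line for line in body)


# Made-up sample script, not the real runtime_functions.sh.
SAMPLE = """\
sanity_check() {
    set -ex
    make -f R-package/Makefile rcpplint
}
"""

if __name__ == "__main__":
    print(mentions(SAMPLE, "sanity_check", "rcpplint"))  # True
```

Feeding it the script from the PR branch (for example, the output of `git show HEAD:ci/docker/runtime_functions.sh`) and comparing the result with the command shown in the Jenkins log would indicate whether the pipeline ran a different revision of the file than the branch contains.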

@ChaiBapchya (Contributor Author) commented Jun 21, 2020

Surprisingly, the sanity pipeline passed after I pushed an empty commit to retrigger it.
Build 3: http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/job/sanity/job/PR-18597/3/display/redirect

@ChaiBapchya changed the title from [CI][v1.6.x] Fix unix-gpu pipeline to [CI][v1.6.x] Fix failing CI pipelines Jun 21, 2020
@ChaiBapchya (Contributor Author)

@mxnet-bot run ci [website]
Failure: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwebsite/detail/PR-18597/2/pipeline
Weird GitHub submodule update issue (likely a network issue).

@mxnet-bot

Jenkins CI successfully triggered : [website]

@ChaiBapchya (Contributor Author)

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/edge/branches/PR-18597/runs/2/nodes/43/log/?start=0
Build / NVidia Jetson / ARMv8

Makefile:559: recipe for target 'build/src/operator/numpy/linalg/np_gesvd_gpu.o' failed
Makefile:559: recipe for target 'build/src/operator/numpy/random/np_multinomial_op_gpu.o' failed
make: *** [build/src/operator/numpy/random/np_uniform_op_gpu.o] Error 1

@waytrue17 @leezu @mseth10 any idea about this Jetson build failure?

waytrue17 and others added 6 commits June 29, 2020 17:18
* update dockerfile for jetson

* add toolchain files

* update build_jetson function

* update ubuntu_julia.sh

* update FindCUDAToolkit.cmake

* Update centos7_python.sh

* revert changes on ubuntu_julia.sh

* disable TVM for gpu build

* Disable TVM_OP on GPU builds

Co-authored-by: Wei Chu <weichu@amazon.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
@ChaiBapchya (Contributor Author)

In the edge pipeline, the Jetson build fails while building the binary distribution wheel (bdist), with the error:

ImportError: No module named setuptools

However, one of the previous steps in the Docker container does install setuptools, as confirmed by:

Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from -r /work/requirements (line 35))

For reference:
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/edge/branches/PR-18597/runs/5/nodes/43/steps/154/log/?start=0
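The pip message above shows setuptools under /usr/lib/python3/dist-packages, i.e. installed for python3. If the bdist step runs under a different interpreter, that interpreter can still fail with the ImportError. A minimal diagnostic sketch, assuming the candidate interpreter names `python`, `python2`, and `python3` (the helper `setuptools_status` is hypothetical), is to ask each interpreter directly:

```python
import shutil
import subprocess
from typing import Dict, Iterable


def setuptools_status(interpreters: Iterable[str]) -> Dict[str, bool]:
    """Map each interpreter to whether `import setuptools` succeeds with it.

    An interpreter that cannot be found on PATH maps to False.
    """
    status = {}
    for interp in interpreters:
        path = shutil.which(interp)
        if path is None:
            status[interp] = False
            continue
        # Run the import in a child process so each interpreter uses its own sys.path.
        result = subprocess.run(
            [path, "-c", "import setuptools"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        status[interp] = result.returncode == 0
    return status


if __name__ == "__main__":
    for interp, ok in setuptools_status(["python", "python2", "python3"]).items():
        print(f"{interp:8} setuptools importable: {ok}")
```

Run inside the build container, a result where only python3 reports True would point at an interpreter mismatch rather than a missing pip install.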

@leezu (Contributor) commented Jun 30, 2020

You could investigate where bdist is called to debug the error. Actually, you could also delete the bdist step, as the built wheel is discarded anyway and we don't need to test copying libmxnet.so into a zip file in the Jetson pipeline. (I don't think it's done in master.)

@ChaiBapchya (Contributor Author)

Found the root cause of this issue:

| Function | Dockerfile | Base Docker image | Default Python |
|---|---|---|---|
| build_jetson | ci/docker/Dockerfile.build.jetson | nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 | ✖︎ (none) |
| build_armv6 | ci/docker/Dockerfile.build.armv6 | dockcross/linux-armv6 | 2.7 |
| build_armv7 | ci/docker/Dockerfile.build.armv7 | dockcross/linux-armv7 | 2.7 |
| build_armv8 | ci/docker/Dockerfile.build.armv8 | dockcross/linux-armv64 | 2.7 |

All dockcross/linux-arm* images use python2 as the default Python and ship with setuptools installed.
However, the nvidia/cuda Docker image doesn't include Python and, as a result, doesn't have setuptools either. Hence we need to explicitly install python2 and setuptools.

@ChaiBapchya (Contributor Author)

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@ChaiBapchya (Contributor Author)

@leezu @sandeep-krishnamurthy @PatricZhao @szha
Please review.
This unblocks #18632

@ChaiBapchya (Contributor Author)

@mxnet-label-bot add [pr-awaiting-review]

@lanking520 added the pr-awaiting-review label (PR is waiting for code review) Jun 30, 2020
@mseth10 (Contributor) left a comment

LGTM. Thanks for fixing 1.6 pipeline.

@sandeep-krishnamurthy (Contributor)

Thank you @ChaiBapchya

@sandeep-krishnamurthy merged commit fb3fea4 into apache:v1.6.x Jul 1, 2020
@ciyongch (Contributor) commented Jul 1, 2020

Thank you @ChaiBapchya for the prompt fix, I will rebase my PR #18632.

@ChaiBapchya deleted the fix_unix_gpu_pipeline branch July 1, 2020 01:45
Labels: pr-awaiting-review (PR is waiting for code review)
9 participants