This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Scala GPU Build examples CI failure #15605

Open
ChaiBapchya opened this issue Jul 19, 2019 · 10 comments

@ChaiBapchya
Contributor

Scala unix GPU build error in an unrelated PR #15541

[INFO] ------------------------------------------------------------------------
[INFO] Building MXNet Scala Package - Examples INTERNAL
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] MXNet Scala Package - Parent ....................... SUCCESS [ 30.879 s]
[INFO] MXNet Scala Package - Initializer .................. SUCCESS [  9.859 s]
[INFO] MXNet Scala Package - Initializer Native ........... SUCCESS [  1.473 s]
[INFO] MXNet Scala Package - Macros ....................... SUCCESS [ 13.290 s]
[INFO] MXNet Scala Package - Native ....................... SUCCESS [  4.672 s]
[INFO] MXNet Scala Package - Core ......................... SUCCESS [01:54 min]
[INFO] MXNet Scala Package - Inference .................... SUCCESS [ 16.025 s]
[INFO] MXNet Scala Package - Examples ..................... FAILURE [  2.620 s]
[INFO] MXNet Scala Package - Spark ML ..................... SKIPPED
[INFO] Assembly Scala Package ............................. SKIPPED
[INFO] MXNet Scala Package - Full linux-x86_64-only ....... SKIPPED
[INFO] MXNet Scala Package - Full linux-x86_64-only ....... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:16 min
[INFO] Finished at: 2019-07-18T05:52:51+00:00
[INFO] Final Memory: 47M/3263M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project mxnet-examples: Could not resolve dependencies for project org.apache.mxnet:mxnet-examples:jar:INTERNAL: Could not transfer artifact nu.pattern:opencv:jar:2.4.9-7 from/to central (https://repo.maven.apache.org/maven2): GET request of: nu/pattern/opencv/2.4.9-7/opencv-2.4.9-7.jar from central failed: Connection reset -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 

Pipeline - http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-15541/5/pipeline/

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Scala, Test, CI, Build

@ChaiBapchya
Contributor Author

@mxnet-label-bot add [Scala, Test, CI, Build]

@zachgk
Contributor

zachgk commented Jul 22, 2019

@ChaiBapchya The issue is that the network connection failed while downloading a Maven dependency. I don't know how we could handle this other than by rerunning the Jenkins job.

@ChaiBapchya
Contributor Author

I haven't looked at the code, but isn't there a way to catch the exception and troubleshoot it, instead of asking contributors to retrigger CI?

@zachgk
Contributor

zachgk commented Jul 23, 2019

The problem doesn't lie with Maven, so I don't think there is anything Maven can do to address it directly. Usually we would add some number of retries for downloading. I just looked, and there doesn't seem to be a way to make Maven retry downloading. I would be hesitant to retry the entire Maven test because this doesn't seem to be a frequent problem and it would immediately multiply the cost of the Scala tests.
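
One way to picture the retry trade-off described here is a wrapper that retries only a cheap dependency pre-fetch step, not the whole Scala test run. The following is a minimal sketch, not something the MXNet CI is known to do: the helper names `retry` and `resolve_dependencies` are hypothetical, and it assumes that running the maven-dependency-plugin goal `dependency:go-offline` before the build is enough to populate the local repository.

```python
import subprocess
import time


def retry(action, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    tries. Returns True on success, False if every attempt failed.
    """
    for attempt in range(attempts):
        if action():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False


def resolve_dependencies():
    """Hypothetical pre-fetch step: True when Maven exits with status 0.

    Assumption: dependency:go-offline downloads everything the later
    build needs, so the build itself no longer hits the network.
    """
    result = subprocess.run(["mvn", "-B", "dependency:go-offline"])
    return result.returncode == 0
```

A CI script could call `retry(resolve_dependencies, attempts=3, base_delay=5.0)` once before the real build, so a connection reset costs a repeated download rather than a repeated test run.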

We could try to do something to improve the error message. Currently, it is:

Failed to execute goal on project mxnet-examples: Could not resolve dependencies for project org.apache.mxnet:mxnet-examples:jar:INTERNAL: Could not transfer artifact nu.pattern:opencv:jar:2.4.9-7 from/to central (https://repo.maven.apache.org/maven2): GET request of: nu/pattern/opencv/2.4.9-7/opencv-2.4.9-7.jar from central failed: Connection reset -> [Help 1]

I feel like this error message makes it clear enough that this was some kind of networking problem. Are you thinking of some kind of CI-specific error messaging?

@ChaiBapchya
Contributor Author

@zachgk I just feel it is unreasonable to expect contributors/MXNet users to retrigger the PRs because something wasn't downloaded correctly.

Now, specific to this issue: the error says the GET request failed. It would be great if the message also said how to resolve it (currently by retriggering CI; hopefully in the future it recovers automatically).

But going forward, we need to make CI robust against connection failures.
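
As a rough illustration of what a CI-specific hint could look like, the sketch below scans a Maven log for the failure quoted in this issue and produces a short message telling the contributor what to do. The function names are hypothetical, and treating "Could not transfer artifact" together with "Connection reset" as a transient failure is an assumption based only on the log above, not an official Maven classification.

```python
import re

# Patterns taken from the error line quoted in this issue. Treating them
# as "transient" is an assumption, not a Maven-defined category.
TRANSFER_FAILURE = re.compile(r"Could not transfer artifact (\S+)")
NETWORK_HINTS = ("Connection reset",)


def transient_download_failure(log_text):
    """Return the artifact coordinates if the log shows a network-related
    artifact transfer failure, otherwise None."""
    for line in log_text.splitlines():
        match = TRANSFER_FAILURE.search(line)
        if match and any(hint in line for hint in NETWORK_HINTS):
            return match.group(1)
    return None


def ci_hint(log_text):
    """Build a short message a CI job could print after a failed build,
    or return None when no transient download failure is detected."""
    artifact = transient_download_failure(log_text)
    if artifact is None:
        return None
    return (
        f"Download of {artifact} failed with a network error. "
        "This is probably unrelated to the PR; retrigger the CI job."
    )
```

A job could run `ci_hint` over the captured Maven output and print the result, if any, next to the original error.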

@zachgk
Contributor

zachgk commented Aug 14, 2019

@perdasilva Any idea why the CI might be having problems connecting to Maven?
