
[jvm-packages] XGBoost-Spark reported "ERROR RabitTracker ......" #3834

Closed
FredYao opened this issue Oct 26, 2018 · 10 comments

FredYao commented Oct 26, 2018

I was running Spark in local mode, following the example in the official tutorial (XGBoost4J-Spark Tutorial). When I called val xgbClassificationModel = xgbClassifier.fit(xgbInput), the following error occurred:

scala> val xgbClassificationModel = xgbClassifier.fit(xgbInput)
Tracker started, with env={}
2018-10-26 14:08:36 ERROR RabitTracker:91 - Uncaught exception thrown by worker:
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:633)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:929)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:927)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainDistributed$4$$anon$1.run(XGBoost.scala:233)
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 0 pruned nodes, max_depth=1
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 0 pruned nodes, max_depth=1
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 0 pruned nodes, max_depth=2
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 0 pruned nodes, max_depth=2
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 0 pruned nodes, max_depth=2
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 0 pruned nodes, max_depth=2
[0] train-merror:0.026667
[0] train-merror:0.026667

.......
.......
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 0 pruned nodes, max_depth=0
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 0 pruned nodes, max_depth=1
[14:08:36] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 0 pruned nodes, max_depth=1
[99] train-merror:0.000000
ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.ml$dmlc$xgboost4j$scala$spark$XGBoost$$postTrackerReturnProcessing(XGBoost.scala:283)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainDistributed$4.apply(XGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainDistributed$4.apply(XGBoost.scala:222)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:221)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:191)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:48)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
... 49 elided

Does anyone have a clue about this issue?
Thanks in advance.

hcho3 changed the title from XGBoost-Spark reported "ERROR RabitTracker ......" to [jvm-packages] XGBoost-Spark reported "ERROR RabitTracker ......" on Oct 26, 2018

CodingCat commented Oct 27, 2018

Tracker started, with env={} means the tracker could not bind to your local IP address. Can you check whether your hosts file contains an entry for 127.0.0.1 or your computer's hostname?
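For reference, a hosts file that satisfies this check could look like the following, where my-machine stands in for your actual hostname:

127.0.0.1   localhost
127.0.0.1   my-machine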


FredYao commented Oct 27, 2018

@CodingCat, I am running on a YARN cluster. Everything about the cluster is at its default settings; I did not change anything explicitly.

@CodingCat

Yes, you need to contact your cluster admin to check whether the hosts files are written correctly.


FredYao commented Oct 28, 2018

Thank you, @CodingCat. Is there another way to bind the IP address to the tracker manually?

@CodingCat

See the discussion in #3833.

You can add xgboost-tracker.properties to your resources directory.


FredYao commented Oct 29, 2018

@CodingCat, I have checked the hosts file; its content is:
127.0.0.1 localhost.localdomain localhost
Can TrackerProperties read this correctly?

@henokyen
Has anyone got a solution for this? I am struggling with the same error.

@CodingCat

@FredYao @henokyen, add xgboost-tracker.properties with the content host-ip=0.0.0.0 to the resources folder of your jar.
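For illustration, assuming a standard Maven/sbt project layout, the file would live at src/main/resources/xgboost-tracker.properties so that it ends up on the classpath of the packaged jar, and would contain the single line:

host-ip=0.0.0.0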


FredYao commented Nov 18, 2018

@henokyen, the first thing is to check /etc/hosts on the job driver/workers and make sure it maps 127.0.0.1 to localhost.

If localhost is correct, then check the Python version you are using. If it is lower than Python 2.7, the error is expected, because the Rabit tracker requires Python 2.7+. In that case, you can switch the tracker to the Scala version when setting the parameters for XGBoostClassifier:
val xgbParam = Map("eta" -> 0.1f, "max_depth" -> 2, ..., "tracker_conf" -> TrackerConf(0L, "scala"))
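For context, here is a fuller sketch of such a parameter map, assuming the xgboost4j-spark 0.8x API where TrackerConf(workerConnectionTimeout, trackerImpl) is defined in ml.dmlc.xgboost4j.scala.spark; the objective, class count, and column names below are placeholders, not values from this thread:

import ml.dmlc.xgboost4j.scala.spark.{TrackerConf, XGBoostClassifier}

val xgbParam = Map(
  "eta" -> 0.1f,
  "max_depth" -> 2,
  "objective" -> "multi:softprob",   // placeholder objective
  "num_class" -> 3,                  // placeholder class count
  "num_round" -> 100,
  // 0L disables the worker-connection timeout; "scala" selects the Scala tracker
  "tracker_conf" -> TrackerConf(0L, "scala")
)
val xgbClassifier = new XGBoostClassifier(xgbParam)
  .setFeaturesCol("features")        // placeholder column names
  .setLabelCol("classIndex")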

@henokyen
This is the content of my /etc/hosts:

# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost

So I do have 127.0.0.1 for localhost.

I am on a MacBook Pro. I am trying to follow this tutorial:
https://medium.com/@bogdan.cojocar/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb
but I keep getting this error:
Tracker started, with env={}
[11:21:07] /Users/nanzhu/code/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 70 extra nodes, 0 pruned nodes, max_depth=6
[0] train-error:0.112202
[Stage 10:> (0 + 1) / 1]2018-11-19 11:21:08 ERROR RabitTracker:91 - Uncaught exception thrown by worker:

I am using Python version 2.7.10. Any ideas? Thanks.

lock bot locked as resolved and limited conversation to collaborators Feb 17, 2019