[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests #4966

Merged on Nov 1, 2019 (67 commits)
Changes from 58 commits
Commits
9cd3db5
[phase 1] expose sets of rabit configurations to spark layer
Sep 19, 2019
1223b0e
add back mutable import
Sep 19, 2019
65e95a9
disable ring_mincount till https://github.com/dmlc/rabit/pull/106d
Sep 20, 2019
42cbe59
Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull…
chenqin Sep 22, 2019
88e07ac
apply latest rabit
chenqin Sep 22, 2019
127eb79
fix build error
chenqin Sep 22, 2019
1971858
apply https://github.com/dmlc/xgboost/pull/4880
chenqin Sep 23, 2019
960b0ec
downgrade cmake in rabit
chenqin Sep 23, 2019
288e25b
point to rabit with DMLC_ROOT fix
Sep 23, 2019
711eb92
relative path of rabit install prefix
Sep 23, 2019
76fa5bd
split rabit parameters to another trait
chenqin Sep 24, 2019
4156a0e
Merge branch 'master' into master
chenqin Sep 24, 2019
57c1496
misc
chenqin Sep 24, 2019
62a5079
Merge branch 'master' of github.com:chenqin/xgboost
chenqin Sep 24, 2019
0abbece
misc
chenqin Sep 24, 2019
b22eb78
Delete .classpath
chenqin Sep 24, 2019
29b6b50
Delete .classpath
chenqin Sep 24, 2019
c880d8d
Delete .classpath
chenqin Sep 24, 2019
5c24435
Update XGBoostClassifier.scala
chenqin Sep 24, 2019
607eb98
Update XGBoostRegressor.scala
chenqin Sep 24, 2019
65ebfc3
Update GeneralParams.scala
chenqin Sep 24, 2019
c503ae9
Update GeneralParams.scala
chenqin Sep 24, 2019
2d19dc4
Update GeneralParams.scala
chenqin Sep 24, 2019
256d262
Update GeneralParams.scala
chenqin Sep 24, 2019
0e765b6
Delete .classpath
chenqin Sep 24, 2019
ac8db45
Update RabitParams.scala
chenqin Sep 24, 2019
c2bda30
Update .gitignore
chenqin Sep 24, 2019
d2720ab
Update .gitignore
chenqin Sep 24, 2019
43baad3
apply rabitParams to training
Sep 24, 2019
2b05482
use string as rabit parameter value type
Sep 24, 2019
f40c4e0
cleanup
Sep 24, 2019
6bea5dc
add rabitEnv check
Sep 26, 2019
e0d86de
point to dmlc/rabit
chenqin Sep 27, 2019
9674e50
per feedback
Sep 30, 2019
0bd354b
update private scope
chenqin Oct 1, 2019
abef46a
misc
Oct 1, 2019
97120f1
update rabit
Oct 8, 2019
5b9d75f
add rabit_timtout, fix failing test.
chenqin Oct 13, 2019
4d46f23
split tests
chenqin Oct 13, 2019
046e0e4
allow build jvm with rabit mock
chenqin Oct 14, 2019
cfb9ced
pass mock failures to rabit with test
Oct 14, 2019
3e68f30
add mock error and graceful handle rabit assertion error test
chenqin Oct 15, 2019
0317b53
split mvn test
Oct 15, 2019
c753fd9
remove sign for test
Oct 15, 2019
fee1131
update rabit
Oct 16, 2019
55a4552
build jvm_packages with rabit mock
Oct 16, 2019
e9af489
point back to dmlc/rabit
Oct 16, 2019
a112245
per feedback, update scala header
Oct 16, 2019
dbfb0d9
cleanup pom
chenqin Oct 17, 2019
fd28758
per feedback
Oct 18, 2019
606f548
try fix lint
Oct 18, 2019
caf87d3
Merge branch 'master' into master
Oct 18, 2019
a6816c4
fix lint
chenqin Oct 19, 2019
3e8a05e
per feedback, remove bootstrap_cache
chenqin Oct 20, 2019
43bac8b
per feedback 2
chenqin Oct 20, 2019
814f2e1
try replace dev profile with passing mvn property
chenqin Oct 20, 2019
0b9a452
fix build error
chenqin Oct 20, 2019
3bb24d1
remove mvn property and replace with env setting to build test jar
chenqin Oct 21, 2019
0a57487
per feedback
Oct 21, 2019
610febc
Merge branch 'master' into master
Oct 22, 2019
f73f8c3
Merge remote-tracking branch 'upstream/master'
hcho3 Oct 22, 2019
4ed9fe7
revert copyright headlines, point to dmlc/rabit
Oct 24, 2019
f5712c4
revert python lint
Oct 25, 2019
51c79fa
remove multiple failure test case as retry is not enabled in spark
Oct 25, 2019
e656dab
Update core.py
chenqin Oct 29, 2019
f5709a0
Update core.py
chenqin Oct 29, 2019
458e76e
per feedback, style fix
chenqin Oct 29, 2019
40 changes: 10 additions & 30 deletions CMakeLists.txt
@@ -94,36 +94,16 @@ set_target_properties(dmlc PROPERTIES
 list(APPEND LINKED_LIBRARIES_PRIVATE dmlc)

 # rabit
-# full rabit doesn't build on windows, so we can't import it as subdirectory
-if(MINGW OR R_LIB OR WIN32)
-  set(RABIT_SOURCES
-    rabit/src/engine_empty.cc
-    rabit/src/c_api.cc)
-else ()
-  if(RABIT_MOCK)
-    set(RABIT_SOURCES
-      rabit/src/allreduce_base.cc
-      rabit/src/allreduce_robust.cc
-      rabit/src/engine_mock.cc
-      rabit/src/c_api.cc)
-  else()
-    set(RABIT_SOURCES
-      rabit/src/allreduce_base.cc
-      rabit/src/allreduce_robust.cc
-      rabit/src/engine.cc
-      rabit/src/c_api.cc)
-  endif(RABIT_MOCK)
-endif (MINGW OR R_LIB OR WIN32)
-add_library(rabit STATIC ${RABIT_SOURCES})
-target_include_directories(rabit PRIVATE
-  $<BUILD_INTERFACE:${CMAKE_CURRENT_LIST_DIR}/dmlc-core/include>
-  $<BUILD_INTERFACE:${CMAKE_CURRENT_LIST_DIR}/rabit/include/rabit>)
-set_target_properties(rabit
-  PROPERTIES
-  CXX_STANDARD 11
-  CXX_STANDARD_REQUIRED ON
-  POSITION_INDEPENDENT_CODE ON)
-list(APPEND LINKED_LIBRARIES_PRIVATE rabit)
+set(RABIT_BUILD_DMLC OFF)
+set(DMLC_ROOT ${xgboost_SOURCE_DIR}/dmlc-core)
+set(RABIT_WITH_R_LIB ${R_LIB})
+add_subdirectory(rabit)
+
+if (RABIT_MOCK)
+  list(APPEND LINKED_LIBRARIES_PRIVATE rabit_mock_static)
+else()
+  list(APPEND LINKED_LIBRARIES_PRIVATE rabit)
+endif(RABIT_MOCK)

 # Exports some R specific definitions and objects
 if (R_LIB)

3 changes: 2 additions & 1 deletion jvm-packages/.gitignore
@@ -1,2 +1,3 @@
 tracker.py
-build.sh
+build.sh
+

4 changes: 3 additions & 1 deletion jvm-packages/create_jni.py
@@ -18,7 +18,7 @@
     "USE_HDFS": "OFF",
     "USE_AZURE": "OFF",
     "USE_S3": "OFF",
-
+    "RABIT_MOCK": "OFF",
     "USE_CUDA": "OFF",
     "JVM_BINDINGS": "ON"
 }
@@ -68,6 +68,8 @@ def normpath(path):


 if __name__ == "__main__":
+    if os.getenv("RABIT_MOCK", None) is not None:
+        CONFIG["RABIT_MOCK"] = str(os.getenv("RABIT_MOCK"))
     if sys.platform == "darwin":
         # Enable of your compiler supports OpenMP.
         CONFIG["USE_OPENMP"] = "OFF"

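The environment override added to create_jni.py can be sketched in isolation. This is only a minimal sketch: it assumes each CONFIG entry ends up as a CMake `-D<NAME>:BOOL=<VALUE>` argument, and the helper names `apply_env_overrides` and `cmake_args` are hypothetical and not part of the script.

```python
import os

# Default build switches, mirroring the CONFIG entries visible in the diff.
DEFAULT_CONFIG = {
    "USE_HDFS": "OFF",
    "USE_AZURE": "OFF",
    "USE_S3": "OFF",
    "RABIT_MOCK": "OFF",
    "USE_CUDA": "OFF",
    "JVM_BINDINGS": "ON",
}


def apply_env_overrides(config, environ=None, keys=("RABIT_MOCK",)):
    """Return a copy of config where the listed keys take their value
    from the environment when the variable is set (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    result = dict(config)
    for key in keys:
        value = environ.get(key)
        if value is not None:
            result[key] = str(value)
    return result


def cmake_args(config):
    """Render the config as CMake cache definitions.
    The -D<NAME>:BOOL=<VALUE> format is an assumption of this sketch."""
    return ["-D{0}:BOOL={1}".format(k, v) for k, v in sorted(config.items())]
```

Under these assumptions, building with `RABIT_MOCK=ON` in the environment would pass `-DRABIT_MOCK:BOOL=ON` to CMake, which is the switch the CMakeLists.txt change uses to link `rabit_mock_static` instead of `rabit`.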
2 changes: 1 addition & 1 deletion jvm-packages/scalastyle-config.xml
@@ -49,7 +49,7 @@ This file is divided into 3 sections:
 <check level="error" class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">
 <parameters>
 <parameter name="header"><![CDATA[/*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.

[The same copyright-header hunk (2014 to 2014 - 2019) repeats for 15 more files.]

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -21,11 +21,11 @@ import java.nio.file.Files

import scala.collection.{AbstractIterator, mutable}
import scala.util.Random
import scala.collection.JavaConverters._

import ml.dmlc.xgboost4j.java.{IRabitTracker, Rabit, XGBoostError, RabitTracker => PyRabitTracker}
import ml.dmlc.xgboost4j.scala.rabit.RabitTracker
import ml.dmlc.xgboost4j.scala.spark.CheckpointManager.CheckpointParam
import ml.dmlc.xgboost4j.scala.spark.XGBoost.logger
import ml.dmlc.xgboost4j.scala.spark.params.LearningTaskParams
import ml.dmlc.xgboost4j.scala.{XGBoost => SXGBoost, _}
import ml.dmlc.xgboost4j.{LabeledPoint => XGBLabeledPoint}
@@ -221,6 +221,38 @@ private[this] class XGBoostExecutionParamsFactory(rawParams: Map[String, Any], s
     xgbExecParam.setRawParamMap(overridedParams)
     xgbExecParam
   }
+
+  private[spark] def buildRabitParams: java.util.Map[java.lang.String, java.lang.String] = Map(
+    "rabit_reduce_ring_mincount" -> {
+      if (overridedParams.getOrElse("rabit_ring_reduce", false).toString.toBoolean) {
+        "1"
+      } else {
+        Integer.MAX_VALUE.toString
+      }
+    },
+    "rabit_reduce_buffer" ->
+      overridedParams.getOrElse("rabit_reduce_buffer", "256MB").toString,
+    "rabit_debug" ->
+      overridedParams.getOrElse("rabit_debug", false).toString,
+    "rabit_timeout" -> {
+      if (overridedParams.getOrElse("rabit_timeout", -1).toString.toInt > 0) {
+        "true"
+      } else {
+        "false"
+      }
+    },
+    "rabit_timeout_sec" -> {
+      if (overridedParams.getOrElse("rabit_timeout", -1).toString.toInt > 0) {
+        overridedParams.getOrElse("rabit_timeout", -1).toString
+      } else {
+        "1800"
+      }
+    },
+    "DMLC_WORKER_CONNECT_RETRY" ->
+      overridedParams.getOrElse("dmlc_worker_connect_retry", 5).toString,
+    "DMLC_WORKER_STOP_PROCESS_ON_ERROR" ->
+      overridedParams.getOrElse("dmlc_worker_stop_process_on_error", false).toString
+  ).asJava
 }

 /**
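
To make the mapping in `buildRabitParams` easier to follow, here is a rough Python sketch of the same decision logic. It is an illustration only, not code from this PR: the names `build_rabit_params` and `_to_bool` are hypothetical, and the keys and defaults are copied from the Scala hunk above.

```python
INT_MAX = 2**31 - 1  # numeric value of Java's Integer.MAX_VALUE


def _to_bool(value):
    """Approximate Scala's value.toString.toBoolean (hypothetical helper).
    Unlike Scala, unrecognised strings yield False instead of raising."""
    return str(value).strip().lower() == "true"


def build_rabit_params(params):
    """Translate user-facing options into string-valued rabit settings."""
    timeout = int(str(params.get("rabit_timeout", -1)))
    timeout_enabled = timeout > 0
    return {
        # "1" when ring reduce is requested, otherwise Integer.MAX_VALUE,
        # exactly as in the Scala diff.
        "rabit_reduce_ring_mincount":
            "1" if _to_bool(params.get("rabit_ring_reduce", False)) else str(INT_MAX),
        "rabit_reduce_buffer": str(params.get("rabit_reduce_buffer", "256MB")),
        # Lowercased so Python's False renders like Scala's "false".
        "rabit_debug": str(params.get("rabit_debug", False)).lower(),
        # A positive rabit_timeout both enables the timeout and sets its length.
        "rabit_timeout": "true" if timeout_enabled else "false",
        "rabit_timeout_sec": str(timeout) if timeout_enabled else "1800",
        "DMLC_WORKER_CONNECT_RETRY": str(params.get("dmlc_worker_connect_retry", 5)),
        "DMLC_WORKER_STOP_PROCESS_ON_ERROR":
            str(params.get("dmlc_worker_stop_process_on_error", False)).lower(),
    }
```

For example, `build_rabit_params({"rabit_timeout": 60})` turns the single user option into the pair `rabit_timeout = "true"` and `rabit_timeout_sec = "60"`. With no options the timeout stays off and `rabit_timeout_sec` falls back to `"1800"`.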
@@ -321,7 +353,6 @@ object XGBoost extends Serializable {
       }
       val taskId = TaskContext.getPartitionId().toString
       rabitEnv.put("DMLC_TASK_ID", taskId)
-      rabitEnv.put("DMLC_WORKER_STOP_PROCESS_ON_ERROR", "false")

       try {
         Rabit.init(rabitEnv)
@@ -490,8 +521,9 @@ object XGBoost extends Serializable {
       evalSetsMap: Map[String, RDD[XGBLabeledPoint]] = Map()):
     (Booster, Map[String, Array[Float]]) = {
     logger.info(s"Running XGBoost ${spark.VERSION} with parameters:\n${params.mkString("\n")}")
-    val xgbExecParams = new XGBoostExecutionParamsFactory(params, trainingData.sparkContext).
-      buildXGBRuntimeParams
+    val xgbParamsFactory = new XGBoostExecutionParamsFactory(params, trainingData.sparkContext)
+    val xgbExecParams = xgbParamsFactory.buildXGBRuntimeParams
+    val xgbRabitParams = xgbParamsFactory.buildRabitParams
     val sc = trainingData.sparkContext
     val checkpointManager = new CheckpointManager(sc, xgbExecParams.checkpointParam.
       checkpointPath)
@@ -510,13 +542,14 @@ object XGBoost extends Serializable {
         val parallelismTracker = new SparkParallelismTracker(sc,
           xgbExecParams.timeoutRequestWorkers,
           xgbExecParams.numWorkers)
-        val rabitEnv = tracker.getWorkerEnvs
+
+        tracker.getWorkerEnvs().putAll(xgbRabitParams)
         val boostersAndMetrics = if (hasGroup) {
-          trainForRanking(transformedTrainingData.left.get, xgbExecParams, rabitEnv,
-            checkpointRound, prevBooster, evalSetsMap)
+          trainForRanking(transformedTrainingData.left.get, xgbExecParams,
+            tracker.getWorkerEnvs(), checkpointRound, prevBooster, evalSetsMap)
         } else {
-          trainForNonRanking(transformedTrainingData.right.get, xgbExecParams, rabitEnv,
-            checkpointRound, prevBooster, evalSetsMap)
+          trainForNonRanking(transformedTrainingData.right.get, xgbExecParams,
+            tracker.getWorkerEnvs(), checkpointRound, prevBooster, evalSetsMap)
         }
         val sparkJobThread = new Thread() {
           override def run() {

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -50,7 +50,7 @@ class XGBoostClassifier (
   def this(xgboostParams: Map[String, Any]) = this(
     Identifiable.randomUID("xgbc"), xgboostParams)

-  XGBoostToMLlibParams(xgboostParams)
+  XGBoost2MLlibParams(xgboostParams)

   def setWeightCol(value: String): this.type = set(weightCol, value)

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -21,7 +21,7 @@ import ml.dmlc.xgboost4j.scala.spark.params._
 import org.apache.spark.ml.param.shared.HasWeightCol

 private[spark] sealed trait XGBoostEstimatorCommon extends GeneralParams with LearningTaskParams
-  with BoosterParams with ParamMapFuncs with NonParamVariables {
+  with BoosterParams with RabitParams with ParamMapFuncs with NonParamVariables {

   def needDeterministicRepartitioning: Boolean = {
     getCheckpointPath.nonEmpty && getCheckpointInterval > 0

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -54,7 +54,7 @@ class XGBoostRegressor (
   def this(xgboostParams: Map[String, Any]) = this(
     Identifiable.randomUID("xgbr"), xgboostParams)

-  XGBoostToMLlibParams(xgboostParams)
+  XGBoost2MLlibParams(xgboostParams)

   def setWeightCol(value: String): this.type = set(weightCol, value)

[The same copyright-header hunk (2014 to 2014 - 2019) repeats for 6 more files.]

@@ -1,5 +1,5 @@
 /*
- Copyright (c) 2014 by Contributors
+ Copyright (c) 2014 - 2019 by Contributors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -241,7 +241,7 @@ trait HasNumClass extends Params {

 private[spark] trait ParamMapFuncs extends Params {

-  def XGBoostToMLlibParams(xgboostParams: Map[String, Any]): Unit = {
+  def XGBoost2MLlibParams(xgboostParams: Map[String, Any]): Unit = {
     for ((paramName, paramValue) <- xgboostParams) {
       if ((paramName == "booster" && paramValue != "gbtree") ||
         (paramName == "updater" && paramValue != "grow_histmaker,prune" &&