[jvm-packages] add Rapids plugin support #7491
Conversation
Add GPU train/transform support for XGBoost4j-Spark-Gpu by leveraging spark-rapids.
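A rough end-to-end sketch of what this enables (a hypothetical usage example, not code from this PR; the `setFeaturesCols` setter name, paths, and column names are assumptions):

```scala
// Hypothetical sketch: GPU-accelerated training with the spark-rapids plugin
// enabled (e.g. spark.plugins=com.nvidia.spark.SQLPlugin).
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val df = spark.read.parquet("/data/mortgage")    // read accelerated by spark-rapids
val classifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "num_round" -> 100,
  "tree_method" -> "gpu_hist"                    // required for the GPU pipeline
)).setFeaturesCols(Array("f1", "f2", "f3"))      // columnar features, no VectorAssembler
  .setLabelCol("label")
val model = classifier.fit(df)                   // train path runs through the GPU pre-processing
```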
@trivialfis Could you help review it? Thanks.
I just did a performance ETL+Training test for CPU and GPU on the Mortgage 2000-year dataset and got about …
Initial review. I noticed that external memory is mentioned in the code; is there any usage example of it in the Spark package?
<scala.version>2.12.8</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<hadoop.version>2.7.3</hadoop.version>
<maven.wagon.http.retryHandler.count>5</maven.wagon.http.retryHandler.count>
<log.capi.invocation>OFF</log.capi.invocation>
<use.cuda>OFF</use.cuda>
<cudf.version>21.08.2</cudf.version>
Why is it necessary to move it here?
Both xgboost4j-gpu and xgboost4j-spark-gpu need cudf.version
case regressor: XGBoostRegressor => if (regressor.isDefined(regressor.groupCol)) {
    regressor.getGroupCol } else ""
case _: XGBoostClassifier => ""
case _ => throw new RuntimeException("Unsupporting estimator: " + estimator)
Suggested change:
- case _ => throw new RuntimeException("Unsupporting estimator: " + estimator)
+ case _ => throw new RuntimeException("Unsupported estimator: " + estimator)
Just checked all the comments, and it seems no extra commit is needed, so I'd like to fix it in a follow-up PR.
Done
require(est.isDefined(est.treeMethod) && est.getTreeMethod.equals("gpu_hist"),
  s"GPU train requires tree_method set to gpu_hist")
val groupName = estimator match {
  case regressor: XGBoostRegressor => if (regressor.isDefined(regressor.groupCol)) {
Is the regressor in the spark package responsible for ranking too?
Yes, correct.
That's weird.
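For context, a minimal sketch of the pattern in question (assuming the public xgboost4j-spark 1.x API; parameter values are illustrative): ranking is driven through XGBoostRegressor by combining a ranking objective with a group column, which is why the code above only looks up groupCol on the regressor.

```scala
// Minimal sketch, assuming the xgboost4j-spark 1.x API: learning-to-rank goes
// through XGBoostRegressor with a ranking objective plus a group column that
// marks which rows belong to the same query.
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

val ranker = new XGBoostRegressor(Map(
  "objective" -> "rank:pairwise",   // ranking objective, not plain regression
  "num_round" -> 100,
  "tree_method" -> "gpu_hist"       // mandated by the require() above for GPU train
)).setGroupCol("group")             // rows sharing a group id form one query
```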
isCacheData: Boolean): Map[String, ColumnDataBatch] = {
  // Cache is not supported
  if (isCacheData) {
    logger.warn("Dataset cache is not supported for the GPU pipeline!")
What is a cache in the context of the spark package?
Cache here means caching the Spark computation result in local storage. The GPU pipeline only accelerates the Dataset cache, not the RDD cache, so for now we just disable caching.
Will figure out a way to handle this in a follow-up PR.
Got it. That sounds fine.
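For reference, a sketch of where this flag comes from (assuming it is fed by the existing cacheTrainingSet param in xgboost4j-spark; treat the wiring as an assumption):

```scala
// Sketch: the user-facing cacheTrainingSet param is what arrives here as
// isCacheData (an assumption about the wiring). On the CPU pipeline it caches
// the converted training set; on the GPU pipeline it is warned about and
// ignored, per the code above.
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val classifier = new XGBoostClassifier(Map("tree_method" -> "gpu_hist"))
  .setCacheTrainingSet(true)   // honored on CPU; warn-and-ignore on GPU
```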
noEvalSet: Boolean): RDD[Watches] = {

  val sc = dataMap(TRAIN_NAME).rawDF.sparkSession.sparkContext
  val maxBin = xgbExeParams.toMap.getOrElse("max_bin", 256).asInstanceOf[Int]
Can we lower this configuration into C++ or set a default parameter on the API that's more visible?
Actually, the XGBoost Java layer has already set the default value to 256. Here it's just a safer way to get max_bin.
If the parameter is not optional, then I think this safety is not necessary. We should aim to remove any duplicated logic.
I don't think this needs to be removed, but if you insist, I can get rid of it in a follow-up PR.
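To make the two options concrete (a sketch; `xgbExeParams` is the PR's parameter object from the hunk above):

```scala
// Sketch of the two options under discussion:
// (1) the defensive lookup from this PR, which duplicates the 256 default
//     already set on the Java side:
val maxBinSafe = xgbExeParams.toMap.getOrElse("max_bin", 256).asInstanceOf[Int]
// (2) relying on the default supplied by the Java layer, avoiding the
//     duplicated constant as the reviewer suggests:
val maxBinStrict = xgbExeParams.toMap("max_bin").asInstanceOf[Int]
```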
...ges/xgboost4j-spark-gpu/src/test/scala/ml/dmlc/xgboost4j/scala/rapids/spark/GpuPerTest.scala
@@ -0,0 +1,292 @@
/* |
GpuPerTest vs. GpuPreTest?
GpuPerTest is correct.
What does it mean?
Done
I just ran another round of XGBoost training on Mortgage (rows: 83,270,160; feature columns: 27) in Spark local mode, using the latest spark-rapids jars and cudf jar.
The speed-up is 6.4x.
The speed-up is 2.79x. Just as I said, there is room for optimization for GPU ETL+train.
@hcho3 @RAMitchell I looked into the PR and it seems fine to me, but I haven't been able to provide a detailed review due to my lack of experience with Spark and the size of this PR. It would be really helpful if you could take a look.
@hcho3 @RAMitchell, could you help review it?
Thanks @wbo4958.

> For now, there is an issue with CPU transform: Model A transforms a CPU dataset read from fileABC and gets result AA, while Model A transforms a GPU dataset read from fileABC and gets result BB. We expect AA to equal BB, but in fact they are different.

Can you elaborate on this and why it cannot be fixed in this PR?

One concern we have had recently is the run time of the JVM tests on CI; the JVM tests use a disproportionately large amount of the CI budget. Can you measure how long the tests take on CI and make sure the time is not significantly increased by this PR?

Apart from the above, I'm inclined to merge this as it's mostly tests.
Hi @RAMitchell, I can fix this in this PR, but it is really a CPU legacy bug that was not introduced by any of my previous PRs, so I'd like to have another PR to fix it. Yeah, running the whole CPU unit test suite does cost a lot of time; I've disabled the CPU unit tests when running the GPU unit tests, which are pretty fast.
This PR is the final PR for #7361.
> For now, there is an issue with CPU transform: Model A transforms a CPU dataset read from fileABC and gets result AA, while Model A transforms a GPU dataset read from fileABC and gets result BB. We expect AA to equal BB, but in fact they are different.

I've figured out why and will create a follow-up PR to fix it after this PR is merged.
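To make the expected invariant concrete (a hypothetical sketch; paths and the model class are placeholders, not code from this PR):

```scala
// Hypothetical sketch of the invariant described above: the same model
// transforming the same file through the CPU pipeline and the GPU pipeline
// should produce identical predictions.
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel

val model = XGBoostClassificationModel.load("/path/to/modelA")
val df = spark.read.parquet("/path/to/fileABC")
val aa = model.transform(df)   // CPU pipeline result ("AA")
// Reading the same file with the spark-rapids plugin enabled and running
// the GPU transform path should yield the same rows ("BB"); currently the
// two results differ, which is the legacy bug to be fixed in a follow-up.
```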