[jvm-packages] add Rapids plugin support #7491
Conversation
Add GPU train/transform support for XGBoost4j-Spark-Gpu by leveraging spark-rapids.
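A rough end-to-end sketch of what this enables (a hypothetical usage example, not code from this PR; the `setFeaturesCols` setter name, paths, and column names are assumptions):

```scala
// Hypothetical sketch: GPU-accelerated training with the spark-rapids plugin
// enabled (e.g. spark.plugins=com.nvidia.spark.SQLPlugin).
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val df = spark.read.parquet("/data/mortgage")    // read accelerated by spark-rapids
val classifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "num_round" -> 100,
  "tree_method" -> "gpu_hist"                    // required for the GPU pipeline
)).setFeaturesCols(Array("f1", "f2", "f3"))      // columnar features, no VectorAssembler
  .setLabelCol("label")
val model = classifier.fit(df)                   // train path runs through the GPU pre-processing
```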
@trivialfis Could you help review it? Thanks.
I just did a performance ETL+Training test for CPU and GPU on the Mortgage 2000-year dataset and got about …
Initial review. I noticed that external memory is mentioned in the code; is there any usage example of it in the Spark package?
<scala.version>2.12.8</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<hadoop.version>2.7.3</hadoop.version>
<maven.wagon.http.retryHandler.count>5</maven.wagon.http.retryHandler.count>
<log.capi.invocation>OFF</log.capi.invocation>
<use.cuda>OFF</use.cuda>
<cudf.version>21.08.2</cudf.version>
Why is it necessary to move it here?
Both xgboost4j-gpu and xgboost4j-spark-gpu need cudf.version
case regressor: XGBoostRegressor => if (regressor.isDefined(regressor.groupCol)) {
    regressor.getGroupCol } else ""
case _: XGBoostClassifier => ""
case _ => throw new RuntimeException("Unsupporting estimator: " + estimator)
Suggested change:
- case _ => throw new RuntimeException("Unsupporting estimator: " + estimator)
+ case _ => throw new RuntimeException("Unsupported estimator: " + estimator)
Just checked all the comments, and it seems no extra commit is needed, so I'd like to fix it in a follow-up PR.
Done
require(est.isDefined(est.treeMethod) && est.getTreeMethod.equals("gpu_hist"),
  s"GPU train requires tree_method set to gpu_hist")
val groupName = estimator match {
  case regressor: XGBoostRegressor => if (regressor.isDefined(regressor.groupCol)) {
Is the regressor in the spark package responsible for ranking too?
Yes, correct.
That's weird.
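For context, a minimal sketch of the pattern in question (assuming the public xgboost4j-spark 1.x API; parameter values are illustrative): ranking is driven through XGBoostRegressor by combining a ranking objective with a group column, which is why the code above only looks up groupCol on the regressor.

```scala
// Minimal sketch, assuming the xgboost4j-spark 1.x API: learning-to-rank goes
// through XGBoostRegressor with a ranking objective plus a group column that
// marks which rows belong to the same query.
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

val ranker = new XGBoostRegressor(Map(
  "objective" -> "rank:pairwise",   // ranking objective, not plain regression
  "num_round" -> 100,
  "tree_method" -> "gpu_hist"       // mandated by the require() above for GPU train
)).setGroupCol("group")             // rows sharing a group id form one query
```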
isCacheData: Boolean): Map[String, ColumnDataBatch] = {
  // Cache is not supported
  if (isCacheData) {
    logger.warn("Dataset cache is not supported for the GPU pipeline!")
What is a cache in the context of the spark package?
Cache here means caching the Spark computation result in local storage. The GPU pipeline only accelerates the Dataset cache, not the RDD cache, so for now we just disable caching.
Will figure out a way to handle this in a follow-up PR.
Got it. That sounds fine.
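For reference, a sketch of where this flag comes from (assuming it is fed by the existing cacheTrainingSet param in xgboost4j-spark; treat the wiring as an assumption):

```scala
// Sketch: the user-facing cacheTrainingSet param is what arrives here as
// isCacheData (an assumption about the wiring). On the CPU pipeline it caches
// the converted training set; on the GPU pipeline it is warned about and
// ignored, per the code above.
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val classifier = new XGBoostClassifier(Map("tree_method" -> "gpu_hist"))
  .setCacheTrainingSet(true)   // honored on CPU; warn-and-ignore on GPU
```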
noEvalSet: Boolean): RDD[Watches] = {

  val sc = dataMap(TRAIN_NAME).rawDF.sparkSession.sparkContext
  val maxBin = xgbExeParams.toMap.getOrElse("max_bin", 256).asInstanceOf[Int]
Can we lower this configuration into C++ or set a default parameter on the API that's more visible?
Actually, the XGBoost Java layer has already set the default value to 256. Here it's just a safer way to get max_bin.
If the parameter is not optional, then I think this safety is not necessary. We should aim to remove any duplicated logic.
I don't think this needs to be removed, but if you insist, I can get rid of it in a follow-up PR.
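To make the two options concrete (a sketch; `xgbExeParams` is the PR's parameter object from the hunk above):

```scala
// Sketch of the two options under discussion:
// (1) the defensive lookup from this PR, which duplicates the 256 default
//     already set on the Java side:
val maxBinSafe = xgbExeParams.toMap.getOrElse("max_bin", 256).asInstanceOf[Int]
// (2) relying on the default supplied by the Java layer, avoiding the
//     duplicated constant as the reviewer suggests:
val maxBinStrict = xgbExeParams.toMap("max_bin").asInstanceOf[Int]
```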
...ges/xgboost4j-spark-gpu/src/test/scala/ml/dmlc/xgboost4j/scala/rapids/spark/GpuPerTest.scala
@@ -0,0 +1,292 @@
/* |
GpuPerTest vs. GpuPreTest?
GpuPerTest is correct.
What does it mean?
Done
I just ran another round of XGBoost training on Mortgage (rows: 83,270,160; feature columns: 27) in Spark local mode, using the latest spark-rapids jars and cudf jar.
The speed-up is 6.4x.
The speed-up is 2.79x. Just as I said, there is room for optimization for GPU ETL+train.
@hcho3 @RAMitchell I looked into the PR and it seems fine to me, but I haven't been able to provide a detailed review due to my lack of experience with Spark and the size of this PR. It would be really helpful if you could take a look.
@hcho3 @RAMitchell, could you help review it?
Thanks @wbo4958.

> For now, there is an issue with CPU transform: Model A transforms a CPU dataset read from fileABC and gets result AA, while Model A transforms a GPU dataset read from fileABC and gets result BB. We expect AA to equal BB, but in fact they are different.

Can you elaborate on this and why it cannot be fixed in this PR?

One concern we have had recently is the run time of the JVM tests on CI; the JVM tests use a disproportionately large amount of the CI budget. Can you measure how long the tests take on CI and make sure the time is not significantly increased by this PR?

Apart from the above, I'm inclined to merge this as it's mostly tests.
Hi @RAMitchell, I can fix this in this PR, but it is really a CPU legacy bug that was not introduced by any of my previous PRs, so I'd like to have another PR to fix it. Yeah, running the whole CPU unit test suite does cost a lot of time; I've disabled the CPU unit tests when running the GPU unit tests, which are pretty fast.
This PR is the final PR for #7361.
> For now, there is an issue with CPU transform: Model A transforms a CPU dataset read from fileABC and gets result AA, while Model A transforms a GPU dataset read from fileABC and gets result BB. We expect AA to equal BB, but in fact they are different.

I've figured out why and will create a follow-up PR to fix it after this PR is merged.
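To make the expected invariant concrete (a hypothetical sketch; paths and the model class are placeholders, not code from this PR):

```scala
// Hypothetical sketch of the invariant described above: the same model
// transforming the same file through the CPU pipeline and the GPU pipeline
// should produce identical predictions.
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel

val model = XGBoostClassificationModel.load("/path/to/modelA")
val df = spark.read.parquet("/path/to/fileABC")
val aa = model.transform(df)   // CPU pipeline result ("AA")
// Reading the same file with the spark-rapids plugin enabled and running
// the GPU transform path should yield the same rows ("BB"); currently the
// two results differ, which is the legacy bug to be fixed in a follow-up.
```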