udpate

wbo4958 · Sep 25, 2024 · 1234701
1 parent 7cf6ec5
commit 1234701

Showing 2 changed files with 13 additions and 29 deletions.
diff --git a/doc/install.rst b/doc/install.rst
@@ -159,7 +159,7 @@ R
 JVM
 ---
 
-* XGBoost4j/XGBoost4j-Spark
+* XGBoost4j-Spark
 
 .. code-block:: xml
   :caption: Maven
@@ -172,11 +172,6 @@ JVM
 
   <dependencies>
     ...
-    <dependency>
-        <groupId>ml.dmlc</groupId>
-        <artifactId>xgboost4j_${scala.binary.version}</artifactId>
-        <version>latest_version_num</version>
-    </dependency>
     <dependency>
         <groupId>ml.dmlc</groupId>
         <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
@@ -188,11 +183,10 @@ JVM
   :caption: sbt
 
   libraryDependencies ++= Seq(
-    "ml.dmlc" %% "xgboost4j" % "latest_version_num",
     "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
   )
 
-* XGBoost4j-GPU/XGBoost4j-Spark-GPU
+* XGBoost4j-Spark-GPU
 
 .. code-block:: xml
   :caption: Maven
@@ -205,11 +199,6 @@ JVM
 
   <dependencies>
     ...
-    <dependency>
-        <groupId>ml.dmlc</groupId>
-        <artifactId>xgboost4j-gpu_${scala.binary.version}</artifactId>
-        <version>latest_version_num</version>
-    </dependency>
     <dependency>
         <groupId>ml.dmlc</groupId>
         <artifactId>xgboost4j-spark-gpu_${scala.binary.version}</artifactId>
@@ -221,15 +210,14 @@ JVM
   :caption: sbt
 
   libraryDependencies ++= Seq(
-    "ml.dmlc" %% "xgboost4j-gpu" % "latest_version_num",
     "ml.dmlc" %% "xgboost4j-spark-gpu" % "latest_version_num"
   )
 
 This will check out the latest stable version from the Maven Central.
 
 For the latest release version number, please check `release page <https://github.com/dmlc/xgboost/releases>`_.
 
-To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
+To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
 
 
 .. note:: Windows not supported in the JVM package

diff --git a/doc/jvm/xgboost4j_spark_gpu_tutorial.rst b/doc/jvm/xgboost4j_spark_gpu_tutorial.rst
@@ -71,7 +71,7 @@ To make the Iris dataset recognizable to XGBoost, we need to encode the String-t
 label, i.e. "class", to the Double-typed label.
 
 One way to convert the String-typed label to Double is to use Spark's built-in feature transformer
-`StringIndexer <https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.ml.feature.StringIndexer>`_.
+`StringIndexer <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/feature/StringIndexer.html>`_.
 But this feature is not accelerated in RAPIDS Accelerator, which means it will fall back
 to CPU. Instead, we use an alternative way to achieve the same goal with the following code:
 
@@ -107,10 +107,10 @@ With window operations, we have mapped the string column of labels to label indi
 Training
 ========
 
-The GPU version of XGBoost-Spark supports both regression and classification
+XGBoost4j-Spark-Gpu supports regression, classification and ranking
 models. Although we use the Iris dataset in this tutorial to show how we use
-``XGBoost/XGBoost4J-Spark-GPU`` to resolve a multi-classes classification problem, the
-usage in Regression is very similar to classification.
+``XGBoost4J-Spark-GPU`` to resolve a multi-classes classification problem, the
+usage in Regression and Ranking is very similar to classification.
 
 To train a XGBoost model for classification, we need to define a XGBoostClassifier first:
 
@@ -168,12 +168,13 @@ model can then be used in other tasks like prediction.
 Prediction
 ==========
 
-When we get a model, either a XGBoostClassificationModel or a XGBoostRegressionModel, it takes a DataFrame as an input,
+When we get a model, a XGBoostClassificationModel or a XGBoostRegressionModel or a XGBoostRankerModel, it takes a DataFrame as an input,
 reads the column containing feature vectors, predicts for each feature vector, and outputs a new DataFrame
 with the following columns by default:
 
 * XGBoostClassificationModel will output margins (``rawPredictionCol``), probabilities(``probabilityCol``) and the eventual prediction labels (``predictionCol``) for each possible label.
 * XGBoostRegressionModel will output prediction a label(``predictionCol``).
+* XGBoostRankerModel will output prediction a label(``predictionCol``).
 
 .. code-block:: scala
 
@@ -226,25 +227,20 @@ would be ``"spark.task.resource.gpu.amount=1/spark.executor.cores"``. However, i
 using a XGBoost version earlier than 2.1.0 or a Spark standalone cluster version below 3.4.0,
 you still need to set ``"spark.task.resource.gpu.amount"`` equal to ``"spark.executor.resource.gpu.amount"``.
 
-.. note::
-
-  As of now, the stage-level scheduling feature in XGBoost is limited to the Spark standalone cluster mode.
-  However, we have plans to expand its compatibility to YARN and Kubernetes once Spark 3.5.1 is officially released.
-
 Assuming that the application main class is "Iris" and the application jar is "iris-1.0.0.jar",`
 provided below is an instance demonstrating how to submit the xgboost application to an Apache
 Spark Standalone cluster.
 
 .. code-block:: bash
 
-  rapids_version=23.10.0
-  xgboost_version=2.0.1
+  rapids_version=24.08.0
+  xgboost_version=$LATEST_VERSION
   main_class=Iris
   app_jar=iris-1.0.0.jar
 
   spark-submit \
     --master $master \
-    --packages com.nvidia:rapids-4-spark_2.12:${rapids_version},ml.dmlc:xgboost4j-gpu_2.12:${xgboost_version},ml.dmlc:xgboost4j-spark-gpu_2.12:${xgboost_version} \
+    --packages com.nvidia:rapids-4-spark_2.12:${rapids_version},ml.dmlc:xgboost4j-spark-gpu_2.12:${xgboost_version} \
     --conf spark.executor.cores=12 \
     --conf spark.task.cpus=1 \
     --conf spark.executor.resource.gpu.amount=1 \
@@ -255,7 +251,7 @@ Spark Standalone cluster.
     --class ${main_class} \
      ${app_jar}
 
-* First, we need to specify the ``RAPIDS Accelerator, xgboost4j-gpu, xgboost4j-spark-gpu`` packages by ``--packages``
+* First, we need to specify the ``RAPIDS Accelerator, xgboost4j-spark-gpu`` packages by ``--packages``
 * Second, ``RAPIDS Accelerator`` is a Spark plugin, so we need to configure it by specifying ``spark.plugins=com.nvidia.spark.SQLPlugin``
 
 For details about other ``RAPIDS Accelerator`` other configurations, please refer to the `configuration <https://nvidia.github.io/spark-rapids/docs/configs.html>`_.