Commit

[fed] Update federated learning branch. (dmlc#10569)
* [coll] Allow using local host for testing. (dmlc#10526)

- Don't try to retrieve the IP address if a host is specified.
- Fix compiler deprecation warning.

* Fix boolean array for arrow-backed DF. (dmlc#10527)

* [EM] Move prefetch in reset into the end of the iteration. (dmlc#10529)

* Enhance the threadpool implementation. (dmlc#10531)



- Accept an initialization function.
- Support void return tasks.

* [doc] Update link to release notes. [skip ci] (dmlc#10533)

* [doc] Fix learning to rank tutorial. [skip ci] (dmlc#10539)

* Cache GPU histogram kernel configuration. (dmlc#10538)

* [sycl] Reorder if-else statements to allow using CPU branches for SYCL devices (dmlc#10543)

* reorder if-else statements for sycl compatibility

* trigger check

---------

Co-authored-by: Dmitry Razdoburdin <>

* [EM] Basic distributed test for external memory. (dmlc#10492)

* [sycl] Improve build configuration. (dmlc#10548)

Co-authored-by: Dmitry Razdoburdin <>

* [R] Update roxygen. (dmlc#10556)

* [doc] Add more detailed explanations for advanced objectives (dmlc#10283)



---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>

* [doc] Add `build_info` to autodoc. [skip ci] (dmlc#10551)

* [doc] Add notes about RMM and device ordinal. [skip ci] (dmlc#10562)

- Remove the experimental tag; we have been running it for a long time now.
- Add notes about avoiding setting the CUDA device.
- Add a link in the parameter document.

* Fix empty partition. (dmlc#10559)

* Avoid the use of size_t in the partitioner. (dmlc#10541)

- Avoid the use of size_t in the partitioner.
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure the constness is not removed in the `Elem` by making it reference only.

size_t is implementation-defined, which causes issues when we want to pass pointers or spans.

* [EM] Handle base idx in GPU histogram. (dmlc#10549)

* [fed] Split up federated test CMake file. (dmlc#10566)

- Collect all federated test files into the same directory.
- Independently list the files.

* Fixes.

* Fix.

---------

Co-authored-by: Dmitry Razdoburdin <d.razdoburdin@gmail.com>
Co-authored-by: david-cortes <david.cortes.rivera@gmail.com>
3 people authored Jul 12, 2024
1 parent f53e3ea commit 04bf401
Showing 70 changed files with 1,758 additions and 574 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -95,7 +95,7 @@ jobs:
run: |
mkdir build
cd build
- cmake .. -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON -DPLUGIN_SYCL=ON -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
+ cmake .. -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON -DPLUGIN_SYCL=ON -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
make -j$(nproc)
- name: Run gtest binary for SYCL
run: |
2 changes: 1 addition & 1 deletion .github/workflows/python_tests.yml
@@ -294,7 +294,7 @@ jobs:
run: |
mkdir build
cd build
- cmake .. -DPLUGIN_SYCL=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
+ cmake .. -DPLUGIN_SYCL=ON -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
make -j$(nproc)
- name: Install Python package
run: |
2 changes: 0 additions & 2 deletions CMakeLists.txt
@@ -1,8 +1,6 @@
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

if(PLUGIN_SYCL)
- set(CMAKE_CXX_COMPILER "g++")
- set(CMAKE_C_COMPILER "gcc")
string(REPLACE " -isystem ${CONDA_PREFIX}/include" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
endif()

2 changes: 1 addition & 1 deletion R-package/DESCRIPTION
@@ -66,6 +66,6 @@ Imports:
data.table (>= 1.9.6),
jsonlite (>= 1.0)
Roxygen: list(markdown = TRUE)
- RoxygenNote: 7.3.1
+ RoxygenNote: 7.3.2
Encoding: UTF-8
SystemRequirements: GNU make, C++17
12 changes: 12 additions & 0 deletions R-package/R/xgb.train.R
@@ -102,6 +102,18 @@
#' It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
#' \href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.}
#' }
#'
#' For custom objectives, one should pass a function taking as input the current predictions (as a numeric
#' vector or matrix) and the training data (as an `xgb.DMatrix` object) that will return a list with elements
#' `grad` and `hess`, which should be numeric vectors or matrices whose number of rows matches the number
#' of rows in the training data (same shape as the predictions that are passed as input to the function).
#' For multi-valued custom objectives, these should have shape `[nrows, ntargets]`. Note that negative values of
#' the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
#' objective is non-convex.
#'
#' See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{
#' Custom Objective and Evaluation Metric} and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{
#' Advanced Usage of Custom Objectives} for more information about custom objectives.
#' }
#' \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
#' \item{ \code{eval_metric} evaluation metrics for validation data.
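To make the grad/hess contract described in the added documentation concrete, below is a minimal Python sketch of a custom squared-error objective (illustrative only, not part of this commit; the R interface returns the same quantities as a list with elements `grad` and `hess`):

```python
import numpy as np
import xgboost as xgb

# Toy regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(scale=0.1, size=100)
dtrain = xgb.DMatrix(X, label=y)

def squared_error(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Gradient and Hessian of 0.5 * (predt - y)^2, one value per training row."""
    y = dtrain.get_label()
    grad = predt - y            # first derivative of the loss w.r.t. the prediction
    hess = np.ones_like(predt)  # second derivative is the constant 1
    return grad, hess

booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10, obj=squared_error)
```

For a multi-valued objective the same function would return arrays of shape `[nrows, ntargets]`, as noted above.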
12 changes: 12 additions & 0 deletions R-package/man/xgb.train.Rd


2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@
[Documentation](https://xgboost.readthedocs.org) |
[Resources](demo/README.md) |
[Contributors](CONTRIBUTORS.md) |
- [Release Notes](NEWS.md)
+ [Release Notes](https://xgboost.readthedocs.io/en/latest/changes/index.html)

XGBoost is an optimized distributed gradient boosting library designed to be highly ***efficient***, ***flexible*** and ***portable***.
It implements machine learning algorithms under the [Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting) framework.
9 changes: 6 additions & 3 deletions demo/guide-python/custom_softmax.py
@@ -6,7 +6,8 @@
XGBoost returns transformed prediction for multi-class objective function. More details
in comments.
- See :doc:`/tutorials/custom_metric_obj` for detailed tutorial and notes.
+ See :doc:`/tutorials/custom_metric_obj` and :doc:`/tutorials/advanced_custom_obj` for
+ detailed tutorial and notes.
'''

@@ -39,7 +40,9 @@ def softmax(x):


def softprob_obj(predt: np.ndarray, data: xgb.DMatrix):
- '''Loss function. Computing the gradient and approximated hessian (diagonal).
+ '''Loss function. Computing the gradient and upper bound on the
+ Hessian with a diagonal structure for XGBoost (note that this is
+ not the true Hessian).
Reimplements the `multi:softprob` inside XGBoost.
'''
@@ -61,7 +64,7 @@ def softprob_obj(predt: np.ndarray, data: xgb.DMatrix):

eps = 1e-6

- # compute the gradient and hessian, slow iterations in Python, only
+ # compute the gradient and hessian upper bound, slow iterations in Python, only
# suitable for demo. Also the one in native XGBoost core is more robust to
# numeric overflow as we don't do anything to mitigate the `exp` in
# `softmax` here.
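For context on the updated docstring, the gradient and the diagonal Hessian upper bound used for `multi:softprob` can be written in vectorised form roughly as follows (an illustrative sketch of the same formulation, not the code changed in this commit):

```python
import numpy as np

def softprob_grad_hess(predt: np.ndarray, labels: np.ndarray, eps: float = 1e-6):
    """predt: raw margin scores of shape (n_samples, n_classes)."""
    # Row-wise softmax turns raw margins into class probabilities.
    e = np.exp(predt - predt.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    # Gradient of the cross-entropy loss: p - one_hot(label).
    grad = p.copy()
    grad[np.arange(labels.size), labels.astype(int)] -= 1.0
    # Diagonal upper bound on the Hessian, floored at eps; not the true Hessian.
    hess = np.maximum(2.0 * p * (1.0 - p), eps)
    return grad, hess
```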
18 changes: 14 additions & 4 deletions demo/rmm_plugin/README.rst
@@ -1,5 +1,5 @@
- Using XGBoost with RAPIDS Memory Manager (RMM) plugin (EXPERIMENTAL)
- ====================================================================
+ Using XGBoost with RAPIDS Memory Manager (RMM) plugin
+ =====================================================

`RAPIDS Memory Manager (RMM) <https://github.com/rapidsai/rmm>`__ library provides a
collection of efficient memory allocators for NVIDIA GPUs. It is now possible to use
@@ -47,5 +47,15 @@ the global configuration ``use_rmm``:
with xgb.config_context(use_rmm=True):
clf = xgb.XGBClassifier(tree_method="hist", device="cuda")
- Depending on the choice of memory pool size or type of allocator, this may have negative
- performance impact.
+ Depending on the choice of memory pool size and the type of the allocator, this can add
+ more consistency to memory usage but with slightly degraded performance impact.

+ *******************************
+ No Device Ordinal for Multi-GPU
+ *******************************
+
+ Since with RMM the memory pool is pre-allocated on a specific device, changing the CUDA
+ device ordinal in XGBoost can result in memory error ``cudaErrorIllegalAddress``. Use the
+ ``CUDA_VISIBLE_DEVICES`` environment variable instead of the ``device="cuda:1"`` parameter
+ for selecting device. For distributed training, the distributed computing frameworks like
+ ``dask-cuda`` are responsible for device management.
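As a sketch of the guidance in that new section (assuming an RMM-enabled build; not part of this diff), device selection via the environment looks like this:

```python
import os

# Select the second physical GPU before CUDA is initialised, instead of "cuda:1".
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import xgboost as xgb

with xgb.config_context(use_rmm=True):
    # No ordinal in `device`: the single visible device is used.
    clf = xgb.XGBClassifier(tree_method="hist", device="cuda")
```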
2 changes: 2 additions & 0 deletions doc/changes/index.rst
@@ -2,6 +2,8 @@
Release Notes
#############

For release notes prior to the 2.1 release, please see `news <https://github.com/dmlc/xgboost/blob/master/NEWS.md>`__ .

.. toctree::
:maxdepth: 1
:caption: Contents:
6 changes: 5 additions & 1 deletion doc/parameter.rst
@@ -25,7 +25,11 @@ Global Configuration
The following parameters can be set in the global scope, using :py:func:`xgboost.config_context()` (Python) or ``xgb.set.config()`` (R).

* ``verbosity``: Verbosity of printing messages. Valid values of 0 (silent), 1 (warning), 2 (info), and 3 (debug).
- * ``use_rmm``: Whether to use RAPIDS Memory Manager (RMM) to allocate GPU memory. This option is only applicable when XGBoost is built (compiled) with the RMM plugin enabled. Valid values are ``true`` and ``false``.

+ * ``use_rmm``: Whether to use RAPIDS Memory Manager (RMM) to allocate cache GPU
+   memory. The primary memory is always allocated on the RMM pool when XGBoost is built
+   (compiled) with the RMM plugin enabled. Valid values are ``true`` and ``false``. See
+   :doc:`/python/rmm-examples/index` for details.

******************
General Parameters
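For reference, the Python side of the global configuration described above can be used as in this small sketch (the values are examples, not defaults introduced by this commit):

```python
import xgboost as xgb

xgb.set_config(verbosity=2)            # set a value globally
print(xgb.get_config()["verbosity"])   # -> 2

with xgb.config_context(verbosity=0):  # or override it within a scope
    pass                               # code here runs with silent logging
```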
2 changes: 2 additions & 0 deletions doc/python/python_api.rst
@@ -14,6 +14,8 @@ Global Configuration

.. autofunction:: xgboost.get_config

.. autofunction:: xgboost.build_info

Core Data Structure
-------------------
.. automodule:: xgboost.core
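The newly documented helper can be exercised as below; the exact keys depend on how the binary was built (illustrative, not content from this commit):

```python
import xgboost as xgb

# Print the build flags and versions baked into the installed binary.
for key, value in xgb.build_info().items():
    print(key, value)
```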
(Remaining file diffs not rendered.)
