Revamp the rabit implementation. #10112
@rongou Is using |
Yes, in a federated setting, a participant may not want to expose its host name to the rest of the group.
Tracking apache/arrow#41058. The workaround (WAR) in CI needs to be removed once a new snappy release is published.
@wbo4958 Please help review the changes to the JVM packages when you are available.
XGBoostJNI.checkCall(XGBoostJNI.TrackerRun(this.handle));
this.tracker_daemon = new Thread(() -> {
  try {
    XGBoostJNI.checkCall(XGBoostJNI.TrackerWaitFor(this.handle, 0));
May I ask why tracker_daemon is needed here?
I kept it as a handle for the implementation of uncaughtException. Do you have suggestions for handling it differently?
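For readers following this thread, here is a minimal sketch of the pattern being discussed: a blocking wait runs on a daemon thread, and the returned Thread object serves as the handle on which an uncaught-exception handler is installed. This is not the code from the PR. The names TrackerDaemonSketch, BlockingWait, startDaemon and demo are hypothetical, and the native call is replaced by a plain Java lambda.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class TrackerDaemonSketch {
  /** Hypothetical stand-in for a blocking native wait such as TrackerWaitFor. */
  interface BlockingWait {
    void run() throws Exception;
  }

  /**
   * Starts a daemon thread that runs the blocking wait. Failures are rethrown
   * unchecked so that the handler installed on the thread observes them.
   * The thread is returned so the caller keeps a handle to it.
   */
  static Thread startDaemon(BlockingWait wait, Thread.UncaughtExceptionHandler handler) {
    Thread daemon = new Thread(() -> {
      try {
        wait.run();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    });
    daemon.setDaemon(true);
    daemon.setUncaughtExceptionHandler(handler);
    daemon.start();
    return daemon;
  }

  /** Runs a wait that always fails and returns the message seen by the handler. */
  static String demo() throws InterruptedException {
    AtomicReference<String> seen = new AtomicReference<>();
    CountDownLatch done = new CountDownLatch(1);
    Thread t = startDaemon(
        () -> { throw new IllegalStateException("tracker failed"); },
        (thread, error) -> {
          // The original exception is wrapped, so read the cause.
          seen.set(error.getCause().getMessage());
          done.countDown();
        });
    done.await(5, TimeUnit.SECONDS);
    t.join(5000);
    return seen.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo());
  }
}
```

One alternative that avoids keeping the thread as a field would be to report the failure through a shared future or callback, at the cost of no longer being able to interrupt or join the thread from the owner.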
@rongou The PR is ready for an initial review. I can't extract more small PRs since the global communicator is swapped. I still need to investigate the JVM dependency and the flaky error.
get host name. send port. all gather. assert. op. begin work on bootstrap. utilities. work on bootstrap. move listener. catch. comm. async send. block. block. Start work on async. batch poll. tests. move. Start working on tracker. better tests. work on tracker. bind. work on accepting workers. complete allgather. Move. Send with JSON. work on shutdown. msg. compare task. rename. Move. cleanup. Start work on broadcast. Work comm. Cleanup bootstrap. hide. Move to bootstrap. non blocking. cleanup. any op. shift. cleanup. Cleanup. per-thread. checks. start working on nccl. backend. Get the prototype compile. log. test print. timeout on connection. get nccl allreduce basic. look into federated. proto. scatter reduce. allreduce prototype. Work on tests. cleanup. Initialization. Init. Work on Python. get args. Start working on allgatherv. convert some allreduce. remove some old use. remove cpu impl. work on gpu. play with dlopen. convert. convert. convert. placeholder. backend. work on federated. remove. Move. Federated tracker. Move. move into comm. GPU variant. not just nccl. fix. fix. Convert. bitwise. stream. copying allgather. replace. Remove. remove. Remove device. Remove rabit. Remove rabit. cmake. tests. use gmock. Move. Split. init. Extract. compiler. test timeout. exc. comments. Tests for federated. Remove. remove. Split up. refactor tests. format. extract magic number. Extract more commands. refactor. Remove. Reduce dependency on c api. remove old code. throw. coll error. indirect. look into dask module. parameters. command. probing. listen for error. debug. host. cleanup. dask. loop. working basic. header. guard. test. type. socket. cleanup & notes. use a state machine. work on tests. header. test channel. cleanup. cleanup broadcast. unneeded changes. allgather string. Fixes. cleanup rebase. fixes after rebase. split up nccl comm. Move data copying. allgatherv test. Extract. tests. test allreduce. remove the use of ctx. tests. rebase. work on fed. 
work on allgatherv. name. lint. Split. split. remove gmock. move. CPU. CUDA. compile. Cleanup. header. Work on tests. checks. fixes. tests. work on CUDA test. comm. Share the implementation. tests. cleanup. cleanup. cleanup cleanup. set device. cleanup. cleanup. more. cleanup. Get it work. wait. revert dask changes. time. remove reference to encoder. extract. extract. split up the training function. Fix. deterministic. Fix. debug. Fixes. remove. cleanup. fix. Move worker env. cleanup. cleanup. wait. cleanup. extract error handling. get abort to work as well. Move. policy. cleanups. cleanup. Split up. doc. Cleanup ctor. tests. tests. tests. configuration. tests. task id. start working on metric tests. Remove. type. agg. fix seq. tests. start working on cuda test. type. fixes. tests. Use device ord. Remove auc. remove elementwise. remove multi-class cleanup aft. cleanup ranking. remove old tests. headers. Move. move. single gpu tests. Cleanup C API. unknown. C API. Small cleanup. cleanup. Fix. cleanup. work on async queue. work on sync. Use blocking op. result. Fuzzing. result. Remove coll error. Move. cleanup. cleanup. cleanup. cleanup. cleanup. cleanup. cleanup. Fix removal. test lint. remove. cleanup. cleanup. cleanup. invoke result. note. Fix rebase. Fix rebase. fix & cleanup. Fix. remove coll error for now. cleanup. replace. replace. replace. test basic. cleanup, fix. deduced size. cleanup. Convert to new routines. Fixes. Fix. Add test. use vector. cleanup. safe coll. Fix. Fix. Fix. Fix. cleanup. cleanup. Don't throw. Fixes. v6 timeout. Cleanups. Remove error handling for now. Timeout. syc. mac. cli. remove. types. federated. build. build. sortby. build. windows, macos. macos. federated. macos. lint. skip finalize. Forbid empty data. Shutdown before dtor. small allreduce. rounddown. empty input. remove warning. remove get host IP. take down the jvm package for now. stop early. annotation. lint. debug check. windows. np.bool. Work on shutdown. test blocking. 
remove dask error. Detach. comments. blocking. display timeout. debug github error. checks. Switch the order. revert debug log. fix tests. delete. reverse. don't block. improved error. clear; shutdown the tracker. release lock early. comments. macos. windows. Move. const. Unify ctor. remove exceptions. lint, comment. freeze pyarrow. windows. r package. Fix CI. Start looking into jvm chpk jni. c test. remove extra argument. tracker. interrupt. cleanup. Fix spark profiling. Compile. Start convert the scala package. Log init. cleanup test. communicator. alive. tests log. Revert "log." This reverts commit 3bc6d82. Shutdown when exit. remove tracker return code. windows build. shutdown only if not closed. lint. protect the listener. concat. Debug log. detect EOF. Revert "Debug log." This reverts commit a3e0bd9. Cleanup. Fixes. lint. don't omit frame pointer. Refactor tests. Fix minimum build. Fix distributed tests on single GPU. cleanup & win build. MacOS compilation. typo. macos, jvm. Windows socket. Ignore POLLHUP Handle shutdown. unix socket fix MSVC Sock error Restore the shutdown signal. states windows sock enable win tests lint. update. Skip tests. skip only for gpu. cleanup. cleanup. remove error code for now. Documents. lower case. rename. Fix. Fix.
Overall looks good. We can always fix any issues that come up in a later PR. ;)
…it-worker-get-all
* [pyspark] rework the log (#10077) * Add CUDA iterator to tensor view. (#10074) * Disable column sample by node for the exact tree method. (#10083) * [R] Refactor callback structure and attributes (#9957) * [sycl] add partitioning and related tests (#10080) Co-authored-by: Dmitry Razdoburdin <> * Small cleanup for mock tests. (#10085) * [CI] Test R package with CMake (#10087) * [CI] Test R package with CMake * Fix * Fix * Update test_r_package.py * Fix CMake flag for R package * Install system deps * Fix * Use sudo * [CI] Cancel GH Action job if a newer commit is published (#10088) * Optional normalization for learning to rank. (#10094) * Support graphviz plot for multi-target tree. (#10093) * [R] Rename `watchlist` -> `evals` (#10032) * [doc] Fix the default value for `lambdarank_pair_method`. (#10098) * Fix pairwise objective with NDCG metric along with custom gain. (#10100) * Fix pairwise objective with NDCG metric. - Allow setting `ndcg_exp_gain` for `rank:pairwise`. This is useful when using pairwise for objective but ndcg for metric. * [R] deprecate watchlist (#10110) * [SYCL] Add split evaluation (#10119) --------- Co-authored-by: Dmitry Razdoburdin <> * Fix compilation with the latest ctk. (#10123) * Use `std::uint64_t` for row index. (#10120) - Use std::uint64_t instead of size_t to avoid implementation-defined type. - Rename to bst_idx_t, to account for other types of indexing. - Small cleanup to the base header. * Work with IPv6 in the new tracker. (#10125) * [CI] Update scorecard actions. (#10133) * [CI] Fix yml in github action. (#10134) * add sycl reaslisation of ghist builder (#10138) Co-authored-by: Dmitry Razdoburdin <> * Cleanup set info. (#10139) - Use the array interface internally. - Deprecate `XGDMatrixSetDenseInfo`. - Deprecate `XGDMatrixSetUIntInfo`. - Move the handling of `DataType` into the deprecated C function. --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Update collective implementation. 
(#10152) * Update collective implementation. - Cleanup resource during `Finalize` to avoid handling threads in destructor. - Calculate the size for allgather automatically. - Use simple allgather for small (smaller than the number of worker) allreduce. * [R] Make `xgb.cv` work with `xgb.DMatrix` only, adding support for survival and ranking fields (#10031) --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * docs: fix bug in tutorial (#10143) * Bump org.apache.maven.plugins:maven-gpg-plugin from 3.1.0 to 3.2.2 in /jvm-packages/xgboost4j-spark (#10151) * Fix pyspark with verbosity=3. (#10172) * Fix global config for external memory. (#10173) Pass the thread-local configuration between threads. * [doc] Update python3statement URL (#10179) * [CI] Update create-pull-request action * [SYCL] Add basic features for QuantileHistMaker (#10174) --------- Co-authored-by: Dmitry Razdoburdin <> * [CI] Use latest RAPIDS; Pandas 2.0 compatibility fix (#10175) * [CI] Update RAPIDS to latest stable * [CI] Use rapidsai stable channel; fix syntax errors in Dockerfile.gpu * Don't combine astype() with loc() * Work around #10181 * Fix formatting * Fix test --------- Co-authored-by: hcho3 <hcho3@users.noreply.github.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu> * docs: update Ruby package link (#10182) * [CI] Reduce clutter from dependabot (#10187) * [jvm-packages] Ombinus patch to update all minor dependencies (#10188) * Fold in #10184 * Fold in #10176 * Fold in #10168 * Fold in #10165 * Fold in #10164 * Fold in #10155 * Fold in #10062 * Fold in #9984 * Fold in #9843 * Upgrade to Maven 3.6.3 * Bump org.apache.maven.plugins:maven-jar-plugin (#10191) Bumps [org.apache.maven.plugins:maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.3.0 to 3.4.0. 
- [Release notes](https://github.com/apache/maven-jar-plugin/releases) - [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.3.0...maven-jar-plugin-3.4.0) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-jar-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [coll] Improve event loop. (#10199) - Add a test for blocking calls. - Do not require the queue to be empty after waking up; this frees up the thread to answer blocking calls. - Handle EOF in read. - Improve the error message in the result. Allow concatenation of multiple results. * [CI] Update machine images (#10201) * Bump org.apache.maven.plugins:maven-jar-plugin (#10202) Bumps [org.apache.maven.plugins:maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.3.0 to 3.4.0. - [Release notes](https://github.com/apache/maven-jar-plugin/releases) - [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.3.0...maven-jar-plugin-3.4.0) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-jar-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [pyspark] Reuse the collective communicator. (#10198) * Bump org.scala-lang.modules:scala-collection-compat_2.12 (#10193) Bumps [org.scala-lang.modules:scala-collection-compat_2.12](https://github.com/scala/scala-collection-compat) from 2.11.0 to 2.12.0. - [Release notes](https://github.com/scala/scala-collection-compat/releases) - [Commits](scala/scala-collection-compat@v2.11.0...v2.12.0) --- updated-dependencies: - dependency-name: org.scala-lang.modules:scala-collection-compat_2.12 dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump scalatest.version from 3.2.17 to 3.2.18 in /jvm-packages/xgboost4j (#10196) Bumps `scalatest.version` from 3.2.17 to 3.2.18. Updates `org.scalatest:scalatest_2.12` from 3.2.17 to 3.2.18 - [Release notes](https://github.com/scalatest/scalatest/releases) - [Commits](scalatest/scalatest@release-3.2.17...release-3.2.18) Updates `org.scalactic:scalactic_2.12` from 3.2.17 to 3.2.18 - [Release notes](https://github.com/scalatest/scalatest/releases) - [Commits](scalatest/scalatest@release-3.2.17...release-3.2.18) --- updated-dependencies: - dependency-name: org.scalatest:scalatest_2.12 dependency-type: direct:development update-type: version-update:semver-patch - dependency-name: org.scalactic:scalactic_2.12 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [coll] Add global functions. (#10203) * Bump org.apache.flink:flink-clients in /jvm-packages (#10197) Bumps [org.apache.flink:flink-clients](https://github.com/apache/flink) from 1.18.0 to 1.19.0. - [Commits](apache/flink@release-1.18.0...release-1.19.0) --- updated-dependencies: - dependency-name: org.apache.flink:flink-clients dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [pyspark] support stage-level for yarn/k8s (#10209) * [coll] Implement shutdown for tracker and comm. (#10208) - Force shutdown the tracker. - Implement shutdown notice for error handling thread in comm. * [doc] Add typing to dask demos. (#10207) * [SYCL] Add sampling initialization (#10216) --------- Co-authored-by: Dmitry Razdoburdin <> * [CI] Test new setup-r. 
(#10228) * [CI] Use native arm64 worker in GHAction to build M1 wheel (#10225) * [CI] Use native arm64 worker in GHAction to build M1 wheel * Set up Conda * Use mamba * debug * fix * fix * fix * fix * fix * Temporarily disable other tests * Fix prefix * Use micromamba * Use conda-incubator/setup-miniconda * Use mambaforge * Fix * Fix prefix * Don't use deprecated set-output * Add verbose output from build * verbose * Specify arch * Bump setup-miniconda to v3 * Use Python 3.9 * Restore deleted files * WAR. --------- Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump hadoop.version from 3.3.6 to 3.4.0 in /jvm-packages/xgboost4j (#10156) Bumps `hadoop.version` from 3.3.6 to 3.4.0. Updates `org.apache.hadoop:hadoop-hdfs` from 3.3.6 to 3.4.0 Updates `org.apache.hadoop:hadoop-common` from 3.3.6 to 3.4.0 --- updated-dependencies: - dependency-name: org.apache.hadoop:hadoop-hdfs dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.hadoop:hadoop-common dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump net.alchim31.maven:scala-maven-plugin in /jvm-packages/xgboost4j (#10217) Bumps net.alchim31.maven:scala-maven-plugin from 4.8.1 to 4.9.0. --- updated-dependencies: - dependency-name: net.alchim31.maven:scala-maven-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump org.apache.maven.plugins:maven-jar-plugin (#10210) Bumps [org.apache.maven.plugins:maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.4.0 to 3.4.1. 
- [Release notes](https://github.com/apache/maven-jar-plugin/releases) - [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.4.0...maven-jar-plugin-3.4.1) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-jar-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump org.apache.maven.plugins:maven-gpg-plugin (#10211) Bumps [org.apache.maven.plugins:maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 3.2.3 to 3.2.4. - [Release notes](https://github.com/apache/maven-gpg-plugin/releases) - [Commits](apache/maven-gpg-plugin@maven-gpg-plugin-3.2.3...maven-gpg-plugin-3.2.4) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-gpg-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [pyspark] Sort workers by task ID. (#10220) * Bump org.apache.spark:spark-mllib_2.12 (#10070) Bumps org.apache.spark:spark-mllib_2.12 from 3.4.1 to 3.5.1. --- updated-dependencies: - dependency-name: org.apache.spark:spark-mllib_2.12 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Support more sklearn tags for testing. (#10230) * Update nvtx. (#10227) * [sycl] add data initialisation for training (#10222) Co-authored-by: Dmitry Razdoburdin <> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Fixes for numpy 2.0. (#10252) * [jvm-packagaes] Freeze spark to 3.4.1 for now. 
(#10253) The newer spark version for CPU conflicts with the more conservative version used by rapids. * [jvm-packages] fix group col for gpu packages (#10254) * [sycl] add loss guided hist building (#10251) Co-authored-by: Dmitry Razdoburdin <> * Be more lenient on floating point error for AUC. (#10264) * [CI] Upgrade setup-r. (#10267) * Fixes for the latest pandas. (#10266) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Keep GitHub Actions up to date with Dependabot (#10268) # Fixes software supply chain safety warnings like at the bottom right of https://github.com/dmlc/xgboost/actions/runs/9048469681 * [Keeping your actions up to date with Dependabot](https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot) * [Configuration options for the dependabot.yml file - package-ecosystem](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem) * [doc][dask] Update notes about k8s. (#10271) * [CI] Fixes for using the latest modin. (#10285) * Release data in cache. (#10286) * Adopt new logo (#10270) * Use a thread pool for external memory. (#10288) * Fix pylint. (#10296) * Revamp the rabit implementation. (#10112) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhausted tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages. * Bump conda-incubator/setup-miniconda from 2.1.1 to 3.0.4 (#10278) Bumps [conda-incubator/setup-miniconda](https://github.com/conda-incubator/setup-miniconda) from 2.1.1 to 3.0.4. 
- [Release notes](https://github.com/conda-incubator/setup-miniconda/releases) - [Commits](conda-incubator/setup-miniconda@v2.1.1...v3.0.4) --- updated-dependencies: - dependency-name: conda-incubator/setup-miniconda dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump ossf/scorecard-action from 2.3.1 to 2.3.3 (#10280) Bumps [ossf/scorecard-action](https://github.com/ossf/scorecard-action) from 2.3.1 to 2.3.3. - [Release notes](https://github.com/ossf/scorecard-action/releases) - [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md) - [Commits](ossf/scorecard-action@0864cf1...dc50aa9) --- updated-dependencies: - dependency-name: ossf/scorecard-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump actions/checkout from 2 to 4 (#10274) Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Commits](actions/checkout@v2...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump commons-logging:commons-logging in /jvm-packages/xgboost4j (#10294) Bumps commons-logging:commons-logging from 1.3.1 to 1.3.2. 
--- updated-dependencies: - dependency-name: commons-logging:commons-logging dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * [CI] Bump checkout action version. (#10305) * [SYCL] Add nodes initialisation (#10269) --------- Co-authored-by: Dmitry Razdoburdin <> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump mamba-org/provision-with-micromamba from 14 to 16 (#10275) Bumps [mamba-org/provision-with-micromamba](https://github.com/mamba-org/provision-with-micromamba) from 14 to 16. - [Release notes](https://github.com/mamba-org/provision-with-micromamba/releases) - [Commits](mamba-org/provision-with-micromamba@f347426...3c96c0c) --- updated-dependencies: - dependency-name: mamba-org/provision-with-micromamba dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [JVM-packages] Prevent memory leak. (#10307) * Bump dorny/paths-filter from 2 to 3 (#10276) Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 2 to 3. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](dorny/paths-filter@v2...v3) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Bump org.apache.maven.plugins:maven-deploy-plugin (#10235) Bumps [org.apache.maven.plugins:maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 3.1.1 to 3.1.2. - [Release notes](https://github.com/apache/maven-deploy-plugin/releases) - [Commits](apache/maven-deploy-plugin@maven-deploy-plugin-3.1.1...maven-deploy-plugin-3.1.2) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-deploy-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Add timeout for distributed tests. (#10315) * [coll] Keep the tracker alive during initialization error. (#10306) * Fix non-fed. * Fix non-fed. * macos. * macos. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: david-cortes <david.cortes.rivera@gmail.com> Co-authored-by: Dmitry Razdoburdin <dmitry.razdoburdin@intel.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Michael Mayer <mayermichael79@gmail.com> Co-authored-by: Fabi <117525608+fabfabi@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Trinh Quoc Anh <trinhquocanh94@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: hcho3 <hcho3@users.noreply.github.com> Co-authored-by: Eric Leung <2754821+erictleung@users.noreply.github.com> Co-authored-by: Christian Clauss <cclauss@me.com> Co-authored-by: Dmitry Razdoburdin <d.razdoburdin@gmail.com>
- [Release notes](https://github.com/apache/maven-jar-plugin/releases) - [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.4.0...maven-jar-plugin-3.4.1) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-jar-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump org.apache.maven.plugins:maven-gpg-plugin (#10211) Bumps [org.apache.maven.plugins:maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 3.2.3 to 3.2.4. - [Release notes](https://github.com/apache/maven-gpg-plugin/releases) - [Commits](apache/maven-gpg-plugin@maven-gpg-plugin-3.2.3...maven-gpg-plugin-3.2.4) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-gpg-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [pyspark] Sort workers by task ID. (#10220) * Bump org.apache.spark:spark-mllib_2.12 (#10070) Bumps org.apache.spark:spark-mllib_2.12 from 3.4.1 to 3.5.1. --- updated-dependencies: - dependency-name: org.apache.spark:spark-mllib_2.12 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Support more sklearn tags for testing. (#10230) * Update nvtx. (#10227) * [sycl] add data initialisation for training (#10222) Co-authored-by: Dmitry Razdoburdin <> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Fixes for numpy 2.0. (#10252) * [jvm-packagaes] Freeze spark to 3.4.1 for now. 
(#10253) The newer spark version for CPU conflicts with the more conservative version used by rapids. * [jvm-packages] fix group col for gpu packages (#10254) * [sycl] add loss guided hist building (#10251) Co-authored-by: Dmitry Razdoburdin <> * Be more lenient on floating point error for AUC. (#10264) * [CI] Upgrade setup-r. (#10267) * Fixes for the latest pandas. (#10266) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Keep GitHub Actions up to date with Dependabot (#10268) # Fixes software supply chain safety warnings like at the bottom right of https://github.com/dmlc/xgboost/actions/runs/9048469681 * [Keeping your actions up to date with Dependabot](https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot) * [Configuration options for the dependabot.yml file - package-ecosystem](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem) * [doc][dask] Update notes about k8s. (#10271) * [CI] Fixes for using the latest modin. (#10285) * Release data in cache. (#10286) * Adopt new logo (#10270) * Use a thread pool for external memory. (#10288) * Fix pylint. (#10296) * Revamp the rabit implementation. (#10112) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhausted tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages. * Bump conda-incubator/setup-miniconda from 2.1.1 to 3.0.4 (#10278) Bumps [conda-incubator/setup-miniconda](https://github.com/conda-incubator/setup-miniconda) from 2.1.1 to 3.0.4. 
- [Release notes](https://github.com/conda-incubator/setup-miniconda/releases) - [Commits](conda-incubator/setup-miniconda@v2.1.1...v3.0.4) --- updated-dependencies: - dependency-name: conda-incubator/setup-miniconda dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump ossf/scorecard-action from 2.3.1 to 2.3.3 (#10280) Bumps [ossf/scorecard-action](https://github.com/ossf/scorecard-action) from 2.3.1 to 2.3.3. - [Release notes](https://github.com/ossf/scorecard-action/releases) - [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md) - [Commits](ossf/scorecard-action@0864cf1...dc50aa9) --- updated-dependencies: - dependency-name: ossf/scorecard-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump actions/checkout from 2 to 4 (#10274) Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Commits](actions/checkout@v2...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump commons-logging:commons-logging in /jvm-packages/xgboost4j (#10294) Bumps commons-logging:commons-logging from 1.3.1 to 1.3.2. 
--- updated-dependencies: - dependency-name: commons-logging:commons-logging dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * [CI] Bump checkout action version. (#10305) * [SYCL] Add nodes initialisation (#10269) --------- Co-authored-by: Dmitry Razdoburdin <> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump mamba-org/provision-with-micromamba from 14 to 16 (#10275) Bumps [mamba-org/provision-with-micromamba](https://github.com/mamba-org/provision-with-micromamba) from 14 to 16. - [Release notes](https://github.com/mamba-org/provision-with-micromamba/releases) - [Commits](mamba-org/provision-with-micromamba@f347426...3c96c0c) --- updated-dependencies: - dependency-name: mamba-org/provision-with-micromamba dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [JVM-packages] Prevent memory leak. (#10307) * Bump dorny/paths-filter from 2 to 3 (#10276) Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 2 to 3. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](dorny/paths-filter@v2...v3) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Bump org.apache.maven.plugins:maven-deploy-plugin (#10235) Bumps [org.apache.maven.plugins:maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 3.1.1 to 3.1.2. - [Release notes](https://github.com/apache/maven-deploy-plugin/releases) - [Commits](apache/maven-deploy-plugin@maven-deploy-plugin-3.1.1...maven-deploy-plugin-3.1.2) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-deploy-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Add timeout for distributed tests. (#10315) * [coll] Keep the tracker alive during initialization error. (#10306) * [jvm-packages] refine tracker (#10313) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * [doc] Add a coarse map for XGBoost features to assist development. [skip ci] (#10310) * Bump net.alchim31.maven:scala-maven-plugin in /jvm-packages/xgboost4j (#10260) Bumps net.alchim31.maven:scala-maven-plugin from 4.9.0 to 4.9.1. --- updated-dependencies: - dependency-name: net.alchim31.maven:scala-maven-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump org.apache.maven.plugins:maven-jar-plugin in /jvm-packages (#10244) Bumps [org.apache.maven.plugins:maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.4.0 to 3.4.1. 
- [Release notes](https://github.com/apache/maven-jar-plugin/releases) - [Commits](apache/maven-jar-plugin@maven-jar-plugin-3.4.0...maven-jar-plugin-3.4.1) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-jar-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump org.apache.maven.plugins:maven-deploy-plugin (#10240) Bumps [org.apache.maven.plugins:maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 3.1.1 to 3.1.2. - [Release notes](https://github.com/apache/maven-deploy-plugin/releases) - [Commits](apache/maven-deploy-plugin@maven-deploy-plugin-3.1.1...maven-deploy-plugin-3.1.2) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-deploy-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Bump org.codehaus.mojo:exec-maven-plugin from 3.2.0 to 3.3.0 in /jvm-packages/xgboost4j (#10309) updated-dependencies: - dependency-name: org.codehaus.mojo:exec-maven-plugin dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * [R] Reshape predictions for custom eval metric when they are 2D (#10323) * [CI] add script to generate meta info and upload to s3 (#10295) * [CI] add script to generate meta info and upload to s3 * Write Python script to generate meta.json * Update other pipelines * Add wheel_name field * Add description --------- Co-authored-by: Hyunsu Cho <phcho@nvidia.com> * [sycl] optimise hist building (#10311) Co-authored-by: Dmitry Razdoburdin <> * [R] Update docs for custom user functions (#10328) * [R] Fix incorrect division of classification/ranking objectives (#10327) * [coll] Increase timeout limit. (#10332) * Bump org.sonatype.plugins:nexus-staging-maven-plugin (#10335) Bumps org.sonatype.plugins:nexus-staging-maven-plugin from 1.6.13 to 1.7.0. --- updated-dependencies: - dependency-name: org.sonatype.plugins:nexus-staging-maven-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [CI] Upgrade github workflows to use latest Conda setup action (#10320) Co-authored-by: Christian Clauss <cclauss@me.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> * Test federated plugin using GitHub action. (#10336) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * [coll] Prevent race during error check. (#10319) * [doc] Fix typo (#10340) * [R] Rename BIAS -> (Intercept) (#10337) * Handle float128 generically (#10322) * [doc] Fix typo & format in C API documentation (#10350) * [coll] Move the rabit poll helper. (#10349) * [dask] Update dask demo for using the new dask backend. (#10347) * Remove unnecessary fetch operations in external memory. (#10342) * Remove reference to R win64 MSVC build. (#10355) * Fix typo. 
(#10353) * Bump actions/upload-artifact from 4.3.1 to 4.3.3 (#10366) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.1 to 4.3.3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@5d5d22a...6546280) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/checkout from 4.1.1 to 4.1.6 (#10369) Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.1 to 4.1.6. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@b4ffde6...a5ac7e5) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/setup-python from 5.0.0 to 5.1.0 (#10368) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.0.0 to 5.1.0. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@0a5c615...82c7e63) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump the cache github action to 4.0.2. (#10377) * Bump org.apache.maven.plugins:maven-javadoc-plugin (#10373) Bumps [org.apache.maven.plugins:maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.6.3 to 3.7.0. 
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases) - [Commits](apache/maven-javadoc-plugin@maven-javadoc-plugin-3.6.3...maven-javadoc-plugin-3.7.0) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-javadoc-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump org.apache.maven.plugins:maven-javadoc-plugin in /jvm-packages (#10360) Bumps [org.apache.maven.plugins:maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.6.3 to 3.7.0. - [Release notes](https://github.com/apache/maven-javadoc-plugin/releases) - [Commits](apache/maven-javadoc-plugin@maven-javadoc-plugin-3.6.3...maven-javadoc-plugin-3.7.0) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-javadoc-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump org.codehaus.mojo:exec-maven-plugin in /jvm-packages (#10363) Bumps [org.codehaus.mojo:exec-maven-plugin](https://github.com/mojohaus/exec-maven-plugin) from 3.2.0 to 3.3.0. - [Release notes](https://github.com/mojohaus/exec-maven-plugin/releases) - [Commits](mojohaus/exec-maven-plugin@3.2.0...3.3.0) --- updated-dependencies: - dependency-name: org.codehaus.mojo:exec-maven-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump com.nvidia:rapids-4-spark_2.12 in /jvm-packages (#10362) Bumps com.nvidia:rapids-4-spark_2.12 from 24.04.0 to 24.04.1. 
--- updated-dependencies: - dependency-name: com.nvidia:rapids-4-spark_2.12 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add Comet Logo to the Readme. (#10380) * [CI] Add nightly CI job to test against dev version of deps (#10351) * [CI] Add nightly CI job to test against dev version of deps * Update build-containers.sh * Add build step * Wait for build artifact * Try pinning dask * Address reviewers' comments * Fix unbound variable error * Specify dev version exactly * Pin dask=2024.1.1 * Fix logo URL [skip ci] (#10382) * Sync stream in ellpack format. (#10374) * Fix warnings in GPU dask tests. (#10358) * Bump development version to 2.2. (#10376) * [CI] Use Python 3.10 to build docs (#10383) * [jvm-packages] Don't cast to float if it's already float (#10386) * Add python 3.12 classifier. (#10381) * [Doc] Fix deployment for JVM docs (#10385) * [Doc] Fix deployment for JVM docs * Use READTHEDOCS_VERSION_NAME * Fix html * Default to master * [col] Small cleanup to federated comm. (#10397) * [SYCL] Optimize gradients calculations. (#10325) --------- Co-authored-by: Dmitry Razdoburdin <> * Fixes. 
--------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: david-cortes <david.cortes.rivera@gmail.com> Co-authored-by: Dmitry Razdoburdin <dmitry.razdoburdin@intel.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Michael Mayer <mayermichael79@gmail.com> Co-authored-by: Fabi <117525608+fabfabi@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Trinh Quoc Anh <trinhquocanh94@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: hcho3 <hcho3@users.noreply.github.com> Co-authored-by: Eric Leung <2754821+erictleung@users.noreply.github.com> Co-authored-by: Christian Clauss <cclauss@me.com> Co-authored-by: Dmitry Razdoburdin <d.razdoburdin@gmail.com> Co-authored-by: Hyunsu Cho <phcho@nvidia.com> Co-authored-by: Astariul <43774355+astariul@users.noreply.github.com> Co-authored-by: Sid Mehta <siddharthm2350@gmail.com>
motivation
The networking module has grown more complex as it gained support for vertical and horizontal federated learning and for GPU-based training at scale, and the existing rabit module is no longer sufficient. We have multiple dimensions of features to support:
Among the above features, GPU acceleration and federated learning require loading optional external libraries. Lastly, we are trying to support resilience.
features
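This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.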
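To illustrate the idea of a unified interface over several backends (named in this PR's commit summary), here is a minimal sketch in Python. It is not XGBoost's actual API: `Communicator`, `InMemoryGroup`, `InMemoryComm`, and `run_allreduce_sum` are hypothetical names, and the in-memory backend only simulates workers with threads. Callers program against one abstract class, and `allreduce` is built on `allgather`, similar in spirit to the allgather-based path for small allreduce mentioned in the commit log.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable, List


class Communicator(ABC):
    """Hypothetical unified interface; every backend implements allgather."""

    @abstractmethod
    def rank(self) -> int:
        ...

    @abstractmethod
    def allgather(self, value: float) -> List[float]:
        ...

    def allreduce(self, value: float, op: Callable[[float, float], float]) -> float:
        # Backend-agnostic: gather one value per worker, then reduce locally.
        gathered = self.allgather(value)
        result = gathered[0]
        for v in gathered[1:]:
            result = op(result, v)
        return result


class InMemoryGroup:
    """Shared state for a simulated group of workers (illustration only)."""

    def __init__(self, n_workers: int) -> None:
        self.slots: List[float] = [0.0] * n_workers
        self.barrier = threading.Barrier(n_workers)


class InMemoryComm(Communicator):
    """Toy backend; a real one might wrap sockets, NCCL, or gRPC instead."""

    def __init__(self, group: InMemoryGroup, rank: int) -> None:
        self._group = group
        self._rank = rank

    def rank(self) -> int:
        return self._rank

    def allgather(self, value: float) -> List[float]:
        group = self._group
        group.slots[self._rank] = value
        group.barrier.wait()  # every worker has written its slot
        out = list(group.slots)
        group.barrier.wait()  # every worker has read before slots are reused
        return out


def run_allreduce_sum(n_workers: int) -> List[float]:
    """Run a sum-allreduce where worker r contributes r + 1."""
    group = InMemoryGroup(n_workers)
    results: List[float] = [0.0] * n_workers

    def worker(rank: int) -> None:
        comm = InMemoryComm(group, rank)
        results[rank] = comm.allreduce(float(rank + 1), lambda a, b: a + b)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_allreduce_sum(4)` gives every simulated worker the same sum. Swapping in another backend would only require a different `allgather` implementation, which is the point of keeping the interface unified.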
todos:
work in progress
Retry is still in progress. It is meant to provide essential support for handling exceptions (e.g., a network error or an OOM). Segfault handling requires additional cooperation with the distributed framework and is out of scope for this work.
note for review
`n_workers` parameter.
Related
Close #8191
Close #4981
Close #4781