Pull updates from dmlc/xgboost #1

nyoko · 2020-08-12T11:31:44Z

No description provided.

@RAMitchell

* [WIP] Add lower and upper bounds on the label for survival analysis
* Update test MetaInfo.SaveLoadBinary to account for extra two fields
* Don't clear qids_ for version 2 of MetaInfo
* Add SetInfo() and GetInfo() method for lower and upper bounds
* changes to aft
* Add parameter class for AFT; use enums to represent distribution and event type
* Add AFT metric
* changes to neg grad to grad
* changes to binomial loss
* changes to overflow
* changes to eps
* changes to code refactoring
* changes to code refactoring
* changes to code refactoring
* Re-factor survival analysis
* Remove aft namespace
* Move function bodies out of AFTNormal and AFTLogistic, to reduce clutter
* Move function bodies out of AFTLoss, to reduce clutter
* Use smart pointer to store AFTDistribution and AFTLoss
* Rename AFTNoiseDistribution enum to AFTDistributionType for clarity. The enum class was not a distribution itself but a distribution type
* Add AFTDistribution::Create() method for convenience
* changes to extreme distribution
* changes to extreme distribution
* changes to extreme
* changes to extreme distribution
* changes to left censored
* deleted cout
* changes to x, mu and sd and code refactoring
* changes to print
* changes to hessian formula in censored and uncensored
* changes to variable names and pow
* changes to Logistic Pdf
* changes to parameter
* Expose lower and upper bound labels to R package
* Use example weights; normalize log likelihood metric
* changes to CHECK
* changes to logistic hessian to standard formula
* changes to logistic formula
* Comply with coding style guideline
* Revert back Rabit submodule
* Revert dmlc-core submodule
* Comply with coding style guideline (clang-tidy)
* Fix an error in AFTLoss::Gradient()
* Add missing files to amalgamation
* Address @RAMitchell's comment: minimize future change in MetaInfo interface
* Fix lint
* Fix compilation error on 32-bit target, when size_t == bst_uint
* Allocate sufficient memory to hold extra label info
* Use OpenMP to speed up
* Fix compilation on Windows
* Address reviewer's feedback
* Add unit tests for probability distributions
* Make Metric subclass of Configurable
* Address reviewer's feedback: Configure() AFT metric
* Add a dummy test for AFT metric configuration
* Complete AFT configuration test; remove debugging print
* Rename AFT parameters
* Clarify test comment
* Add a dummy test for AFT loss for uncensored case
* Fix a bug in AFT loss for uncensored labels
* Complete unit test for AFT loss metric
* Simplify unit tests for AFT metric
* Add unit test to verify aggregate output from AFT metric
* Use EXPECT_* instead of ASSERT_*, so that we run all unit tests
* Use aft_loss_param when serializing AFTObj. This is to be consistent with AFT metric
* Add unit tests for AFT Objective
* Fix OpenMP bug; clarify semantics for shared variables used in OpenMP loops
* Add comments
* Remove AFT prefix from probability distribution; put probability distribution in separate source file
* Add comments
* Define kPI and kEulerMascheroni in probability_distribution.h
* Add probability_distribution.cc to amalgamation
* Remove unnecessary diff
* Address reviewer's feedback: define variables where they're used
* Eliminate all INFs and NANs from AFT loss and gradient
* Add demo
* Add tutorial
* Fix lint
* Use 'survival:aft' to be consistent with 'survival:cox'
* Move sample data to demo/data
* Add visual demo with 1D toy data
* Add Python tests

Co-authored-by: Philip Cho <chohyu01@cs.washington.edu>
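The commit message above mentions lower and upper label bounds, censored and uncensored cases, and removing INFs and NANs from the AFT loss. As a rough illustration only, the following Python sketch computes a negative log likelihood for an accelerated failure time model with a normal noise distribution. It is not the C++ code from this commit: the function name `aft_nll`, the parameter names, and the clipping constant `eps` are assumptions made for this example.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(y_lower, y_upper, pred, sigma=1.0, eps=1e-12):
    """Negative log likelihood of one label interval [y_lower, y_upper].

    `pred` is the predicted log survival time. All names here are
    hypothetical; `eps` is an assumed clipping constant that keeps the
    result finite when the likelihood underflows to zero.
    """
    if y_lower == y_upper:
        # Uncensored label: use the density of the observed time.
        z = (math.log(y_lower) - pred) / sigma
        likelihood = normal_pdf(z) / (sigma * y_lower)
    else:
        # Censored label: use the probability mass inside the interval.
        if math.isinf(y_upper):
            cdf_upper = 1.0  # right-censored: no upper bound
        else:
            cdf_upper = normal_cdf((math.log(y_upper) - pred) / sigma)
        if y_lower <= 0.0:
            cdf_lower = 0.0  # left-censored: no positive lower bound
        else:
            cdf_lower = normal_cdf((math.log(y_lower) - pred) / sigma)
        likelihood = cdf_upper - cdf_lower
    return -math.log(max(likelihood, eps))
```

Setting both bounds to the same value gives the uncensored case, an infinite upper bound gives right censoring, and a zero lower bound gives left censoring. Clipping the likelihood before taking the logarithm is one simple way to keep the loss finite for extreme predictions.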

Normal prediction with DMatrix is now thread safe through locks. The newly added inplace prediction is lock-free and thread safe. When data is on device (cupy, cudf), the returned data is also on device.

* Implementation for numpy, csr, cudf and cupy.
* Implementation for dask.
* Remove sync in simple dmatrix.

* Set default dtor for SimpleDMatrix to initialize default copy ctor, which is deleted due to unique ptr.
* Remove commented code.
* Remove warning for calling host function (std::max).
* Remove warning for initialization order.
* Remove warning for unused variables.

* Robust regularization of AFT gradient and hessian
* Fix AFT doc; expose it to tutorial TOC
* Apply robust regularization to uncensored case too
* Revise unit test slightly
* Fix lint
* Update test_survival.py
* Use GradientPairPrecise
* Remove unused variables

…5740)

* Allow non-zero for missing value when training.
* Fix wrong method names.
* Add a unit test
* Move the getter/setter unit test to MissingValueHandlingSuite

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>

…ting oneAPI programming model (#5825)

* Added plugin with DPC++-based predictor and objective function
* Update CMakeLists.txt
* Update regression_obj_oneapi.cc
* Added README.md for OneAPI plugin
* Added OneAPI predictor support to gbtree
* Update README.md
* Merged kernels in gradient computation. Enabled multiple loss functions with DPC++ backend
* Aligned plugin CMake files with latest master changes. Fixed whitespace typos
* Removed debug output
* [CI] Make oneapi_plugin a CMake target
* Added tests for OneAPI plugin for predictor and obj. functions
* Temporarily switched to default selector for device dispatching in OneAPI plugin to enable execution in environments without GPUs
* Updated readme file.
* Fixed USM usage in predictor
* Removed workaround with explicit templated names for DPC++ kernels
* Fixed warnings in plugin tests
* Fix CMake build of gtest

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>

* [CI] Add RMM as an optional dependency
* Replace caching allocator with pool allocator from RMM
* Revert "Replace caching allocator with pool allocator from RMM". This reverts commit e15845d.
* Use rmm::mr::get_default_resource()
* Try setting default resource (doesn't work yet)
* Allocate pool_mr in the heap
* Prevent leaking pool_mr handle
* Separate EXPECT_DEATH() in separate test suite suffixed DeathTest
* Turn off death tests for RMM
* Address reviewer's feedback
* Prevent leaking of cuda_mr
* Fix Jenkinsfile syntax
* Remove unnecessary function in Jenkinsfile
* [CI] Install NCCL into RMM container
* Run Python tests
* Try building with RMM, CUDA 10.0
* Do not use RMM for CUDA 10.0 target
* Actually test for test_rmm flag
* Fix TestPythonGPU
* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU
* Use 10.0 container to build RMM-enabled XGBoost
* Revert "Use 10.0 container to build RMM-enabled XGBoost". This reverts commit 789021f.
* Fix Jenkinsfile
* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
* Rename Conda env rmm_test -> gpu_test
* Use env var to opt into CNMeM pool for C++ tests
* Use identical CUDA version for RMM builds and tests
* Use Pytest fixtures to enable RMM pool in Python tests
* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
* Use per-device MR; use command arg in gtest
* Set CMake prefix path to use Conda env
* Use 0.15 nightly version of RMM
* Remove unnecessary header
* Fix a unit test when cudf is missing
* Add RMM demos
* Remove print()
* Use HostDeviceVector in GPU predictor
* Simplify pytest setup; use LocalCUDACluster fixture
* Address reviewers' comments

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>

trivialfis and others added 30 commits March 19, 2020 17:05

[dask] Accept other inputs for prediction. (#5428)

760d5d0

* Returns a series when input is dataframe.
* Merge assert client.

[R-package] changed FindLibR to take advantage of CMake cache (#5427)

3cf665d

Support pandas SparseArray. (#5431)

abca990

[R-package] fixed uses of class() (#5426)

4b7e2b7

Thank you a lot. Good catch!

[dask] Fix missing value for scikit-learn interface. (#5435)

cd7d6f7

Ranking metric acceleration on the gpu (#5398)

d2231fc

Add link to GPU documentation (#5437)

1de36cd

Force compressed buffer to be 4 bytes aligned. (#5441)

7146b91

Refactor tests with data generator. (#5439)

4942da6

Resolve travis failure. (#5445)

780de49

* Install dependencies by pip.

Device dmatrix (#5420)

13b10a6

Reducing memory consumption for 'hist' method on CPU (#5334)

27a8e36

[R-package] fixed inconsistency in R -e calls in FindLibR.cmake (#5438)

7f980e9

Add support for dlpack, expose python docs for DeviceQuantileDMatrix (#…

15f40e5

…5465)

Reduce span check overhead. (#5464)

babcb99

Update dmlc-core. (#5466)

e86030c

* Copy dmlc travis script to XGBoost.

Remove silent parameter. (#5476)

d0b86c7

Enable parameter validation for skl. (#5477)

c218d8f

Split up test helpers header. (#5455)

459b175

Implement host span. (#5459)

86beb68

Accept other gradient types for split entry. (#5467)

9399736

Fix dump model. (#5485)

a931380

Small updates to GPU documentation (#5483)

1580010

Add R code to AFT tutorial [skip ci] (#5486)

30e94dd

Upgrade clang-tidy on CI. (#5469)

0012f2e

* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>

corrected spelling of 'list' (#5482)

c362125

trivialfis and others added 29 commits July 29, 2020 19:26

Disable feature validation on sklearn predict prob. (#5953)

f5fdcbe

* Fix issue when scikit learn interface receives transformed inputs.

[CI] Fix broken Docker container 'cpu' (#5956)

071e10c

Fix evaluate root split. (#5948)

e4a273e

[Dask] Asyncio support. (#5862)

fa3715f

Thread-safe prediction by making the prediction cache thread-local. (#…

d268a2a

…5853) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
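
The commit title above describes making a prediction cache thread-local. As a conceptual illustration only, and not the C++ change in that commit, the sketch below shows the general idea in Python using `threading.local`. The class `CachingPredictor` and its methods are hypothetical names invented for this example.

```python
import threading


class CachingPredictor:
    """Toy predictor whose result cache is private to each thread."""

    def __init__(self, model_fn):
        self._model_fn = model_fn
        # Each thread sees its own attribute namespace on this object.
        self._local = threading.local()

    def _cache(self):
        # Lazily create the calling thread's cache on first use.
        if not hasattr(self._local, "cache"):
            self._local.cache = {}
        return self._local.cache

    def predict(self, key, features):
        cache = self._cache()
        if key not in cache:
            cache[key] = self._model_fn(features)
        return cache[key]

    def cache_size(self):
        """Number of entries cached by the calling thread."""
        return len(self._cache())
```

Because no thread ever reads or writes another thread's cache, concurrent calls to `predict` need no lock around the cache. The trade-off is that results cached by one thread are not reused by the others.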

Force colored output for ninja build. (#5959)

70903c8

Update XGBoost + Dask overview documentation (#5961)

3b88bc9

* Add imports to code snippet
* Better writing.

Add CMake flag to log C API invocations, to aid debugging (#5925)

3fcfaad

* Add CMake flag to log C API invocations, to aid debugging
* Remove unnecessary parentheses

[CI] Assign larger /dev/shm to NCCL (#5966)

5f3c811

* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target

Add missing Pytest marks to AsyncIO unit test (#5968)

bf2990e

[R] Provide better guidance for persisting XGBoost model (#5964)

5a2dcd1

* [R] Provide better guidance for persisting XGBoost model
* Update saving_model.rst
* Add a paragraph about xgb.serialize()

Export DaskDeviceQuantileDMatrix in doc. [skip ci] (#5975)

b069431

Fix sklearn doc. (#5980)

1149a7a

Update Python custom objective demo. (#5981)

9c93531

Update JSON schema. (#5982)

8599f87

* Update JSON schema for pseudo huber.
* Update JSON model schema.

Fix missing data warning. (#5969)

dde9c5a

* Fix data warning.
* Add numpy/scipy test.

Enforce tree order in JSON. (#5974)

9c6e791

* Make JSON model IO more future proof by using tree id in model loading.

Fix dask predict shape infer. (#5989)

801e6b6

[R] fix uses of 1:length(x) and other small things (#5992)

589b385

Fix typo in tracker logging (#5994)

7cf3e9b

Remove skmaker. (#5971)

0b2a26f

Rabit update. (#5978)

f93f1c0

* Remove parameter on JVM Packages.

Move warning about empty dataset. (#5998)

6f7112a

[Breaking] Fix .predict() method and add .predict_proba() in xgboost.…

bd6b7f4

…dask.DaskXGBClassifier (#5986)

Unify CPU hist sketching (#5880)

ee70a23

Fix nightly build doc. [skip ci] (#6004)

c3ea3b7

* Fix nightly build doc. [skip ci]
* Fix title too short. [skip ci]

nyoko merged commit b4e7736 into nyoko:master Aug 12, 2020
