chore(deps): update dependency xgboost to v1 - autoclosed #461
This PR contains the following updates:
xgboost `==0.81` -> `==1.4.2`
Release Notes
dmlc/xgboost
v1.4.2
Compare Source
This is a patch release for the Python package with the following fixes:

- Fix `cupy.ndarray` in `inplace_predict`. (https://github.com/dmlc/xgboost/pull/6933)
- The output of `predict_leaf` is `(n_samples, )` when there's only 1 tree; 1.4.0 outputs `(n_samples, 1)`. (https://github.com/dmlc/xgboost/pull/6889)
- Fix `inplace_predict`. (https://github.com/dmlc/xgboost/pull/6927)

You can verify the downloaded source code xgboost.tar.gz by running this on your unix shell:
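A minimal Python sketch for the verification step (the file name is assumed; the reference checksum is the one published on the release page):

```python
import hashlib

# Compute the SHA-256 digest of the downloaded tarball and compare the
# printed value with the checksum published on the xgboost release page.
with open("xgboost.tar.gz", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())
```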
v1.4.1
Compare Source
This is a bug fix release.
You can verify the downloaded source code xgboost.tar.gz by running this on your unix shell:

v1.4.0
Compare Source
Introduction of pre-built binary package for R, with GPU support
Starting with release 1.4.0, users now have the option of installing {xgboost} without having to build it from the source. This is particularly advantageous for users who want to take advantage of the GPU algorithm (`gpu_hist`), as previously they'd have to build {xgboost} from the source using CMake and NVCC. Now installing {xgboost} with GPU support is as easy as: `R CMD INSTALL ./xgboost_r_gpu_linux.tar.gz`. (#6827) See the instructions at https://xgboost.readthedocs.io/en/latest/build.html
Improvements on prediction functions
XGBoost has many prediction types including shap value computation and inplace prediction.
In 1.4 we overhauled the underlying prediction functions for the C API and Python API with a unified interface. (#6777, #6693, #6653, #6662, #6648, #6668, #6804)
- Inplace prediction is used on the sklearn interface when the input data is supported.
- Inplace prediction works with the `dart` booster and enables GPU acceleration just like `gbtree`.
- Inplace prediction is improved with `base_margin` support.
- A new parameter `strict_shape` is introduced. See https://xgboost.readthedocs.io/en/latest/prediction.html for more details.
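A minimal sketch of these prediction features, using synthetic data (parameter values are illustrative only):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(256, 4)
y = np.random.randint(2, size=256)
booster = xgb.train({"objective": "binary:logistic"},
                    xgb.DMatrix(X, label=y), num_boost_round=10)

# Thread-safe inplace prediction: no DMatrix is built for the input array.
pred = booster.inplace_predict(X)

# strict_shape=True keeps the output shape explicit, e.g. (n_samples, 1)
# for binary classification instead of a flattened (n_samples,) array.
pred_strict = booster.predict(xgb.DMatrix(X), strict_shape=True)
print(pred.shape, pred_strict.shape)
```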
Improvement on Dask interface
Starting with 1.4, the Dask interface is considered to be feature-complete, which means
all of the models found in the single node Python interface are now supported in Dask,
including but not limited to ranking and random forest. Also, the prediction function
is significantly faster and supports shap value computation.
- Ranking and random forest estimators are supported in the Dask interface. (#6471, #6591)
- Ranking now accepts query ID to enable group structure. (#6576)
- `DaskDMatrix` (and device quantile dmatrix) now accepts all meta-information. (#6601)
- Prediction optimization. We enhanced and sped up the prediction function for the Dask interface. See the latest Dask tutorial page in our document for an overview of how you can optimize it even further. (#6650, #6645, #6648, #6668)
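A minimal sketch of the Dask workflow described above, assuming a local cluster and synthetic partitioned data (cluster size and parameters are illustrative):

```python
from dask import array as da
from dask.distributed import Client, LocalCluster
import xgboost as xgb

if __name__ == "__main__":
    with LocalCluster(n_workers=2) as cluster, Client(cluster) as client:
        # Synthetic data partitioned across the cluster.
        X = da.random.random((10_000, 10), chunks=(1_000, 10))
        y = da.random.randint(0, 2, size=10_000, chunks=1_000)

        dtrain = xgb.dask.DaskDMatrix(client, X, y)
        output = xgb.dask.train(
            client,
            {"objective": "binary:logistic", "tree_method": "hist"},
            dtrain,
            num_boost_round=10,
        )
        # Prediction returns a lazy Dask collection.
        preds = xgb.dask.predict(client, output, dtrain)
        print(preds.compute()[:5])
```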
Bug fixes
- If `distributed.MultiLock` is present, XGBoost supports training multiple models on the same cluster in parallel. (#6743)
- Fix: when using `dask.client` to launch an async task, XGBoost might use a different client object internally. (#6722)
- Other improvements on documents, blogs, tutorials, and demos. (#6389, #6366, #6687, #6699, #6532, #6501)
Python package
With changes from Dask and general improvement on prediction, we have made some
enhancements on the general Python interface and IO for booster information. Starting
from 1.4, booster feature names and types can be saved into the JSON model. Also some
model attributes like `best_iteration` and `best_score` are restored upon model load. On the sklearn interface, some attributes are now implemented as Python object properties with better documentation.
- Breaking change: All `data` parameters in prediction functions are renamed to `X` for better compliance with sklearn estimator interface guidelines.
- Breaking change: XGBoost used to generate some pseudo feature names with `DMatrix` when inputs like `np.ndarray` don't have column names. The procedure is removed to avoid conflict with other inputs. (#6605)
- Early stopping with training continuation is now supported. (#6506)
- Optional imports for Dask and cuDF are now lazy. (#6522)
- As mentioned in the prediction improvement summary, the sklearn interface uses inplace prediction whenever possible. (#6718)
- Booster information like feature names and feature types are now saved into the JSON model file (see the sketch below). (#6605)
- All `DMatrix` interfaces including `DeviceQuantileDMatrix` and counterparts in the Dask interface (as mentioned in the Dask changes summary) now accept all the meta-information like `group` and `qid` in their constructor for better consistency. (#6601)
- Booster attributes are restored upon model load so users don't have to call `attr` manually. (#6593)
- On the sklearn interface, all models accept `base_margin` for evaluation datasets. (#6591)
- Improvements over the setup script including smaller sdist size and faster installation if the C++ library is already built. (#6611, #6694, #6565)
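As an illustration of the JSON booster IO described above, a minimal sketch (the feature names and file path are hypothetical):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 3)
y = np.random.randint(2, size=200)
dtrain = xgb.DMatrix(X, label=y, feature_names=["age", "height", "weight"])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=5)

# Starting with 1.4, feature names/types are stored in the JSON model file.
booster.save_model("model.json")
restored = xgb.Booster(model_file="model.json")
print(restored.feature_names)  # ['age', 'height', 'weight']
```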
Bug fixes for Python package:
- `_estimator_type`. (#6582)

JVM package
R package
ROC-AUC
We re-implemented the ROC-AUC metric in XGBoost. The new implementation supports
multi-class classification and has better support for learning to rank tasks that are not
binary. Also, it has a better-defined average on distributed environments with additional
handling for invalid datasets. (#6749, #6747, #6797)
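A minimal sketch of evaluating the re-implemented AUC on a multi-class problem (data and parameters are synthetic and illustrative):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(300, 5)
y = np.random.randint(3, size=300)  # 3 classes
dtrain = xgb.DMatrix(X, label=y)

# With 1.4, "auc" can also be used as the evaluation metric for
# multi-class models.
params = {"objective": "multi:softprob", "num_class": 3, "eval_metric": "auc"}
xgb.train(params, dtrain, num_boost_round=10, evals=[(dtrain, "train")])
```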
Global configuration.
Starting from 1.4, XGBoost's Python, R and C interfaces support a new global configuration
model where users can specify some global parameters. Currently, supported parameters are
`verbosity` and `use_rmm`. The latter is experimental; see the rmm plugin demo and related README file for details. (#6414, #6656)
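A minimal sketch of the global configuration API (the values are illustrative):

```python
import xgboost as xgb

# Set a global parameter for the whole process ...
xgb.set_config(verbosity=2)
print(xgb.get_config())

# ... or scope it to a block with the context manager.
with xgb.config_context(verbosity=0):
    pass  # code in this block runs with verbosity 0
```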
Other New features.
- `__array_interface__`: For some data types including GPU inputs and `scipy.sparse.csr_matrix`, XGBoost employs `__array_interface__` for processing the underlying data. Starting from 1.4, XGBoost can accept arbitrary array strides (which means column-major is supported) without making data copies, potentially reducing a significant amount of memory consumption. Also version 3 of `__cuda_array_interface__` is now supported. (#6776, #6765, #6459, #6675)
- Parameters containing whitespace will trigger an error. (#6769)
- File paths containing `~` are supported.
- The JSON model can now save feature information of the trained booster. The JSON schema is updated accordingly. (#6605)
- `dart` booster support. (#6508, #6693)
- `qid` parameter for query groups. (#6576)
- `DMatrix.slice` can now consume a numpy array. (#6368)

Other breaking changes
CPU Optimization
- Optimizations on the CPU implementation. (#6683, #6550, #6696, #6700)
- `hist` is improved. (#6410)

Notable fixes in the core library
These fixes do not reside in particular language bindings:
- Fixes to the gamma deviance metric, and better floating point guard for the gamma negative log-likelihood metric. (#6778, #6537, #6761)
- `gpu_hist` might generate low accuracy in previous versions; this is fixed. (#6755)
- `SparsePage` is used exclusively to avoid some data access races. (#6590)

Other deprecation notices:
This release will be the last release to support CUDA 10.0. (#6642)
Starting in the next release, the Python package will require Pip 19.3+ due to the use
of manylinux2014 tag. Also, CentOS 6, RHEL 6 and other old distributions will not be
supported.
Known issue:
MacOS build of the JVM packages doesn't support multi-threading out of the box. To enable
multi-threading with JVM packages, MacOS users will need to build the JVM packages from
the source. See https://xgboost.readthedocs.io/en/latest/jvm/index.html#installation-from-source
Doc
- Documentation for the `tree_method` parameter is added. (#6564, #6633)
- `versionadded` (#6458)

Maintenance: Testing, continuous integration
Maintenance: Refactor code for legibility and maintainability

You can verify the downloaded source code xgboost.tar.gz by running this on your unix shell:

v1.3.3
Compare Source
- `best_ntree_limit`. (#6616)

v1.3.2
Compare Source
- `best_ntree_limit` in multi-class. (https://github.com/dmlc/xgboost/pull/6569)
- `best_ntree_limit` for linear and dart. (https://github.com/dmlc/xgboost/pull/6579)
- `evals_result` in XGBRanker. (https://github.com/dmlc/xgboost/pull/6594)

v1.3.1
- `objective='binary:logitraw'` (#6517)
- `EvaluationMonitor` (#6499)
- `save_best` early stopping option (#6523)
- `cupy.array_equal`, since it's not compatible with cuPy 7.8 (#6528)

You can verify the downloaded source code xgboost.tar.gz by running this on your unix shell:

v1.2.1
Compare Source
This patch release applies the following patches to 1.2.0 release:
v1.2.0
Compare Source
XGBoost4J-Spark now supports the GPU algorithm (#5171)
XGBoost now supports CUDA 11 (#5808)
Better guidance for persisting XGBoost models in an R environment (#5940, #5964)
- Use `xgb.save()` and `xgb.save.raw()` instead of `saveRDS()`. This is so that the persisted models can be accessed with future releases of XGBoost.
- For models previously persisted with `saveRDS()`, this release adds a compatibility layer to restore access to the old RDS files. Note that this is meant to be a temporary measure; users are advised to stop using `saveRDS()` and migrate to `xgb.save()` and `xgb.save.raw()`.

New objectives and metrics
- `reg:pseudohubererror` is added (#5647). The corresponding metric is `mphe`. Right now, the slope is hard-coded to 1.
- The objective for survival analysis (`survival:aft`) is now accelerated on GPUs (#5714, #5716). The survival metrics `aft-nloglik` and `interval-regression-accuracy` are also accelerated on GPUs.
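A minimal sketch of the new pseudo-Huber objective with its metric (regression data and parameters are synthetic and illustrative):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)

# Pseudo-Huber objective with its corresponding metric; the slope is fixed to 1.
params = {"objective": "reg:pseudohubererror", "eval_metric": "mphe"}
xgb.train(params, dtrain, num_boost_round=10, evals=[(dtrain, "train")])
```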
Improved integration with scikit-learn
- Added the `n_features_in_` attribute to the scikit-learn interface to store the number of features used (#5780). This is useful for integrating with some scikit-learn features such as `StackingClassifier`. See this link for more details.
- `XGBoostError` now inherits `ValueError`, which conforms to scikit-learn's exception requirement (#5696).

Improved integration with Dask
- `DaskDeviceQuantileDMatrix` (#5623, #5799, #5800, #5803, #5837, #5874, #5901): Previously, the Dask interface had to make 2 data copies: one for concatenating the Dask partition/block into a single block and another for internal representation. To save memory, we introduce `DaskDeviceQuantileDMatrix`. As long as Dask partitions are resident in the GPU memory, `DaskDeviceQuantileDMatrix` is able to ingest them directly without making copies. This matrix type wraps `DeviceQuantileDMatrix`.
- Robust handling of external data types (#5689, #5893)

Improvements in GPU-side data matrix (`DeviceQuantileDMatrix`)
New language binding: Swift (#5728)
Robust model serialization with JSON (#5772, #5804, #5831, #5857, #5934)
Performance improvements
- Use `single_precision_histogram` to use a 32 bit histogram instead for faster training performance. (#5624, #5811)

API additions
- `XGBoosterGetNumFeature` is added for getting number of features in booster (#5856).

Breaking: The `predict()` method of `DaskXGBClassifier` now produces class predictions (#5986). Use `predict_proba()` to obtain probability predictions. Previously, `DaskXGBClassifier.predict()` produced probability predictions. This is inconsistent with the behavior of other scikit-learn classifiers, where `predict()` returns class predictions. We make a breaking change in the 1.2.0 release so that `DaskXGBClassifier.predict()` now correctly produces class predictions and thus behaves like other scikit-learn classifiers. Furthermore, we introduce the `predict_proba()` method for obtaining probability predictions, again to be in line with other scikit-learn classifiers (see the sketch after the breaking changes below).
Breaking: Custom evaluation metric now receives raw prediction (#5954)
Breaking: XGBoost4J-Spark now requires Spark 3.0 and Scala 2.12 (#5836, #5890)
Breaking: XGBoost Python package now requires Python 3.6 and later (#5715)
Breaking: XGBoost now adopts the C++14 standard (#5664)
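A minimal sketch of the new `predict()` / `predict_proba()` behaviour on the Dask classifier (the local cluster and data are illustrative):

```python
from dask import array as da
from dask.distributed import Client, LocalCluster
import xgboost as xgb

if __name__ == "__main__":
    with LocalCluster(n_workers=2) as cluster, Client(cluster) as client:
        X = da.random.random((5_000, 8), chunks=(1_000, 8))
        y = da.random.randint(0, 2, size=5_000, chunks=1_000)

        clf = xgb.dask.DaskXGBClassifier(n_estimators=10)
        clf.fit(X, y)

        labels = clf.predict(X)        # class predictions (0/1) since 1.2.0
        probas = clf.predict_proba(X)  # probability predictions
        print(labels.compute()[:5])
        print(probas.compute()[:2])
```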
Bug-fixes
- `IsDense` (#5702)
- `setAllowZeroForMissingValue` (#5740)

Usability Improvements, Documentation
- Use `raise from` syntax to preserve full stacktrace (#5787).
- Prevent the `dump_model()` function from breaking. See this document to understand the difference between saving and dumping models.
- `max.depth` in the R gblinear example. (#5753)
- Remove the `silent` parameter from R demos. (#5675)
- `n_estimators` in the docstring of the scikit-learn interface (#6041)

Maintenance: testing, continuous integration, build system
- Use the `hypothesis` package for testing (#5759, #5835, #5849).
- `_CRT_SECURE_NO_WARNINGS` to remove unneeded warnings in MSVC (#5434)

Maintenance: Refactor code for legibility and maintainability
- `gpu_hist` split evaluation in preparation for batched nodes enumeration. (#5610)
- `c_api.h` in header files. (#5782)
- `Empty` method for host device vector. (#5781)

Acknowledgement
Contributors: Nan Zhu (@CodingCat), @LionOrCatThatIsTheQuestion, Dmitry Mottl (@Mottl), Rory Mitchell (@RAMitchell), @ShvetsKS, Alex Wozniakowski (@a-wozniakowski), Alexander Gugel (@alexanderGugel), @anttisaukko, @boxdot, Andy Adinets (@canonizer), Ram Rachum (@cool-RR), Elliot Hershberg (@elliothershberg), Jason E. Aten, Ph.D. (@glycerine), Philip Hyunsu Cho (@hcho3), @jameskrach, James Lamb (@jameslamb), James Bourbeau (@jrbourbeau), Peter Jung (@kongzii), Lorenz Walthert (@lorenzwalthert), Oleksandr Kuvshynov (@okuvshynov), Rong Ou (@rongou), Shaochen Shi (@shishaochen), Yuan Tang (@terrytangyuan), Jiaming Yuan (@trivialfis), Bobby Wang (@wbo4958), Zhang Zhang (@zhangzhang10)
Reviewers: Nan Zhu (@CodingCat), @LionOrCatThatIsTheQuestion, Hao Yang (@QuantHao), Rory Mitchell (@RAMitchell), @ShvetsKS, Egor Smirnov (@SmirnovEgorRu), Alex Wozniakowski (@a-wozniakowski), Amit Kumar (@aktech), Avinash Barnwal (@avinashbarnwal), @boxdot, Andy Adinets (@canonizer), Chandra Shekhar Reddy (@chandrureddy), Ram Rachum (@cool-RR), Cristiano Goncalves (@cristianogoncalves), Elliot Hershberg (@elliothershberg), Jason E. Aten, Ph.D. (@glycerine), Philip Hyunsu Cho (@hcho3), Tong He (@hetong007), James Lamb (@jameslamb), James Bourbeau (@jrbourbeau), Lee Drake (@leedrake5), DougM (@mengdong), Oleksandr Kuvshynov (@okuvshynov), RongOu (@rongou), Shaochen Shi (@shishaochen), Xu Xiao (@sperlingxx), Yuan Tang (@terrytangyuan), Theodore Vasiloudis (@thvasilo), Jiaming Yuan (@trivialfis), Bobby Wang (@wbo4958), Zhang Zhang (@zhangzhang10)
v1.1.1
Compare Source
This patch release applies the following patches to 1.1.0 release:
v1.1.0
Compare Source
Better performance on multi-core CPUs (#5244, #5334, #5522)
- Performance scaling of the `hist` algorithm for multi-core CPUs has been under investigation (#3810). #5244 concludes the ongoing effort to improve performance scaling on multi-CPUs, in particular Intel CPUs. Roadmap: #5104
- Improvements to the `hist` tree method on CPU.

Deterministic GPU algorithm for regression and classification (#5361)
Improve external memory support on GPUs (#5093, #5365)
Parameter validation: detection of unused or incorrect parameters (#5477, #5569, #5508)
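A minimal sketch of what parameter validation catches (the misspelled parameter is deliberate):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 3)
y = np.random.randint(2, size=100)

# "max_dept" is a typo for "max_depth"; with parameter validation,
# XGBoost warns that the parameter might not be used.
params = {"objective": "binary:logistic", "max_dept": 3}
xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=5)
```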
Thread-safe, in-place prediction method (#5389, #5512)
- Added `inplace_predict()`, which is thread-safe. It is now possible to serve concurrent requests for prediction using a shared model object.
- Prediction can be run directly on common data structures (`numpy.ndarray` / `scipy.sparse.csr_matrix` / `cupy.ndarray` / `cudf.DataFrame` / `pd.DataFrame`) without creating a `DMatrix` object.
Addition of Accelerated Failure Time objective for survival analysis (#4763, #5473, #5486, #5552, #5553)
- Added a new objective, `survival:aft`, to support survival analysis. Also added is the new API to specify the ranged labels. Check out the tutorial and the demos.
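A minimal sketch of the `survival:aft` objective with ranged labels (the interval-censored data and parameter values are synthetic and illustrative):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)
# Ranged (interval-censored) labels: a lower and an upper bound per row.
y_lower = np.random.rand(200) * 10
y_upper = y_lower + np.random.rand(200) * 5

dtrain = xgb.DMatrix(X)
dtrain.set_float_info("label_lower_bound", y_lower)
dtrain.set_float_info("label_upper_bound", y_upper)

params = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 1.0,
}
xgb.train(params, dtrain, num_boost_round=10, evals=[(dtrain, "train")])
```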
Improved installation experience on Mac OSX (#5597, #5602, #5606, #5701)
- Run `brew install libomp` followed by `pip install xgboost`. The installed XGBoost will use all CPU cores. Even better, starting with this release, we distribute pre-compiled binary wheels targeting Mac OSX. Now the install command `pip install xgboost` finishes instantly, as it no longer compiles the C++ source of XGBoost. The last three Mac versions (High Sierra, Mojave, Catalina) are supported.
- Fixed the error "Initializing libomp.dylib, but found libomp.dylib already initialized" (#5701)

Ranking metrics are now accelerated on GPUs (#5380, #5387, #5398)
GPU-side data matrix to ingest data directly from other GPU libraries (#5420, #5465)
- Added a new GPU-side data matrix (`DeviceQuantileDMatrix`) so that it can ingest data from GPU memory directly. The result is that XGBoost interoperates better with GPU-accelerated data science libraries, such as cuDF, cuPy, and PyTorch.
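A minimal sketch of ingesting GPU-resident data, assuming a CUDA device and the cupy package are available (data and parameters are illustrative):

```python
import cupy as cp
import xgboost as xgb

# Synthetic data already resident in GPU memory.
X = cp.random.rand(10_000, 10)
y = cp.random.randint(0, 2, size=10_000)

# DeviceQuantileDMatrix ingests the GPU data directly, without a host-side copy.
dtrain = xgb.DeviceQuantileDMatrix(X, label=y)
params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}
booster = xgb.train(params, dtrain, num_boost_round=10)
```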
Robust model serialization with JSON (#5123, #5217)
- The model (`Booster`) object can be serialized in R as a JSON string (#5123, #5217).

Improved integration with Dask
- `verbose` parameter for dask fit (#5413)
- `DMLC_TASK_ID`. (#5415)
- `nthreads` from dask worker. (#5414)

XGBoost4J-Spark: Check number of columns in the data iterator (#5202, #5303)
DMatrix
classBreaking: XGBoost Python package now requires Pip 19.0 and higher (#5589)
manylinux2010
tag in the binary wheel release. Ensure you have Pip 19.0 orConfiguration
📅 Schedule: At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by WhiteSource Renovate. View repository job log here.