dmlc · trivialfis · Apr 16, 2022 · Mar 22, 2022 · Mar 22, 2022 · Mar 23, 2022
diff --git a/NEWS.md b/NEWS.md
@@ -3,6 +3,237 @@ XGBoost Change Log
 
 This file records the changes in xgboost library in reverse chronological order.
 
+## v1.6.0 (2022 Mar 24)
+
+After a long period of development, XGBoost v1.6.0 is packed with new features and
+improvements. We summarize them in the following sections, starting with an introduction to
+some significant new features, then moving on to language-binding-specific changes, including
+new features and notable bug fixes for each binding.
+
+### Development on categorical data support
+This version of XGBoost features new improvements and full coverage of experimental
+categorical data support in the Python and C packages with tree models.  The `hist`, `approx`,
+and `gpu_hist` tree methods now all support training with categorical data.  Also, partition-based
+categorical splits are featured in this release. This feature was first available in LightGBM
+in the context of gradient boosting. In the previous version, only `gpu_hist` supported one-hot
+encoding based splits, which have the form `x \in {c}` where `{c}` is the set of all
+categories. In this release, `{c}` can be split into two sets for the left and right
+nodes using any of the aforementioned tree methods. For more details, please see our
+tutorial on [categorical data](https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html),
+along with the examples linked on that page. (#7380, #7708, #7695, #7330, #7307, #7322, #7705,
+#7652, #7592, #7666, #7576, #7569, #7529, #7575, #7393, #7465, #7385, #7371)
+
+In the future, we will continue to improve categorical data support with new features and
+optimizations. We also hope to bring the feature beyond the Python binding;
+contributions and feedback are welcome! Lastly, because of its experimental status, the
+behavior might be subject to change, especially the default values of related
+hyper-parameters.
+
+### Experimental support for multi-output model
+
+XGBoost 1.6 features initial support for multi-output models, which include
+multi-output regression and multi-label classification. Along with this, the XGBoost
+classifier now has proper support for base margin without the need for the user to flatten the
+input. Right now, XGBoost builds one model for each target, similar to the sklearn meta
+estimator. For more details, please see our
+[quick introduction](https://xgboost.readthedocs.io/en/latest/tutorials/multioutput.html).
+
+(#7365, #7736, #7607, #7574, #7521, #7514, #7456, #7453, #7455, #7434, #7429, #7405, #7381)
+
+### External memory support
+External memory support for both the `approx` and `hist` tree methods is considered feature
+complete in XGBoost 1.6.  Building upon the iterator-based interface introduced in the
+previous version, both `hist` and `approx` now iterate over each batch of data during
+training and prediction.  In previous versions, `hist` concatenated all the batches into
+an internal representation, which is removed in this version.  As a result, users can
+expect higher scalability in terms of data size but might experience lower performance due
+to disk IO. (#7531, #7320, #7638, #7372)
+
+### Rewritten approx
+
+The `approx` tree method is rewritten based on the existing `hist` tree method. The
+rewrite closes the feature gap between `approx` and `hist` and improves the performance.
+Now the behavior of `approx` should be more aligned with `hist` and `gpu_hist`. Here's a
+list of user-visible changes:
+
+- Supports both `max_leaves` and `max_depth`.
+- Supports `grow_policy`.
+- Supports monotonic constraint.
+- Supports feature weights.
+- Uses `max_bin` in place of `sketch_eps`.
+- Supports categorical data.
+- Faster performance on many datasets.
+- Improved performance and robustness for distributed training.
+- Supports prediction cache.
+- Significantly better performance for external memory.
+- Unites the code base between approx and hist.
+
+### New serialization format
+Based on the existing JSON serialization format, we introduced UBJSON support as a more
+efficient alternative. Both formats will be available in the future and we plan to
+gradually [phase out](https://github.com/dmlc/xgboost/issues/7547) support for the old
+binary model format.  Users can choose between the formats in the serialization functions
+by providing the file extension `json` or `ubj`. Also, the `save_raw` function in all
+supported language bindings gains a new parameter for exporting the model in different
+formats. Available options are `json`, `ubj` and `deprecated`; see the documentation for the
+language binding you are using for details. Lastly, the default internal serialization format
+is set to UBJSON, which affects Python pickle and R RDS. (#7572, #7570, #7358, #7571,
+#7556, #7549, #7416)
+
+### General new features
+Other than the major new features mentioned above, some others are summarized here:
+
+* Users can now access the build information of the XGBoost binary in the Python and C
+  interfaces. (#7399, #7553)
+* Automatic configuration of `seed_per_iteration` is removed; distributed training should now
+  generate results closer to single-node training when sampling is used. (#7009)
+* A new parameter `huber_slope` is introduced for the `Pseudo-Huber` objective.
+* During source build, XGBoost can choose cub in the system path automatically. (#7579)
+* XGBoost now honors the CPU counts from CFS, which is usually set in Docker
+  environments. (#7654, #7704)
+* The metric `aucpr` is rewritten for better performance and GPU support. (#7297, #7368)
+* Metric calculation is now performed in double precision. (#7364)
+* XGBoost no longer mutates the global OpenMP thread limit. (#7537, #7519, #7608, #7590, #7589, #7588)
+* The default behavior of `max_leaves` and `max_depth` is now unified. (#7302, #7551)
+* CUDA fat binary is now compressed. (#7601)
+* Use double for GPU Hist node sum, which improves the accuracy of `gpu_hist`. (#7507)
+
+### Deterministic result for evaluation metric and linear model
+In previous versions of XGBoost, evaluation results might differ slightly between runs due
+to parallel reduction of floating-point values, which is now addressed. (#7362, #7303,
+#7316, #7349)
+
+### Performance improvements
+Most of the performance improvements are integrated into other refactors during feature
+development. The `approx` tree method should see significant performance gains for many datasets, as
+mentioned in the previous section, while the `hist` tree method also enjoys improved
+performance with the removal of the internal `pruner` along with some other
+refactoring. Lastly, `gpu_hist` no longer synchronizes the device during training. (#7737)
+
+### General bug fixes
+* Fixes in CMake script for exporting configuration. (#7730)
+* XGBoost can now handle unsorted sparse input. This includes text file formats like libsvm
+  and scipy sparse matrices where the column indices are not sorted. (#7731)
+* Fix tree param feature type. This affects inputs with a number of columns greater than the
+  maximum value of int32. (#7565)
+* Fix external memory with `gpu_hist` and subsampling. (#7481)
+* Check the number of trees in inplace predict; this avoids a potential segfault when an
+  incorrect value for `iteration_range` is provided. (#7409)
+
+### Changes in the Python package
+Other than the changes in Dask, the XGBoost Python package gained some new features and
+improvements along with small bug fixes.
+
+* Python 3.7 is now the minimum required Python version. (#7682)
+* The binary package supports Apple Silicon. (#7621, #7612)
+* There are new parameters for users to specify the custom metric with new
+  behavior. XGBoost can now output transformed prediction values when a custom objective is
+  not supplied.  See our explanation in the
+  [tutorial](https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html#reverse-link-function)
+  for details.
+* For the sklearn interface, following the estimator guideline from scikit-learn, all
+  parameters in `fit` that are not related to input data are moved into the constructor and
+  can be set by `set_params`. (#6751, #7420, #7375, #7369)
+* A new function `get_group` is introduced for `DMatrix` to allow users to get the group
+  information in a custom objective. (#7564)
+* More training parameters are exposed in the sklearn interface instead of relying on `**kwargs`. (#7629)
+* A new attribute `feature_names_in_` is defined for all sklearn estimators like
+  `XGBRegressor` to follow the convention of sklearn. (#7526)
+* More work on Python type hints. (#7432, #7348, #7338, #7513)
+* Support the latest pandas Index type. (#7595)
+* Fix the feature shape mismatch error on the s390x platform. (#7715)
+* Fix using feature names for constraints with multiple groups. (#7711)
+* We clarified the behavior of callback functions when they contain mutable state. (#7685)
+* Lastly, there are some code cleanups and maintenance work. (#7585, #7426, #7634, #7665, #7667, #7377, #7360, #7498, #7438)
+
+### Changes in Dask interface
+* The Dask module now supports a user-supplied host IP and port address for the scheduler node.
+  Please see [introduction](https://xgboost.readthedocs.io/en/latest/tutorials/dask.html#troubleshooting) and
+  [API document](https://xgboost.readthedocs.io/en/latest/python/python_api.html#optional-dask-configuration)
+  for reference. (#7645, #7581)
+* Internal `DMatrix` construction in Dask now honors thread configuration. (#7337)
+* A fix for `nthread` configuration using the Dask sklearn interface. (#7633)
+* The Apache Arrow format is now supported, which can bring better performance to users'
+  pipelines. (#7512)
+* The Dask interface can now handle empty partitions.  An empty partition is different from
+  an empty worker: the latter refers to the case where a worker has no partition of an input
+  dataset, while the former refers to the case where some partitions on a worker have zero
+  size. (#7644, #7510)
+* Scipy sparse matrices are supported as Dask array partitions. (#7457)
+* Dask interface is no longer considered experimental. (#7509)
+
+### Changes in R package
+This section summarizes the new features, improvements and bug fixes to the R package.
+
+* `load.raw` can optionally construct and return a booster. (#7686)
+* Fix parsing decision stump, which affects both transforming text representation to data
+  table and plotting. (#7689)
+* Implement feature weights. (#7660)
+* Some improvements for complying with the CRAN release policy. (#7672, #7661)
+* Support CSR data for predictions (#7615)
+* Document update (#7263, #7606)
+* New maintainer for the CRAN package (#7691, #7649)
+
+### JVM-packages
+Some new features for JVM-packages are introduced for a more integrated GPU pipeline and
+better compatibility with musl-based Linux. Aside from these, we have a few notable bug
+fixes.
+
+* Add support for detecting musl-based Linux (#7624)
+* Add `DeviceQuantileDMatrix` to Scala binding (#7459)
+* Add Rapids plugin support (#7491)
+* The setters for CPU and GPU are more aligned (#7692)
+* Control logging for early stopping (#7326)
+* Do not repartition when nWorker = 1 (#7676)
+* Fix the prediction issue for `multi:softmax` (#7694)
+* Fix for serialization of custom objective and eval (#7274)
+* Update documentation about Python tracker (#7396)
+* Some refactoring to the training pipeline for better compatibility between CPU and
+  GPU. (#7440, #7401)
+* Maintenance. (#7550, #7335, #7641, #7523)
+
+### Deprecation
+Other than the changes in the Python package and serialization, we removed some features
+deprecated in previous releases. Also, as mentioned in a previous section, we plan to phase
+out the old binary format in future releases.
+
+* Remove old warning in 1.3 (#7279)
+* Remove label encoder deprecated in 1.3. (#7357)
+* Remove old callback deprecated in 1.3. (#7280)
+
+### Maintenance
+This is a brief summary of maintenance work that is not specific to any language binding.
+
+* Add CMake option to use /MD runtime (#7277)
+* Add clang-format config. (#7383)
+* Code cleanups (#7539, #7536, #7466, #7499, #7533, #7735, #7722, #7668, #7304, #7293,
+  #7321, #7356, #7345, #7387, #7577, #7548, #7469, #7680, #7433, #7398)
+* Improved tests with better coverage and latest dependencies (#7573, #7446, #7650, #7520,
+  #7373, #7723, #7611)
+* Improved automation of the release process. (#7278, #7332, #7470)
+* Compiler workarounds (#7673)
+* Change shebang used in CLI demo. (#7389)
+* Update affiliation (#7289)
+
+
+### Documentation
+This section lists some of the general changes in documentation. For language-binding-specific
+changes, please visit the related sections.
+
+* The documentation is overhauled to use the new rtd theme, along with integration of Python
+  examples. Also, we replaced most of the hardcoded URLs with sphinx references. (#7347,
+  #7346, #7468, #7522, #7530)
+* Small updates along with some fixes for broken links, typos, etc. (#7684, #7324, #7334,
+  #7655, #7628, #7623, #7487, #7532, #7500, #7341, #7648, #7311)
+* Update document for GPU. (#7403)
+* Document the status of RTD hosting. (#7353)
+* Update document for building from source. (#7664)
+* Add note about CRAN release (#7395)
+
+### CI
+Some fixes and updates to XGBoost's CI infrastructure. (#7739, #7701, #7382, #7662, #7646,
+#7582, #7407, #7417, #7475, #7474, #7479, #7472, #7626)
+
 ## v1.5.0 (2021 Oct 11)
 
 This release comes with many exciting new features and optimizations, along with some bug