[R] Redesigned `xgboost()` interface skeleton #10456
Thank you for sharing the early work. I will look into the new example and introduction for learning about the new interface.
To clarify: there's no intro and only a very small example, since there's no specialized […]
Would you like me to add a feature branch in xgboost for you to iterate on?
Thanks but I don't think there's any need for that. Or are you planning a 2.1.1 release soon?
Awesome work @david-cortes!
The auto-selection of the objective is really neat.
While reading the code, I added some suggestions on details (not that they are important, it was just to keep me busy).
Some additional comments:
- I'd prefer helper functions like `process.x.and.col.args()` to be named like `_process_x_and_col_args()`.
- We will also need to deal with `X` (store names of features, store factor levels of character/factor columns).
if (!NROW(base_margin)) {
  return(NULL)
}
if (is.array(base_margin) && length(dim(base_margin)) > 2) {
Suggested change:
- if (is.array(base_margin) && length(dim(base_margin)) > 2) {
+ if (is.array(base_margin) && length(dim(base_margin)) > 2L) {
Co-authored-by: Michael Mayer <mayermichael79@gmail.com>
R-package/R/xgboost.R (outdated)
params_function_args <- c(
  "objective", "verbose", "verbosity", "nthread", "seed",
  "monotone_constraints", "interaction_constraints"
Any issue with these parameters?
They are meant to be passed as arguments to the function `xgboost()` instead of being passed through `params`.
`monotone_constraints` and `interaction_constraints` might be better as a part of normal params? This way it feels more consistent. It's difficult from a user's perspective to understand why some of these parameters are different.
From my perspective:
- Some parameters I chose to put as function arguments in order to give them a default value visible to the user (e.g. `nthreads`) or make the default value change according to the data (e.g. `objective`).
- Some parameters are independent of the data (e.g. regularization) or work regardless of whether the data supports them or not (e.g. max categ. to one-hot), while others refer specifically to columns in the data (e.g. interaction constraints) and would not work with a different dataset.
- Ideally, later on there should be a function like `xgb.control` or similar that would list the parameters and offer auto-completion by having them as function arguments (e.g. like `VGAM::vglm.control` or `glmx::glmx.control`). Those train-control parameters typically involve things that are independent of the data.
- Alternatively, and as is more common in R, all parameters could be moved to function arguments (e.g. like `ranger::ranger` or `glmnet::glmnet`), but perhaps that would make the docs very hard to use.

> It's difficult from a user's perspective to understand why some of these parameters are different.

Yes, it's a bit arbitrary, but that is also the situation in the other interfaces - e.g. the number of boosting rounds is a function argument, as is `verbose_eval` in the Python interface. It also feels like parameters such as `seed` should be in the train control list, but that would have other side effects due to the logic introduced in an earlier PR (#10029).
Perhaps for v3 you could consider having a division between data-dependent parameters (e.g. `feature_weights` is passed under `DMatrix`, while `monotone_constraints` is passed under `params`) and data-independent parameters.
Thank you for the detailed explanation. Please think of me as a newcomer instead of a maintainer. Or think of the future GitHub issues we will need to reply to when someone puts the parameter in the wrong place and asks why.

> but that is also the situation in the other interfaces

I think that's only true for the low-level interface, and some of them are only this way because we consider changing them an unnecessary breaking change. The higher-level interfaces, like scikit-learn or Spark, are relatively consistent. For scikit-learn, we are moving the parameters (slowly; examples include `early_stopping_rounds` and `metric`) according to the scikit-learn estimator guideline, where the constructor takes hyper-parameters and sometimes feature-related data like feature types, while the `fit` method takes data (sample)-dependent parameters that can be split for early stopping, etc.

> default value visible to the user

We can compromise on this, maybe the documentation is sufficient?

> Alternatively, and as is more common in R, all parameters could be moved to function arguments

That sounds reasonable to me. Personally, I prefer consistency, which has been a pain point for XGBoost for a while now. It would be great to see improvement in this area.

> Perhaps for v3 you could consider having a division between data-dependent parameters (e

Hmm, since this is a new interface already, maybe it's better to do it right the first time if we know where the issue is? The internal implementation can be changed anytime we need, but changing the interface is particularly difficult, especially with the CRAN policy.
Ok then, I'll move all parameters to function arguments, leaving them as undocumented `...` for now (to be changed in a later PR). We'll see how the docs and examples end up looking later on.
Modified it by removing `params` and making all parameters function arguments, passed through `...` for now.
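As a rough illustration of what passing parameters through `...` could look like, here is a minimal sketch. The function name `collect_params_sketch` and its argument list are hypothetical and are not taken from the PR; it only shows the general R pattern of gathering named `...` arguments into a parameter list.

```r
# Hypothetical sketch: gather booster parameters passed through `...`
# into a named list, instead of requiring a separate `params` argument.
collect_params_sketch <- function(x, y, objective = NULL, nthreads = 1L, ...) {
  params <- list(...)
  # Every argument passed through `...` must be named, otherwise we
  # cannot tell which parameter it refers to.
  if (length(params) && (is.null(names(params)) || any(names(params) == ""))) {
    stop("All arguments passed through '...' must be named.")
  }
  # Explicit function arguments are merged into the same list.
  # Assigning NULL leaves the entry out, so an unset objective is skipped.
  params$objective <- objective
  params$nthread <- nthreads
  params
}

p <- collect_params_sketch(
  x = NULL, y = NULL, objective = "reg:squarederror",
  max_depth = 3L, eta = 0.1
)
str(p)
```

The trade-off discussed in the thread still applies: with `...` there is no auto-completion or per-argument documentation until the parameters are spelled out as formal arguments.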
)
} else if (y_attr$type == "interval") {
  out$dmatrix_args <- list(
    label_lower_bound = ifelse(y[, 3L] == 2, 0, y[, 1L]),
Could you please add some comments for these indexing?
Added a comment block above.
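To make the indexing easier to follow, the sketch below mimics the three columns of an interval-type `Surv` matrix with a plain base-R matrix. It assumes the status codes documented for `survival::Surv` interval data (0 = right-censored, 1 = exact event, 2 = left-censored, 3 = interval-censored). Only the lower-bound expression appears in the reviewed snippet; the upper-bound expression is an assumption added here for illustration.

```r
# Columns stand in for an interval-type Surv object: time1, time2, status.
# Assumed status codes: 0 = right-censored, 1 = exact event,
# 2 = left-censored, 3 = interval-censored.
y <- cbind(
  time1  = c(5, 7, 3, 2),
  time2  = c(NA, NA, NA, 6),
  status = c(0, 1, 2, 3)
)
status <- y[, 3L]

# Left-censored rows have no known lower bound, so use 0;
# every other row starts at time1.
lower <- ifelse(status == 2, 0, y[, 1L])

# Assumed mapping for the upper bound: right-censored rows are unbounded,
# interval-censored rows end at time2, all others end at time1.
upper <- ifelse(status == 0, Inf, ifelse(status == 3, y[, 2L], y[, 1L]))

bounds <- cbind(lower = lower, upper = upper)
print(bounds)
```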
}
check.nthreads <- function(nthreads) {
  if (is.null(nthreads)) {
Is this going to make single thread the default for R xgboost?
No, the default is to use all threads, as obtained when calling `parallel::detectCores()`.
R-package/R/xgboost.R (outdated)
if (length(nthreads) > 1L) {
  nthreads <- head(nthreads, 1L)
}
if (is.na(nthreads) || nthreads < 1) {
XGBoost takes 0 as using all available cores.
Changed the condition and added it to the docs.
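The behaviour described in this thread could be sketched roughly as follows. This is not the code from the PR: `check_nthreads_sketch` is a hypothetical name, and the exact condition after the change is an assumption. It only illustrates the three points raised above: `NULL` falls back to all detected cores, a vector is truncated to its first element, and `0` is accepted because XGBoost interprets it as "use all available cores".

```r
# Hypothetical sketch of an nthreads validator, assuming 0 is allowed.
check_nthreads_sketch <- function(nthreads) {
  if (is.null(nthreads)) {
    n <- parallel::detectCores()
    # detectCores() may return NA on some platforms; fall back to 1.
    return(if (is.na(n)) 1L else as.integer(n))
  }
  if (length(nthreads) > 1L) {
    nthreads <- utils::head(nthreads, 1L)
  }
  # Negative or missing values are rejected; 0 passes through unchanged.
  if (is.na(nthreads) || nthreads < 0) {
    stop("'nthreads' must be a non-negative integer.")
  }
  as.integer(nthreads)
}

check_nthreads_sketch(NULL)     # number of detected cores (machine-dependent)
check_nthreads_sketch(c(2, 4))  # only the first element is used
check_nthreads_sketch(0)        # passed through as 0
```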
R-package/R/xgboost.R (outdated)
}
check.can.use.qdm <- function(x, params) {
  if (inherits(x, "sparseMatrix") && !inherits(x, "dgRMatrix")) {
We can support COO internally; for CSC, maybe we can simply transpose in R. Do you think that could help make the code less complex?
Maybe, but what would be the benefit in such case if the data still needs to be duplicated?
Actually I just remembered that constructing the simple DMatrix copies the data, so yes there would be one less copy in such case.
Modified it to accept COO and CSC for QDM, with a casting for now until direct QDM construction from COO is added.
if (!inherits(x, supported_x_types)) {
  stop(
    "'x' must be one of the following classes: ",
    paste(supported_x_types, collapse = ", "),
Do we need to worry about `data.table` and potentially other libraries that inherit from these known types in this error message?
No, because all `data.table` objects are also `data.frame`s.
class(data.table())
[1] "data.table" "data.frame"
I'm aware of this, hence the question: is it possible that this error message might lead the user to think `data.frame` is not supported? Or is it common knowledge in the R world?
No, because if `class(x)` contains `data.frame` as an entry, `inherits` will return `TRUE`, so there will be no error message in a derived class like `data.table` or `tibble`. And if it is a class like `arrow` or `polars DataFrame` which do not inherit from `data.frame`, then the error will show as expected, since they are not supported.
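The point about `inherits` can be checked with base R alone. In the sketch below, the class names `my_table` and `other_frame` and the two-element `supported_x_types` vector are made up for illustration; the real list of supported classes in the PR is longer.

```r
# Illustrative subset of supported classes (the real list is an assumption).
supported_x_types <- c("data.frame", "matrix")

check_x_class_sketch <- function(x) {
  if (!inherits(x, supported_x_types)) {
    stop(
      "'x' must be one of the following classes: ",
      paste(supported_x_types, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# A derived class that keeps "data.frame" in its class vector passes,
# just as data.table or tibble objects would.
derived <- structure(data.frame(a = 1:3), class = c("my_table", "data.frame"))
check_x_class_sketch(derived)

# A class that does not inherit from any supported type triggers the error.
unsupported <- structure(list(a = 1:3), class = "other_frame")
result <- try(check_x_class_sketch(unsupported), silent = TRUE)
inherits(result, "try-error")
```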
Thank you for the excellent work, will merge once the CI passes.
ref #9810
This PR starts off the redesigned interface for `xgboost()`, by adding a bare-bones version that replaces the old one:
- […] `xgb.DMatrix`.
- […] `y` and the objective.
- […] `objective` optional, picking a default according to the type of `y`.
- […] `xgb.QuantileDMatrix` whenever possible.
- […] `y` (column names and factor levels).
- […] `xgb.Booster`, but has an additional class `xgboost`, so that the current methods can be used on it.
But:
- […] `print` that's added here.
- […] `predict` method.
- […] `params`.
Among others. It should nevertheless provide a base to work on.
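To give a feel for what "picking a default according to the type of `y`" might mean, here is a minimal sketch. The function `default_objective_sketch` and its mapping are assumptions made for illustration, not the logic implemented in the PR; the objective strings themselves (`reg:squarederror`, `binary:logistic`, `multi:softprob`) are existing XGBoost objectives.

```r
# Hypothetical mapping from the type of `y` to a default objective.
default_objective_sketch <- function(y) {
  if (is.factor(y)) {
    if (nlevels(y) < 2L) {
      stop("Factor 'y' must have at least two levels.")
    }
    # Two levels: binary classification; more: multi-class.
    if (nlevels(y) == 2L) "binary:logistic" else "multi:softprob"
  } else if (is.numeric(y)) {
    "reg:squarederror"
  } else {
    stop("Unsupported type for 'y'.")
  }
}

default_objective_sketch(c(1.5, 2.0, 3.2))
default_objective_sketch(factor(c("a", "b", "a")))
default_objective_sketch(iris$Species)
```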
A couple items I am not so sure about - would be helpful to get reviews on these:
- […] `Surv` objects correctly for all cases?
CCing @mayer79 for review.