diff --git a/R-package/R/xgb.cv.R b/R-package/R/xgb.cv.R
index fdfcb59b31c9..484581a92be9 100644
--- a/R-package/R/xgb.cv.R
+++ b/R-package/R/xgb.cv.R
@@ -2,12 +2,15 @@
 #'
 #' The cross validation function of xgboost
 #'
-#' @param params the list of parameters. Commonly used ones are:
+#' @param params the list of parameters. The complete list of parameters is
+#'   available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
+#'   is a shorter summary:
 #' \itemize{
 #'   \item \code{objective} objective function, common ones are
 #'   \itemize{
-#'     \item \code{reg:squarederror} Regression with squared loss
-#'     \item \code{binary:logistic} logistic regression for classification
+#'     \item \code{reg:squarederror} Regression with squared loss.
+#'     \item \code{binary:logistic} logistic regression for classification.
+#'     \item See \code{\link[=xgb.train]{xgb.train}()} for the complete list of objectives.
 #'   }
 #'   \item \code{eta} step size of each boosting step
 #'   \item \code{max_depth} maximum depth of the tree
diff --git a/R-package/R/xgb.train.R b/R-package/R/xgb.train.R
index 8733bcce4f4d..2f00e09a2c63 100644
--- a/R-package/R/xgb.train.R
+++ b/R-package/R/xgb.train.R
@@ -3,9 +3,9 @@
 #' \code{xgb.train} is an advanced interface for training an xgboost model.
 #' The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
 #'
-#' @param params the list of parameters.
-#'   The complete list of parameters is available at \url{http://xgboost.readthedocs.io/en/latest/parameter.html}.
-#'   Below is a shorter summary:
+#' @param params the list of parameters. The complete list of parameters is
+#'   available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
+#'   is a shorter summary:
 #'
 #' 1. General Parameters
 #'
@@ -43,13 +43,23 @@
 #'   \item \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it. The default objective options are below:
 #'   \itemize{
 #'     \item \code{reg:squarederror} Regression with squared loss (Default).
+#'     \item \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}. All inputs are required to be greater than -1. Also, see the \code{rmsle} metric for a possible issue with this objective.
 #'     \item \code{reg:logistic} logistic regression.
+#'     \item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
 #'     \item \code{binary:logistic} logistic regression for binary classification. Output probability.
 #'     \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
-#'     \item \code{num_class} set the number of classes. To use only with multiclass objectives.
+#'     \item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
+#'     \item \code{count:poisson}: Poisson regression for count data, output mean of the Poisson distribution. \code{max_delta_step} is set to 0.7 by default in Poisson regression (used to safeguard optimization).
+#'     \item \code{survival:cox}: Cox regression for right-censored survival time data (negative values are considered right-censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function \code{h(t) = h0(t) * HR}).
+#'     \item \code{survival:aft}: Accelerated failure time model for censored survival time data. See \href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time} for details.
+#'     \item \code{aft_loss_distribution}: Probability Density Function used by the \code{survival:aft} objective and the \code{aft-nloglik} metric.
 #'     \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class - 1}.
 #'     \item \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
 #'     \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
+#'     \item \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where \href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.
+#'     \item \code{rank:map}: Use LambdaMART to perform list-wise ranking where \href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)} is maximized.
+#'     \item \code{reg:gamma}: gamma regression with log-link. Output is the mean of the gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be \href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.
+#'     \item \code{reg:tweedie}: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be \href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.
 #'   }
 #'   \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
 #'   \item \code{eval_metric} evaluation metrics for validation data. Users can pass a self-defined function to it. Default: metric will be assigned according to objective(rmse for regression, and error for classification, mean average precision for ranking). List is provided in detail section.
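For reviewers, here is a minimal R sketch (not part of the patch) exercising the multiclass entries documented above: `multi:softmax` together with the `num_class` parameter, whose labels must lie in `0 .. num_class - 1`. The training step is guarded so the snippet degrades gracefully when the xgboost package is not installed.

```r
# Params list as documented in ?xgb.train; multi:softmax needs num_class.
params <- list(
  objective = "multi:softmax",  # hard class labels, not probabilities
  num_class = 3,                # classes must be coded 0 .. num_class - 1
  max_depth = 2,
  eta = 0.3
)

# Toy 3-class data.
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
y <- sample(0:2, 100, replace = TRUE)  # labels already in 0..2

if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  bst <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 5)
  # multi:softmax predicts class indices directly.
  stopifnot(all(predict(bst, dtrain) %in% 0:2))
}
```

Had the example used `multi:softprob` instead, `predict()` would return a length `ndata * nclass` probability vector to be reshaped, as the documentation text describes.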
diff --git a/R-package/man/xgb.cv.Rd b/R-package/man/xgb.cv.Rd
index 8cb03a5ef3f0..8532305b11e9 100644
--- a/R-package/man/xgb.cv.Rd
+++ b/R-package/man/xgb.cv.Rd
@@ -28,12 +28,15 @@ xgb.cv(
 )
 }
 \arguments{
-\item{params}{the list of parameters. Commonly used ones are:
+\item{params}{the list of parameters. The complete list of parameters is
+  available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
+  is a shorter summary:
 \itemize{
   \item \code{objective} objective function, common ones are
   \itemize{
-    \item \code{reg:squarederror} Regression with squared loss
-    \item \code{binary:logistic} logistic regression for classification
+    \item \code{reg:squarederror} Regression with squared loss.
+    \item \code{binary:logistic} logistic regression for classification.
+    \item See \code{\link[=xgb.train]{xgb.train}()} for the complete list of objectives.
   }
   \item \code{eta} step size of each boosting step
   \item \code{max_depth} maximum depth of the tree
diff --git a/R-package/man/xgb.load.Rd b/R-package/man/xgb.load.Rd
index 3f743e1f525c..8c20d0e2a129 100644
--- a/R-package/man/xgb.load.Rd
+++ b/R-package/man/xgb.load.Rd
@@ -17,8 +17,8 @@ Load xgboost model from the binary model file.
 }
 \details{
 The input file is expected to contain a model saved in an xgboost-internal binary format
-using either \code{\link{xgb.save}} or \code{\link{cb.save.model}} in R, or using some
-appropriate methods from other xgboost interfaces. E.g., a model trained in Python and
+using either \code{\link{xgb.save}} or \code{\link{cb.save.model}} in R, or using some
+appropriate methods from other xgboost interfaces. E.g., a model trained in Python and
 saved from there in xgboost format, could be loaded from R.
 
 Note: a model saved as an R-object, has to be loaded using corresponding R-methods,
@@ -29,7 +29,7 @@ data(agaricus.train, package='xgboost')
 data(agaricus.test, package='xgboost')
 train <- agaricus.train
 test <- agaricus.test
-bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
+bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
                eta = 1, nthread = 2, nrounds = 2,objective = "binary:logistic")
 xgb.save(bst, 'xgb.model')
 bst <- xgb.load('xgb.model')
diff --git a/R-package/man/xgb.train.Rd b/R-package/man/xgb.train.Rd
index a6c91ccf47de..e592b7a036ab 100644
--- a/R-package/man/xgb.train.Rd
+++ b/R-package/man/xgb.train.Rd
@@ -42,9 +42,9 @@ xgboost(
 )
 }
 \arguments{
-\item{params}{the list of parameters.
-  The complete list of parameters is available at \url{http://xgboost.readthedocs.io/en/latest/parameter.html}.
-  Below is a shorter summary:
+\item{params}{the list of parameters. The complete list of parameters is
+  available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
+  is a shorter summary:
 
 1. General Parameters
 
@@ -82,13 +82,23 @@ xgboost(
   \item \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it. The default objective options are below:
   \itemize{
     \item \code{reg:squarederror} Regression with squared loss (Default).
+    \item \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}. All inputs are required to be greater than -1. Also, see the \code{rmsle} metric for a possible issue with this objective.
     \item \code{reg:logistic} logistic regression.
+    \item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
     \item \code{binary:logistic} logistic regression for binary classification. Output probability.
     \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
-    \item \code{num_class} set the number of classes. To use only with multiclass objectives.
+    \item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
+    \item \code{count:poisson}: Poisson regression for count data, output mean of the Poisson distribution. \code{max_delta_step} is set to 0.7 by default in Poisson regression (used to safeguard optimization).
+    \item \code{survival:cox}: Cox regression for right-censored survival time data (negative values are considered right-censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function \code{h(t) = h0(t) * HR}).
+    \item \code{survival:aft}: Accelerated failure time model for censored survival time data. See \href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time} for details.
+    \item \code{aft_loss_distribution}: Probability Density Function used by the \code{survival:aft} objective and the \code{aft-nloglik} metric.
     \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class - 1}.
     \item \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
     \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
+    \item \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where \href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.
+    \item \code{rank:map}: Use LambdaMART to perform list-wise ranking where \href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)} is maximized.
+    \item \code{reg:gamma}: gamma regression with log-link. Output is the mean of the gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be \href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.
+    \item \code{reg:tweedie}: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be \href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.
   }
   \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
   \item \code{eval_metric} evaluation metrics for validation data. Users can pass a self-defined function to it. Default: metric will be assigned according to objective(rmse for regression, and error for classification, mean average precision for ranking). List is provided in detail section.
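To sanity-check one of the newly documented objectives end to end, here is a short R sketch (again, not part of the patch): `count:poisson` on toy count data. It only assumes the public `xgb.DMatrix`/`xgb.train`/`predict` API, and skips training when xgboost is unavailable.

```r
# Params list per the documentation above: Poisson regression for counts.
params <- list(
  objective = "count:poisson",  # output is the mean of the Poisson distribution
  eta = 0.1,                    # step size of each boosting step
  max_depth = 3                 # maximum depth of the tree
)

# Toy count data: y is Poisson-distributed given one feature.
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)
y <- rpois(100, lambda = exp(0.5 * x[, 1]))

if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  bst <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 10)
  # Predictions come back on the mean scale, so they are non-negative.
  stopifnot(all(predict(bst, dtrain) >= 0))
}
```

Note the documented interaction: with `count:poisson`, `max_delta_step` silently defaults to 0.7 unless overridden in `params`.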