
fix param aliases #4387

Merged
merged 1 commit into from Jun 26, 2021
7 changes: 6 additions & 1 deletion R-package/R/aliases.R
@@ -4,7 +4,7 @@
# [description] List of respected parameter aliases specific to lgb.Dataset. Wrapped in a function to
# take advantage of lazy evaluation (so it doesn't matter what order
# R sources files during installation).
# [return] A named list, where each key is a parameter relevant to lgb.DataSet and each value is a character
# [return] A named list, where each key is a parameter relevant to lgb.Dataset and each value is a character
# vector of corresponding aliases.
.DATASET_PARAMETERS <- function() {
return(
@@ -57,13 +57,18 @@
"label_column"
, "label"
)
, "linear_tree" = c(
"linear_tree"
, "linear_trees"
)
, "max_bin" = "max_bin"
, "max_bin_by_feature" = "max_bin_by_feature"
, "min_data_in_bin" = "min_data_in_bin"
, "pre_partition" = c(
"pre_partition"
, "is_pre_partition"
)
, "precise_float_parser" = "precise_float_parser"
, "two_round" = c(
"two_round"
, "two_round_loading"
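The alias groups above pair each canonical `lgb.Dataset` parameter with its accepted spellings. A minimal Python sketch of how such a table can normalize user-supplied parameters — the `DATASET_ALIASES` name and `normalize_params` helper are illustrative, not part of LightGBM:

```python
# Hypothetical alias table mirroring the structure of .DATASET_PARAMETERS:
# each key is a canonical parameter name, each value lists accepted spellings.
DATASET_ALIASES = {
    "linear_tree": ["linear_tree", "linear_trees"],
    "pre_partition": ["pre_partition", "is_pre_partition"],
    "two_round": ["two_round", "two_round_loading"],
}

def normalize_params(params):
    """Rewrite any aliased key to its canonical name (sketch only)."""
    out = dict(params)
    for canonical, aliases in DATASET_ALIASES.items():
        for alias in aliases:
            if alias in out and alias != canonical:
                # keep an existing canonical value; always drop the alias spelling
                out.setdefault(canonical, out.pop(alias))
    return out
```

If both spellings appear, the canonical one wins and the alias is silently dropped, which matches the usual "first recognized name takes precedence" convention for alias tables.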
44 changes: 22 additions & 22 deletions docs/Parameters.rst
@@ -139,28 +139,6 @@ Core Parameters

- **Note**: internally, LightGBM uses ``gbdt`` mode for the first ``1 / learning_rate`` iterations

- ``linear_tree`` :raw-html:`<a id="linear_tree" title="Permalink to this parameter" href="#linear_tree">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``linear_trees``

- fit piecewise linear gradient boosting tree

- tree splits are chosen in the usual way, but the model at each leaf is linear instead of constant

- the linear model at each leaf includes all the numerical features in that leaf's branch

- categorical features are used for splits as normal but are not used in the linear models

- missing values should not be encoded as ``0``. Use ``np.nan`` for Python, ``NA`` for the CLI, and ``NA``, ``NA_real_``, or ``NA_integer_`` for R

- it is recommended to rescale data before training so that features have similar mean and standard deviation

- **Note**: only works with CPU and ``serial`` tree learner

- **Note**: ``regression_l1`` objective is not supported with linear tree boosting

- **Note**: setting ``linear_tree=true`` significantly increases the memory use of LightGBM

- **Note**: if you specify ``monotone_constraints``, constraints will be enforced when choosing the split points, but not when fitting the linear models on leaves

- ``data`` :raw-html:`<a id="data" title="Permalink to this parameter" href="#data">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``train``, ``train_data``, ``train_data_file``, ``data_filename``

- path of training data, LightGBM will train from this data
@@ -672,6 +650,28 @@ IO Parameters
Dataset Parameters
~~~~~~~~~~~~~~~~~~

- ``linear_tree`` :raw-html:`<a id="linear_tree" title="Permalink to this parameter" href="#linear_tree">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``linear_trees``

- fit piecewise linear gradient boosting tree

- tree splits are chosen in the usual way, but the model at each leaf is linear instead of constant

- the linear model at each leaf includes all the numerical features in that leaf's branch

- categorical features are used for splits as normal but are not used in the linear models

- missing values should not be encoded as ``0``. Use ``np.nan`` for Python, ``NA`` for the CLI, and ``NA``, ``NA_real_``, or ``NA_integer_`` for R

- it is recommended to rescale data before training so that features have similar mean and standard deviation

- **Note**: only works with CPU and ``serial`` tree learner

- **Note**: ``regression_l1`` objective is not supported with linear tree boosting

- **Note**: setting ``linear_tree=true`` significantly increases the memory use of LightGBM

- **Note**: if you specify ``monotone_constraints``, constraints will be enforced when choosing the split points, but not when fitting the linear models on leaves

- ``max_bin`` :raw-html:`<a id="max_bin" title="Permalink to this parameter" href="#max_bin">&#x1F517;&#xFE0E;</a>`, default = ``255``, type = int, constraints: ``max_bin > 1``

- max number of bins that feature values will be bucketed in
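The relocated documentation describes leaf-wise linear models. As a toy illustration of "the model at each leaf is linear instead of constant", the snippet below contrasts a constant leaf (mean prediction) with a one-feature least-squares linear leaf; this is plain Python for exposition, not LightGBM's actual leaf-fitting code:

```python
def constant_leaf(ys):
    """Constant leaf: predict the mean of the targets in the leaf."""
    return sum(ys) / len(ys)

def linear_leaf(xs, ys):
    """Linear leaf for one numerical feature: ordinary least squares.

    Returns (intercept, slope) so that prediction = intercept + slope * x.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return my - slope * mx, slope

# On a leaf whose targets vary linearly with the feature, the linear
# model fits exactly, while the constant leaf only captures the mean.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
intercept, slope = linear_leaf(xs, ys)
```

This also motivates the rescaling advice in the docs: the per-leaf least-squares problem is better conditioned when features share a similar mean and standard deviation.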
26 changes: 13 additions & 13 deletions include/LightGBM/config.h
@@ -149,19 +149,6 @@ struct Config {
// descl2 = **Note**: internally, LightGBM uses ``gbdt`` mode for the first ``1 / learning_rate`` iterations
std::string boosting = "gbdt";

// alias = linear_trees
// desc = fit piecewise linear gradient boosting tree
// descl2 = tree splits are chosen in the usual way, but the model at each leaf is linear instead of constant
// descl2 = the linear model at each leaf includes all the numerical features in that leaf's branch
// descl2 = categorical features are used for splits as normal but are not used in the linear models
// descl2 = missing values should not be encoded as ``0``. Use ``np.nan`` for Python, ``NA`` for the CLI, and ``NA``, ``NA_real_``, or ``NA_integer_`` for R
// descl2 = it is recommended to rescale data before training so that features have similar mean and standard deviation
// descl2 = **Note**: only works with CPU and ``serial`` tree learner
// descl2 = **Note**: ``regression_l1`` objective is not supported with linear tree boosting
// descl2 = **Note**: setting ``linear_tree=true`` significantly increases the memory use of LightGBM
// descl2 = **Note**: if you specify ``monotone_constraints``, constraints will be enforced when choosing the split points, but not when fitting the linear models on leaves
bool linear_tree = false;

// alias = train, train_data, train_data_file, data_filename
// desc = path of training data, LightGBM will train from this data
// desc = **Note**: can be used only in CLI version
@@ -586,6 +573,19 @@ struct Config {

#pragma region Dataset Parameters

// alias = linear_trees
// desc = fit piecewise linear gradient boosting tree
// descl2 = tree splits are chosen in the usual way, but the model at each leaf is linear instead of constant
// descl2 = the linear model at each leaf includes all the numerical features in that leaf's branch
// descl2 = categorical features are used for splits as normal but are not used in the linear models
// descl2 = missing values should not be encoded as ``0``. Use ``np.nan`` for Python, ``NA`` for the CLI, and ``NA``, ``NA_real_``, or ``NA_integer_`` for R
// descl2 = it is recommended to rescale data before training so that features have similar mean and standard deviation
// descl2 = **Note**: only works with CPU and ``serial`` tree learner
// descl2 = **Note**: ``regression_l1`` objective is not supported with linear tree boosting
// descl2 = **Note**: setting ``linear_tree=true`` significantly increases the memory use of LightGBM
// descl2 = **Note**: if you specify ``monotone_constraints``, constraints will be enforced when choosing the split points, but not when fitting the linear models on leaves
bool linear_tree = false;

// check = >1
// desc = max number of bins that feature values will be bucketed in
// desc = small number of bins may reduce training accuracy but may increase general power (deal with over-fitting)
9 changes: 8 additions & 1 deletion python-package/lightgbm/basic.py
@@ -311,6 +311,8 @@ class _ConfigAliases:
"sparse"},
"label_column": {"label_column",
"label"},
"linear_tree": {"linear_tree",
"linear_trees"},
"local_listen_port": {"local_listen_port",
"local_port",
"port"},
@@ -1144,6 +1146,7 @@ def get_params(self):
"max_bin_by_feature",
"min_data_in_bin",
"pre_partition",
"precise_float_parser",
"two_round",
"use_missing",
"weight_column",
@@ -3180,7 +3183,11 @@ def refit(self, data, label, decay_rate=0.9, **kwargs):
_safe_call(_LIB.LGBM_BoosterGetLinear(
self.handle,
ctypes.byref(out_is_linear)))
new_params = deepcopy(self.params)
new_params = _choose_param_value(
main_param_name="linear_tree",
params=self.params,
default_value=None
)
new_params["linear_tree"] = out_is_linear.value
train_set = Dataset(data, label, silent=True, params=new_params)
new_params['refit_decay_rate'] = decay_rate
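The `refit()` change replaces a plain `deepcopy` of `self.params` with an alias-aware lookup, so a model trained with `linear_trees=True` is handled correctly. A simplified sketch of what a `_choose_param_value`-style helper does — this reimplementation is inferred from the diff, not LightGBM's exact code:

```python
from copy import deepcopy

# Subset of alias groups in the style of _ConfigAliases (illustrative only).
ALIASES = {"linear_tree": {"linear_tree", "linear_trees"}}

def choose_param_value(main_param_name, params, default_value):
    """Return a copy of params holding exactly one spelling of the parameter.

    The canonical name wins if present; otherwise the first alias found
    supplies the value. All alias spellings are removed from the copy.
    """
    params = deepcopy(params)
    value = default_value
    if main_param_name in params:
        value = params[main_param_name]
    else:
        for alias in sorted(ALIASES[main_param_name]):
            if alias in params:
                value = params[alias]
                break
    for alias in ALIASES[main_param_name]:
        params.pop(alias, None)
    params[main_param_name] = value
    return params
```

With the old `deepcopy`, a lingering `linear_trees` key could contradict the freshly assigned `linear_tree` value; resolving aliases first removes that ambiguity before the new `Dataset` is constructed.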
6 changes: 5 additions & 1 deletion src/c_api.cpp
@@ -287,9 +287,13 @@ class Booster {
"You need to set `feature_pre_filter=false` to dynamically change "
"the `min_data_in_leaf`.");
}
if (new_param.count("linear_tree") && (new_config.linear_tree != old_config.linear_tree)) {
if (new_param.count("linear_tree") && new_config.linear_tree != old_config.linear_tree) {
Log::Fatal("Cannot change linear_tree after constructed Dataset handle.");
}
if (new_param.count("precise_float_parser") &&
new_config.precise_float_parser != old_config.precise_float_parser) {
Log::Fatal("Cannot change precise_float_parser after constructed Dataset handle.");
}
}

void ResetConfig(const char* parameters) {
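The new `c_api.cpp` check extends the rule that some Dataset-level parameters are frozen once the handle is constructed, now covering `precise_float_parser` alongside `linear_tree`. A hedged Python sketch of the same guard — LightGBM itself aborts via `Log::Fatal` in C++; the function below is illustrative:

```python
# Parameters that, per this diff, cannot change after the Dataset is built.
FROZEN_DATASET_PARAMS = ("linear_tree", "precise_float_parser")

def check_dataset_params_unchanged(old_params, new_params):
    """Raise if a frozen Dataset parameter would change on ResetConfig.

    Mirrors the shape of the c_api.cpp guard: only complain when the new
    parameter set mentions the name AND its value actually differs.
    """
    for name in FROZEN_DATASET_PARAMS:
        if name in new_params and new_params[name] != old_params.get(name):
            raise ValueError(
                f"Cannot change {name} after constructed Dataset handle."
            )
```

The guard is deliberately one-sided: omitting the parameter entirely is fine, since the already-constructed Dataset simply keeps its original value.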
10 changes: 5 additions & 5 deletions src/io/config_auto.cpp
@@ -16,7 +16,6 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
{"application", "objective"},
{"boosting_type", "boosting"},
{"boost", "boosting"},
{"linear_trees", "linear_tree"},
{"train", "data"},
{"train_data", "data"},
{"train_data_file", "data"},
@@ -106,6 +105,7 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
{"model_output", "output_model"},
{"model_out", "output_model"},
{"save_period", "snapshot_freq"},
{"linear_trees", "linear_tree"},
{"subsample_for_bin", "bin_construct_sample_cnt"},
{"data_seed", "data_random_seed"},
{"is_sparse", "is_enable_sparse"},
@@ -176,7 +176,6 @@ const std::unordered_set<std::string>& Config::parameter_set() {
"task",
"objective",
"boosting",
"linear_tree",
"data",
"valid",
"num_iterations",
@@ -241,6 +240,7 @@ const std::unordered_set<std::string>& Config::parameter_set() {
"output_model",
"saved_feature_importance_type",
"snapshot_freq",
"linear_tree",
"max_bin",
"max_bin_by_feature",
"min_data_in_bin",
@@ -309,8 +309,6 @@ const std::unordered_set<std::string>& Config::parameter_set() {

void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
std::string tmp_str = "";
GetBool(params, "linear_tree", &linear_tree);

GetString(params, "data", &data);

if (GetString(params, "valid", &tmp_str)) {
@@ -483,6 +481,8 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str

GetInt(params, "snapshot_freq", &snapshot_freq);

GetBool(params, "linear_tree", &linear_tree);

GetInt(params, "max_bin", &max_bin);
CHECK_GT(max_bin, 1);

@@ -634,7 +634,6 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str

std::string Config::SaveMembersToString() const {
std::stringstream str_buf;
str_buf << "[linear_tree: " << linear_tree << "]\n";
str_buf << "[data: " << data << "]\n";
str_buf << "[valid: " << Common::Join(valid, ",") << "]\n";
str_buf << "[num_iterations: " << num_iterations << "]\n";
@@ -693,6 +692,7 @@ std::string Config::SaveMembersToString() const {
str_buf << "[interaction_constraints: " << interaction_constraints << "]\n";
str_buf << "[verbosity: " << verbosity << "]\n";
str_buf << "[saved_feature_importance_type: " << saved_feature_importance_type << "]\n";
str_buf << "[linear_tree: " << linear_tree << "]\n";
str_buf << "[max_bin: " << max_bin << "]\n";
str_buf << "[max_bin_by_feature: " << Common::Join(max_bin_by_feature, ",") << "]\n";
str_buf << "[min_data_in_bin: " << min_data_in_bin << "]\n";
4 changes: 3 additions & 1 deletion tests/python_package_test/test_engine.py
@@ -2345,6 +2345,7 @@ def test_dataset_update_params():
"ignore_column": 0,
"min_data_in_leaf": 10,
"linear_tree": False,
"precise_float_parser": True,
"verbose": -1}
unchangeable_params = {"max_bin": 150,
"max_bin_by_feature": [30, 5],
@@ -2366,7 +2367,8 @@ def test_dataset_update_params():
"ignore_column": 1,
"forcedbins_filename": "/some/path/forcedbins.json",
"min_data_in_leaf": 2,
"linear_tree": True}
"linear_tree": True,
"precise_float_parser": False}
X = np.random.random((100, 2))
y = np.random.random(100)
