Commit
Trees with linear models at leaves (#3299)
* Add Eigen library.
* Working for simple test.
* Apply changes to config params.
* Handle nan data.
* Update docs.
* Add test.
* Only load raw data if boosting=gbdt_linear
* Remove unneeded code.
* Minor updates.
* Update to work with sk-learn interface.
* Update to work with chunked datasets.
* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.
* Save raw data in binary dataset file.
* Update docs and fix parameter checking.
* Fix dataset loading.
* Add test for regularization.
* Fix bugs when saving and loading tree.
* Add test for load/save linear model.
* Remove unneeded code.
* Fix case where not enough leaf data for linear model.
* Simplify code.
* Speed up code.
* Speed up code.
* Simplify code.
* Speed up code.
* Fix bugs.
* Working version.
* Store feature data column-wise (not fully working yet).
* Fix bugs.
* Speed up.
* Speed up.
* Remove unneeded code.
* Small speedup.
* Speed up.
* Minor updates.
* Remove unneeded code.
* Fix bug.
* Fix bug.
* Speed up.
* Speed up.
* Simplify code.
* Remove unneeded code.
* Fix bug, add more tests.
* Fix bug and add test.
* Only store numerical features
* Fix bug and speed up using templates.
* Speed up prediction.
* Fix bug with regularisation
* Visual studio files.
* Working version
* Only check nans if necessary
* Store coeff matrix as an array.
* Align cache lines
* Align cache lines
* Preallocation coefficient calculation matrices
* Small speedups
* Small speedup
* Reverse cache alignment changes
* Change to dynamic schedule
* Update docs.
* Refactor so that linear tree learner is not a separate class.
* Add refit capability.
* Speed up
* Small speedups.
* Speed up add prediction to score.
* Fix bug
* Fix bug and speed up.
* Speed up dataload.
* Speed up dataload
* Use vectors instead of pointers
* Fix bug
* Add OMP exception handling.
* Change return type of LGBM_BoosterGetLinear to bool
* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change
* Remove unused internal_parent_ property of tree
* Remove unused parameter to CreateTreeLearner
* Remove reference to LinearTreeLearner
* Minor style issues
* Remove unneeded check
* Reverse temporary testing change
* Fix Visual Studio project files
* Restore LightGBM.vcxproj.filters
* Speed up
* Speed up
* Simplify code
* Update docs
* Simplify code
* Initialise storage space for max num threads
* Move Eigen to include directory and delete unused files
* Remove old files.
* Fix so it compiles with mingw
* Fix gpu tree learner
* Change AddPredictionToScore back to const
* Fix python lint error
* Fix C++ lint errors
* Change eigen to a submodule
* Update comment
* Add the eigen folder
* Try to fix build issues with eigen
* Remove eigen files
* Add eigen as submodule
* Fix include paths
* Exclude eigen files from Python linter
* Ignore eigen folders for pydocstyle
* Fix C++ linting errors
* Fix docs
* Fix docs
* Exclude eigen directories from doxygen
* Update manifest to include eigen
* Update build_r to include eigen files
* Fix compiler warnings
* Store raw feature data as float
* Use float for calculating linear coefficients
* Remove eigen directory from GLOB
* Don't compile linear model code when building R package
* Fix doxygen issue
* Fix lint issue
* Fix lint issue
* Remove unneeded code
* Restore deleted lines
* Restore deleted lines
* Change return type of has_raw to bool
* Update docs
* Rename some variables and functions for readability
* Make tree_learner parameter const in AddScore
* Fix style issues
* Pass vectors as const reference when setting tree properties
* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const
* Remove get_raw_size, use num_numeric_features instead
* Fix typo
* Make contains_nan_ and any_nan_ properties immutable again
* Remove data_has_nan_ property of tree
* Remove temporary test code
* Make linear_tree a dataset param
* Fix lint error
* Make LinearTreeLearner a separate class
* Fix lint errors
* Fix lint error
* Add linear_tree_learner.o
* Simulate omp_get_max_threads if openmp is not available
* Update PushOneData to also store raw data.
* Cast size to int
* Fix bug in ReshapeRaw
* Speed up code with multithreading
* Use OMP_NUM_THREADS
* Speed up with multithreading
* Update to use ArrayToString
* Fix tests
* Fix test
* Fix bug introduced in merge
* Minor updates
* Update docs
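To make the commit title concrete, here is a minimal sketch of what "a linear model at a leaf" could look like at prediction time. It is not LightGBM's implementation: the class `LinearLeaf`, the function `predict_tree`, and all field names are hypothetical, and the NaN fallback to a constant leaf value is an assumption loosely based on the "Handle nan data" item in the commit message.

```python
import math


class LinearLeaf:
    """A leaf holding a constant fallback value and a linear model.

    All names here are illustrative, not LightGBM's internal API.
    """

    def __init__(self, constant_value, intercept, coefficients):
        self.constant_value = constant_value  # ordinary (constant) leaf output
        self.intercept = intercept            # bias term of the leaf's linear model
        self.coefficients = coefficients      # {feature_index: coefficient}

    def predict(self, row):
        # Assumption: if any feature used by the linear model is NaN,
        # fall back to the constant leaf value.
        if any(math.isnan(row[i]) for i in self.coefficients):
            return self.constant_value
        return self.intercept + sum(c * row[i] for i, c in self.coefficients.items())


def predict_tree(row, split_feature, threshold, left_leaf, right_leaf):
    """Route a row through a single split, then apply the leaf's model."""
    leaf = left_leaf if row[split_feature] <= threshold else right_leaf
    return leaf.predict(row)


left = LinearLeaf(constant_value=0.5, intercept=0.0, coefficients={0: 2.0})
right = LinearLeaf(constant_value=-1.0, intercept=1.0, coefficients={0: -0.5, 1: 0.25})

print(predict_tree([1.0, 4.0], 0, 3.0, left, right))           # left leaf, linear output
print(predict_tree([5.0, 4.0], 0, 3.0, left, right))           # right leaf, linear output
print(predict_tree([5.0, float("nan")], 0, 3.0, left, right))  # right leaf, constant fallback
```

The split itself is chosen as in an ordinary decision tree; only the leaf output changes from a constant to a linear function of the features.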
Showing 41 changed files with 1,538 additions and 96 deletions.
@@ -0,0 +1,113 @@
# task type, supports train and predict
task = train

# boosting type, supports gbdt for now, alias: boosting, boost
boosting_type = gbdt

# application type, supports the following applications
# regression , regression task
# binary , binary classification task
# lambdarank , lambdarank task
# alias: application, app
objective = binary

linear_tree = true

# eval metrics, supports multiple metrics, delimited by ',' , supports the following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
metric = binary_logloss,auc

# frequency for metric output
metric_freq = 1

# true if metric output is needed for training data, alias: training_metric, train_metric
is_training_metric = true

# number of bins for feature bucket, 255 is a recommended setting, it can save memory and still has good accuracy.
max_bin = 255

# training data
# if a weight file exists, it should be named "binary.train.weight"
# alias: train_data, train
data = binary.train

# validation data, supports multiple validation data, separated by ','
# if a weight file exists, it should be named "binary.test.weight"
# alias: valid, test, test_data
valid_data = binary.test

# number of trees(iterations), alias: num_tree, num_iteration, num_iterations, num_round, num_rounds
num_trees = 100

# shrinkage rate , alias: shrinkage_rate
learning_rate = 0.1

# number of leaves for one tree, alias: num_leaf
num_leaves = 63

# type of tree learner, supports the following types:
# serial , single machine version
# feature , use feature parallel to train
# data , use data parallel to train
# voting , use voting based parallel to train
# alias: tree
tree_learner = serial

# number of threads for multi-threading. One thread will use one CPU, default is set to #cpu.
# num_threads = 8

# feature sub-sample, will randomly select 80% of features to train on each iteration
# alias: sub_feature
feature_fraction = 0.8

# Support bagging (data sub-sample), will perform bagging every 5 iterations
bagging_freq = 5

# Bagging fraction, will randomly select 80% of data on bagging
# alias: sub_row
bagging_fraction = 0.8

# minimal number of data for one leaf, use this to deal with over-fit
# alias : min_data_per_leaf, min_data
min_data_in_leaf = 50

# minimal sum of hessians for one leaf, use this to deal with over-fit
min_sum_hessian_in_leaf = 5.0

# saves memory and is faster for sparse features, alias: is_sparse
is_enable_sparse = true

# when data is bigger than memory size, set this to true. otherwise setting it to false will be faster
# alias: two_round_loading, two_round
use_two_round_loading = false

# true if data should be saved to a binary file; the application will auto-load data from the binary file next time
# alias: is_save_binary, save_binary
is_save_binary_file = false

# output model file
output_model = LightGBM_model.txt

# supports continued training from a trained gbdt model
# input_model= trained_model.txt

# output prediction file for predict task
# output_result= prediction.txt


# number of machines in parallel training, alias: num_machine
num_machines = 1

# local listening port in parallel training, alias: local_port
local_listen_port = 12400

# machines list file for parallel training, alias: mlist
machine_list_file = mlist.txt

# force splits
# forced_splits = forced_splits.json
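A config file in this `key = value` format, with `#` comments, is straightforward to read programmatically. The sketch below is a minimal, assumed parser: the helper name `parse_conf` is hypothetical and not part of LightGBM, and it may not match every detail of how the command-line tool parses its config. It turns the text into a dictionary of strings, skipping commented-out settings such as `num_threads`.

```python
def parse_conf(text):
    """Parse `key = value` lines, ignoring blank lines and `#` comments."""
    params = {}
    for raw_line in text.splitlines():
        # Drop any trailing comment, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# A short excerpt in the same style as the config file above.
example = """
# application type
objective = binary

linear_tree = true

metric = binary_logloss,auc

# num_threads = 8
num_leaves = 63
"""

params = parse_conf(example)
print(params)
```

All values stay as strings here; converting `"true"` to a boolean or `"63"` to an integer is left to the caller, since the right type depends on the parameter.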
fcfd413
Why are linear_tree and linear_lambda not known parameters in version 3.1.1 of the Python API?
fcfd413
Correct. v3.1.1 was released on December 7 (https://github.com/microsoft/LightGBM/releases/tag/v3.1.1), and this pull request was merged on December 24. This feature will be in a future release of LightGBM.
If you have additional questions, please open an issue.
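For anyone who wants to guard against this in their own scripts, a simple version check is one option. The sketch below assumes that every release after 3.1.1 includes `linear_tree`; the thread only says "a future release", so treat the cutoff as an assumption. The helper names are hypothetical, and `3.2.0` appears only as an example of a later version string.

```python
def parse_version(version):
    """Turn a dotted version string such as '3.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


# From the thread: v3.1.1 predates the merge, so it lacks the parameters.
LAST_RELEASE_WITHOUT_LINEAR_TREE = (3, 1, 1)


def supports_linear_tree(version):
    """Assumption: every release after 3.1.1 includes linear_tree."""
    return parse_version(version) > LAST_RELEASE_WITHOUT_LINEAR_TREE


for v in ("3.1.0", "3.1.1", "3.2.0"):
    print(v, supports_linear_tree(v))
```

In practice the string could come from `lightgbm.__version__`. Note that this parser only handles purely numeric components, so a suffix such as `rc1` would need extra handling.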