Correct spelling #4250

Merged 5 commits on May 4, 2021
4 changes: 2 additions & 2 deletions .ci/test_windows.ps1
@@ -6,7 +6,7 @@ function Check-Output {
}
}

-# unify environment variable for Azure devops and AppVeyor
+# unify environment variable for Azure DevOps and AppVeyor
if (Test-Path env:APPVEYOR) {
$env:APPVEYOR = "true"
}
@@ -66,7 +66,7 @@ elseif ($env:TASK -eq "sdist") {
}
elseif ($env:TASK -eq "bdist") {
# Import the Chocolatey profile module so that the RefreshEnv command
-# invoked below properly updates the current PowerShell session enviroment.
+# invoked below properly updates the current PowerShell session environment.
$module = "$env:ChocolateyInstall\helpers\chocolateyProfile.psm1"
Import-Module "$module" ; Check-Output $?
RefreshEnv
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ LightGBM is a gradient boosting framework that uses tree based learning algorithms.

For further details, please refer to [Features](https://github.com/microsoft/LightGBM/blob/master/docs/Features.rst).

-Benefitting from these advantages, LightGBM is being widely-used in many [winning solutions](https://github.com/microsoft/LightGBM/blob/master/examples/README.md#machine-learning-challenge-winning-solutions) of machine learning competitions.
+Benefiting from these advantages, LightGBM is widely used in many [winning solutions](https://github.com/microsoft/LightGBM/blob/master/examples/README.md#machine-learning-challenge-winning-solutions) of machine learning competitions.

[Comparison experiments](https://github.com/microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, [distributed learning experiments](https://github.com/microsoft/LightGBM/blob/master/docs/Experiments.rst#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

4 changes: 2 additions & 2 deletions docs/Advanced-Topics.rst
@@ -8,9 +8,9 @@ Missing Value Handle

- LightGBM uses NA (NaN) to represent missing values by default. Change it to use zero by setting ``zero_as_missing=true``.

-- When ``zero_as_missing=false`` (default), the unshown values in sparse matrices (and LightSVM) are treated as zeros.
+- When ``zero_as_missing=false`` (default), the unrecorded values in sparse matrices (and LibSVM files) are treated as zeros.

-- When ``zero_as_missing=true``, NA and zeros (including unshown values in sparse matrices (and LightSVM)) are treated as missing.
+- When ``zero_as_missing=true``, NA and zeros (including unrecorded values in sparse matrices (and LibSVM files)) are treated as missing.

Categorical Feature Support
---------------------------
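A minimal sketch of how the `zero_as_missing` switch above is passed through the Python API (real parameter name; toy data invented for illustration):

```python
import numpy as np
import lightgbm as lgb

# Toy column where 0.0 may simply mean "not recorded"
X = np.array([[0.0], [1.5], [np.nan], [2.0], [0.0], [3.0]])
y = np.array([0, 1, 0, 1, 0, 1])

# Default: only NaN counts as missing; zeros stay zeros
ds_default = lgb.Dataset(X, label=y, params={"zero_as_missing": False})

# Opt-in: both NaN and 0.0 are treated as missing
ds_zero_missing = lgb.Dataset(X, label=y, params={"zero_as_missing": True})
```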
2 changes: 1 addition & 1 deletion docs/Experiments.rst
@@ -194,7 +194,7 @@ We used a terabyte click log dataset to conduct parallel experiments. Details are
+--------+-----------------------+---------+---------------+----------+

This data contains 13 integer features and 26 categorical features for 24 days of click logs.
-We statisticized the clickthrough rate (CTR) and count for these 26 categorical features from the first ten days.
+We computed the click-through rate (CTR) and count for these 26 categorical features from the first ten days.
Then we used the next ten days' data, after replacing the categorical features by the corresponding CTR and count, as training data.
The processed training data have a total of 1.7 billion records and 67 features.

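A hedged pandas sketch of the CTR-and-count preprocessing described above; the column names are hypothetical, not from the original pipeline:

```python
import pandas as pd

# Hypothetical frames: "cat0" stands in for one of the 26 categorical
# features, "clicked" for the binary click label
first10 = pd.DataFrame({"cat0": ["a", "a", "b"], "clicked": [1, 0, 1]})
next10 = pd.DataFrame({"cat0": ["a", "b", "c"], "clicked": [0, 1, 0]})

# Per-category CTR and count, computed on the first ten days only
stats = first10.groupby("cat0")["clicked"].agg(ctr="mean", count="size")

# Replace the categorical column in the later data with those two numbers;
# categories unseen in the first ten days come out as NaN (missing)
train = next10.join(stats, on="cat0").drop(columns="cat0")
```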
2 changes: 1 addition & 1 deletion docs/Features.rst
@@ -187,7 +187,7 @@ LightGBM supports the following applications:

- cross-entropy, the objective function is logloss and supports training on non-binary labels

-- lambdarank, the objective function is lambdarank with NDCG
+- LambdaRank, the objective function is LambdaRank with NDCG

LightGBM supports the following metrics:

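As one concrete instance of the applications list above, a minimal sketch of selecting the cross-entropy objective through the Python API (toy data; parameter names are the library's own):

```python
import numpy as np
import lightgbm as lgb

# cross-entropy accepts probabilistic labels in [0, 1],
# not just hard 0/1 classes
X = np.random.rand(200, 5)
p = np.random.rand(200)

bst = lgb.train({"objective": "cross_entropy"},
                lgb.Dataset(X, label=p),
                num_boost_round=10)
```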
2 changes: 1 addition & 1 deletion docs/Python-Intro.rst
@@ -1,7 +1,7 @@
Python-package Introduction
===========================

-This document gives a basic walkthrough of LightGBM Python-package.
+This document gives a basic walk-through of the LightGBM Python-package.

**List of other helpful links**

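The basic workflow that walk-through covers reduces to three steps; a minimal sketch with synthetic data:

```python
import numpy as np
import lightgbm as lgb

# 1. wrap the data in a Dataset
X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)
train_set = lgb.Dataset(X, label=y)

# 2. train with a parameter dict
params = {"objective": "binary", "metric": "binary_logloss"}
bst = lgb.train(params, train_set, num_boost_round=50)

# 3. predict on new data
preds = bst.predict(np.random.rand(5, 10))
```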
16 changes: 8 additions & 8 deletions examples/binary_classification/train.conf
@@ -7,11 +7,11 @@ boosting_type = gbdt
# application type, support following application
# regression , regression task
# binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
# alias: application, app
objective = binary

-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
@@ -20,7 +20,7 @@ objective = binary
# binary_error
metric = binary_logloss,auc

-# frequence for metric output
+# frequency for metric output
metric_freq = 1

# true if need to output metric for training data, alias: training_metric, train_metric
@@ -30,12 +30,12 @@ is_training_metric = true
max_bin = 255

# training data
-# if exsting weight file, should name to "binary.train.weight"
+# if a weight file exists, it should be named "binary.train.weight"
# alias: train_data, train
data = binary.train

# validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if a weight file exists, it should be named "binary.test.weight"
# alias: valid, test, test_data,
valid_data = binary.test

@@ -56,7 +56,7 @@ num_leaves = 63
# alias: tree
tree_learner = serial

-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use one CPU; the default is the number of CPUs.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
@@ -66,15 +66,15 @@ feature_fraction = 0.8
# Support bagging (data sub-sample), will perform bagging every 5 iterations
bagging_freq = 5

-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will randomly select 80% of data on bagging
# alias: sub_row
bagging_fraction = 0.8

# minimal number data for one leaf, use this to deal with over-fit
# alias : min_data_per_leaf, min_data
min_data_in_leaf = 50

-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
min_sum_hessian_in_leaf = 5.0

# save memory and faster speed for sparse feature, alias: is_sparse
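For reference, a hedged sketch of roughly the same configuration driven through the Python API instead of the CLI; binary.train and binary.test are the data files shipped alongside this conf:

```python
import lightgbm as lgb

params = {
    "objective": "binary",
    "metric": ["binary_logloss", "auc"],
    "num_leaves": 63,
    "feature_fraction": 0.8,
    "bagging_freq": 5,
    "bagging_fraction": 0.8,
    "min_data_in_leaf": 50,
    "min_sum_hessian_in_leaf": 5.0,
}

# LightGBM can read the example's LibSVM-style text files directly by path
train_set = lgb.Dataset("binary.train")
valid_set = lgb.Dataset("binary.test", reference=train_set)
bst = lgb.train(params, train_set, valid_sets=[valid_set])
```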
16 changes: 8 additions & 8 deletions examples/binary_classification/train_linear.conf
@@ -7,13 +7,13 @@ boosting_type = gbdt
# application type, support following application
# regression , regression task
# binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
# alias: application, app
objective = binary

linear_tree = true

-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
@@ -22,7 +22,7 @@ linear_tree = true
# binary_error
metric = binary_logloss,auc

-# frequence for metric output
+# frequency for metric output
metric_freq = 1

# true if need to output metric for training data, alias: training_metric, train_metric
@@ -32,12 +32,12 @@ is_training_metric = true
max_bin = 255

# training data
-# if exsting weight file, should name to "binary.train.weight"
+# if a weight file exists, it should be named "binary.train.weight"
# alias: train_data, train
data = binary.train

# validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if a weight file exists, it should be named "binary.test.weight"
# alias: valid, test, test_data,
valid_data = binary.test

@@ -58,7 +58,7 @@ num_leaves = 63
# alias: tree
tree_learner = serial

-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use one CPU; the default is the number of CPUs.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
@@ -68,15 +68,15 @@ feature_fraction = 0.8
# Support bagging (data sub-sample), will perform bagging every 5 iterations
bagging_freq = 5

-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will randomly select 80% of data on bagging
# alias: sub_row
bagging_fraction = 0.8

# minimal number data for one leaf, use this to deal with over-fit
# alias : min_data_per_leaf, min_data
min_data_in_leaf = 50

-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
min_sum_hessian_in_leaf = 5.0

# save memory and faster speed for sparse feature, alias: is_sparse
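The substantive difference from the plain binary conf is `linear_tree = true`; in the Python API that is just one more entry in the parameter dict (a sketch with toy data):

```python
import numpy as np
import lightgbm as lgb

X, y = np.random.rand(300, 8), np.random.randint(0, 2, 300)

# linear_tree=True fits a linear model in each leaf instead of a constant
params = {"objective": "binary", "linear_tree": True, "num_leaves": 63}
bst = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=20)
```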
2 changes: 1 addition & 1 deletion examples/lambdarank/README.md
@@ -1,7 +1,7 @@
LambdaRank Example
==================

-Here is an example for LightGBM to run lambdarank task.
+Here is an example for LightGBM to run a LambdaRank task.

***You must follow the [installation instructions](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html)
for the following commands to work. The `lightgbm` binary must be built and available at the root of this project.***
18 changes: 9 additions & 9 deletions examples/lambdarank/train.conf
@@ -7,11 +7,11 @@ boosting_type = gbdt
# application type, support following application
# regression , regression task
# binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
# alias: application, app
objective = lambdarank

-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
@@ -23,7 +23,7 @@ metric = ndcg
# evaluation position for ndcg metric, alias : ndcg_at
ndcg_eval_at = 1,3,5

-# frequence for metric output
+# frequency for metric output
metric_freq = 1

# true if need to output metric for training data, alias: training_metric, train_metric
@@ -33,14 +33,14 @@ is_training_metric = true
max_bin = 255

# training data
-# if exsting weight file, should name to "rank.train.weight"
-# if exsting query file, should name to "rank.train.query"
+# if a weight file exists, it should be named "rank.train.weight"
+# if a query file exists, it should be named "rank.train.query"
# alias: train_data, train
data = rank.train

# validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "rank.test.weight"
-# if exsting query file, should name to "rank.test.query"
+# if a weight file exists, it should be named "rank.test.weight"
+# if a query file exists, it should be named "rank.test.query"
# alias: valid, test, test_data,
valid_data = rank.test

@@ -71,15 +71,15 @@ feature_fraction = 1.0
# Support bagging (data sub-sample), will perform bagging every 5 iterations
bagging_freq = 1

-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will randomly select 90% of data on bagging
# alias: sub_row
bagging_fraction = 0.9

# minimal number data for one leaf, use this to deal with over-fit
# alias : min_data_per_leaf, min_data
min_data_in_leaf = 50

-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
min_sum_hessian_in_leaf = 5.0

# save memory and faster speed for sparse feature, alias: is_sparse
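The rank.train.query file named above corresponds, in the Python API, to the `group` argument: one entry per query giving how many consecutive rows belong to it. A sketch with fabricated group sizes:

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(120, 10)
rel = np.random.randint(0, 4, size=120)  # graded relevance labels

# three queries of 40 documents each; rows must be ordered by query
train_set = lgb.Dataset(X, label=rel, group=[40, 40, 40])

params = {"objective": "lambdarank", "metric": "ndcg", "ndcg_eval_at": [1, 3, 5]}
bst = lgb.train(params, train_set, num_boost_round=20)
```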
8 changes: 4 additions & 4 deletions examples/multiclass_classification/train.conf
@@ -7,12 +7,12 @@ boosting_type = gbdt
# application type, support following application
# regression , regression task
# binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
# multiclass
# alias: application, app
objective = multiclass

-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
@@ -35,7 +35,7 @@ auc_mu_weights = 0,1,2,3,4,5,0,6,7,8,9,10,0,11,12,13,14,15,0,16,17,18,19,20,0
# number of class, for multiclass classification
num_class = 5

-# frequence for metric output
+# frequency for metric output
metric_freq = 1

# true if need to output metric for training data, alias: training_metric, train_metric
@@ -45,7 +45,7 @@ is_training_metric = true
max_bin = 255

# training data
-# if exsting weight file, should name to "regression.train.weight"
+# if a weight file exists, it should be named "regression.train.weight"
# alias: train_data, train
data = multiclass.train

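A hedged Python-API sketch of the multiclass-specific settings in this conf (`num_class` and the `auc_mu` metric; toy data, and `auc_mu_weights` omitted for brevity):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(250, 6)
y = np.random.randint(0, 5, size=250)  # labels must lie in 0..num_class-1

params = {"objective": "multiclass", "num_class": 5,
          "metric": ["multi_logloss", "auc_mu"]}
bst = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=20)
```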
14 changes: 7 additions & 7 deletions examples/parallel_learning/train.conf
@@ -7,7 +7,7 @@ boosting_type = gbdt
# application type, support following application
# regression , regression task
# binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
# alias: application, app
objective = binary

@@ -20,7 +20,7 @@ objective = binary
# binary_error
metric = binary_logloss,auc

-# frequence for metric output
+# frequency for metric output
metric_freq = 1

# true if need to output metric for training data, alias: training_metric, train_metric
@@ -30,12 +30,12 @@ is_training_metric = true
max_bin = 255

# training data
-# if exsting weight file, should name to "binary.train.weight"
+# if a weight file exists, it should be named "binary.train.weight"
# alias: train_data, train
data = binary.train

# validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if a weight file exists, it should be named "binary.test.weight"
# alias: valid, test, test_data,
valid_data = binary.test

@@ -56,7 +56,7 @@ num_leaves = 63
# alias: tree
tree_learner = feature

-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use one CPU; the default is the number of CPUs.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
@@ -66,15 +66,15 @@ feature_fraction = 0.8
# Support bagging (data sub-sample), will perform bagging every 5 iterations
bagging_freq = 5

-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will randomly select 80% of data on bagging
# alias: sub_row
bagging_fraction = 0.8

# minimal number data for one leaf, use this to deal with over-fit
# alias : min_data_per_leaf, min_data
min_data_in_leaf = 50

-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
min_sum_hessian_in_leaf = 5.0

# save memory and faster speed for sparse feature, alias: is_sparse
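Relative to the single-machine confs above, the distinguishing setting is `tree_learner = feature`. The same knob in a Python parameter dict, as a sketch only (the multi-machine setup itself is not shown):

```python
# "serial" is single-machine; "feature", "data" and "voting" are the
# parallel strategies
params = {
    "objective": "binary",
    "tree_learner": "feature",
}
```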
6 changes: 3 additions & 3 deletions examples/python-guide/advanced_example.py
@@ -93,7 +93,7 @@
# can predict with any iteration when loaded in pickle way
y_pred = pkl_bst.predict(X_test, num_iteration=7)
# eval with loaded model
print("The rmse of pickled model's prediction is:", mean_squared_error(y_test, y_pred) ** 0.5)
print("The RMSE of pickled model's prediction is:", mean_squared_error(y_test, y_pred) ** 0.5)

# continue training
# init_model accepts:
@@ -146,7 +146,7 @@ def loglikelihood(preds, train_data):
# f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
# binary error
# NOTE: when you do customized loss function, the default prediction value is margin
-# This may make built-in evalution metric calculate wrong results
+# This may make built-in evaluation metric calculate wrong results
# For example, we are doing log likelihood loss, the prediction is score before logistic transformation
# Keep this in mind when you use the customization
def binary_error(preds, train_data):
@@ -170,7 +170,7 @@ def binary_error(preds, train_data):
# f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
# accuracy
# NOTE: when you do customized loss function, the default prediction value is margin
-# This may make built-in evalution metric calculate wrong results
+# This may make built-in evaluation metric calculate wrong results
# For example, we are doing log likelihood loss, the prediction is score before logistic transformation
# Keep this in mind when you use the customization
def accuracy(preds, train_data):
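The two NOTEs above flag the usual pitfall: with a customized objective, `preds` arrive as raw margins, not probabilities. A sketch of a sigmoid-aware variant of this file's `binary_error` metric (binary 0/1 labels assumed):

```python
import numpy as np

def binary_error_sigmoid(preds, train_data):
    # preds are raw margins under a custom objective, so apply the
    # logistic transformation before thresholding
    labels = train_data.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))
    return "error", np.mean((probs > 0.5) != labels), False
```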