Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] Fix early stop with xgboost4j-spark #4176

Merged
merged 6 commits into from
Mar 1, 2019
Merged

[jvm-packages] Fix early stop with xgboost4j-spark #4176

merged 6 commits into from
Mar 1, 2019

Conversation

yanboliang
Copy link
Contributor

Two major take aways:

  • Fix the early stop condition: training stops if the current iteration is earlyStoppingSteps away from the best iteration.
  • If there are multiple watches, only the last one is used to determinate early stop.

@CodingCat
Copy link
Member

unit test failed?

@mingyang
Copy link

@yanboliang Thanks for the fix!
@CodingCat Will this fix in the java also fix the early stopping in xgboost4j-spark? Is this included in the latest 0.82-SNAPSHOT jar that I can test?

@CodingCat
Copy link
Member

yes, it will also fix the problem in spark, as it directly calls that in java

I am not sure about 0.82-SNAPSHOT, depends on whether we get it merged before the other blocking issue is resolved

@CodingCat CodingCat changed the title Fix early stop with xgboost4j-spark [jvm-packages] Fix early stop with xgboost4j-spark Feb 26, 2019
@codecov-io
Copy link

codecov-io commented Feb 26, 2019

Codecov Report

Merging #4176 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4176      +/-   ##
==========================================
+ Coverage   63.69%   63.69%   +<.01%     
==========================================
  Files         131      131              
  Lines       12069    12238     +169     
==========================================
+ Hits         7687     7795     +108     
- Misses       4382     4443      +61
Impacted Files Coverage Δ
python-package/xgboost/core.py 75.37% <0%> (-3.08%) ⬇️
src/objective/multiclass_obj.cu 92.63% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b7405f...bcc346d. Read the comment docs.

To use -Float.MAX_VALUE as the lower bound, in case there is positive metric.
Copy link
Member

@CodingCat CodingCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -150,6 +152,12 @@ public static Booster train(

evalNames = names.toArray(new String[names.size()]);
evalMats = mats.toArray(new DMatrix[mats.size()]);
if (isMaximizeEvaluation(params)) {
bestScore = -Float.MAX_VALUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference with Float.MIN_VALUE?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodingCat Float.MIN_VALUE is the smallest positive nonzero value, whereas -Float.MAX_VALUE is the smallest floating-point number overall.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah...interesting.....in scala they are the same

scala> Float.MinValue == -Float.MaxValue
res3: Boolean = true

// to determinate early stop.
float score = metricsOut[metricsOut.length - 1];
if (isMaximizeEvaluation(params)) {
if (score >= bestScore) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should update bestIteration and bestScore for '=' case here, since we prefer stop as early as possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked python impl :

if (maximize_score and score > best_score) or \
(not maximize_score and score < best_score):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point!

bestIteration = iter;
}
} else {
if (score <= bestScore) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@CodingCat
Copy link
Member

CodingCat commented Mar 1, 2019

@hcho3
Copy link
Collaborator

hcho3 commented Mar 1, 2019

@CodingCat Is this blocking 0.82 release?

@CodingCat
Copy link
Member

@hcho3 technically no, but good to have

@hcho3
Copy link
Collaborator

hcho3 commented Mar 1, 2019

@CodingCat Looks like it's almost there. Let's include it

@CodingCat CodingCat merged commit 9fefa21 into dmlc:master Mar 1, 2019
@yanboliang yanboliang deleted the fix-early-stop branch March 1, 2019 21:53
@lock lock bot locked as resolved and limited conversation to collaborators May 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants