[jvm-packages] Fix early stop with xgboost4j-spark #4176
unit test failed?
@yanboliang Thanks for the fix!
yes, it will also fix the problem in spark, as it directly calls that in java. I am not sure about 0.82-SNAPSHOT; it depends on whether we get it merged before the other blocking issue is resolved
Codecov Report
@@ Coverage Diff @@
## master #4176 +/- ##
==========================================
+ Coverage 63.69% 63.69% +<.01%
==========================================
Files 131 131
Lines 12069 12238 +169
==========================================
+ Hits 7687 7795 +108
- Misses 4382 4443 +61
Use -Float.MAX_VALUE as the lower bound for bestScore, in case the metric is one where higher is better.
you may also update the doc in https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#early-stopping
@@ -150,6 +152,12 @@ public static Booster train(

evalNames = names.toArray(new String[names.size()]);
evalMats = mats.toArray(new DMatrix[mats.size()]);
if (isMaximizeEvaluation(params)) {
  bestScore = -Float.MAX_VALUE;
what's the difference with Float.MIN_VALUE?
@CodingCat Float.MIN_VALUE is the smallest positive nonzero value, whereas -Float.MAX_VALUE is the most negative finite floating-point number.
ah...interesting.....in scala they are the same
scala> Float.MinValue == -Float.MaxValue
res3: Boolean = true
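For reference, a minimal Java sketch of the difference discussed above. It shows why `Float.MIN_VALUE` would be a poor starting value when scores can be negative. The class name `FloatBoundsDemo` and the helper `initialBestScore` are hypothetical and not part of the PR; using `Float.MAX_VALUE` as the starting value for a minimized metric is an assumption. Scala's `Float.MinValue` is the most negative finite float (Java's `Float.MIN_VALUE` corresponds to Scala's `Float.MinPositiveValue`), which is why the REPL check above returns true.

```java
final class FloatBoundsDemo {
    // Hypothetical helper: starting value for the best score.
    // The maximize branch mirrors the diff above; the minimize branch is an assumption.
    static float initialBestScore(boolean maximize) {
        return maximize ? -Float.MAX_VALUE : Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        float negativeScore = -0.5f; // e.g. a metric that can legitimately be negative

        // Float.MIN_VALUE is a tiny positive number, so a negative score never beats it.
        System.out.println(Float.MIN_VALUE > 0.0f);            // true
        System.out.println(negativeScore > Float.MIN_VALUE);   // false

        // -Float.MAX_VALUE is below every finite float, so any finite score beats it.
        System.out.println(negativeScore > initialBestScore(true)); // true
    }
}
```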
// to determinate early stop.
float score = metricsOut[metricsOut.length - 1];
if (isMaximizeEvaluation(params)) {
  if (score >= bestScore) {
I don't think we should update bestIteration and bestScore for the '=' case here, since we prefer to stop as early as possible
checked the Python impl:
xgboost/python-package/xgboost/callback.py
Lines 233 to 234 in 7d3149a
if (maximize_score and score > best_score) or \
        (not maximize_score and score < best_score):
Fair point!
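To illustrate the point about the '=' case, here is a small sketch comparing the strict and non-strict comparisons on a metric that plateaus. The class `TieHandlingDemo` and its method are hypothetical, written only for this illustration; they are not the code in the PR.

```java
final class TieHandlingDemo {
    // Returns the iteration recorded as "best" for a maximized metric.
    // strict = true uses '>' (as in the Python callback); strict = false uses '>='.
    static int bestIteration(float[] scores, boolean strict) {
        float bestScore = -Float.MAX_VALUE;
        int bestIteration = 0;
        for (int iter = 0; iter < scores.length; iter++) {
            float score = scores[iter];
            boolean improved = strict ? score > bestScore : score >= bestScore;
            if (improved) {
                bestScore = score;
                bestIteration = iter;
            }
        }
        return bestIteration;
    }

    public static void main(String[] args) {
        float[] plateau = {0.70f, 0.80f, 0.80f, 0.80f};
        System.out.println(bestIteration(plateau, true));  // 1
        System.out.println(bestIteration(plateau, false)); // 3
    }
}
```

With `>=`, every tied score moves the best iteration forward, so the distance to the best iteration keeps resetting and training runs longer on a plateau. With `>`, the first occurrence of the best score is kept.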
    bestIteration = iter;
  }
} else {
  if (score <= bestScore) {
ditto
can you also update the doc at
@CodingCat Is this blocking 0.82 release?
@hcho3 technically no, but good to have
@CodingCat Looks like it's almost there. Let's include it
Two major takeaways:
- Training stops once the current iteration is earlyStoppingSteps away from the best iteration.
- If there are multiple evaluation sets in watches, only the last one is used to determine early stop.
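The early-stopping behaviour discussed in this thread can be sketched roughly as follows. This is a standalone illustration, not the xgboost4j implementation: the class `EarlyStopSketch`, the method `stopIteration`, and the exact stopping condition (`iter - bestIteration >= earlyStoppingSteps`) are assumptions. It takes precomputed metrics instead of training a booster, reads only the last metric of each round, and uses strict comparisons as agreed in the review.

```java
final class EarlyStopSketch {
    // metricsPerRound[iter] holds one value per watch for that round.
    // Returns the iteration at which training would stop, or -1 if it never stops early.
    static int stopIteration(float[][] metricsPerRound, int earlyStoppingSteps, boolean maximize) {
        float bestScore = maximize ? -Float.MAX_VALUE : Float.MAX_VALUE;
        int bestIteration = 0;
        for (int iter = 0; iter < metricsPerRound.length; iter++) {
            float[] metricsOut = metricsPerRound[iter];
            // Only the last watch is used to decide on early stopping.
            float score = metricsOut[metricsOut.length - 1];
            boolean improved = maximize ? score > bestScore : score < bestScore;
            if (improved) {
                bestScore = score;
                bestIteration = iter;
            } else if (iter - bestIteration >= earlyStoppingSteps) {
                // Assumed condition: stop once we are earlyStoppingSteps past the best iteration.
                return iter;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two watches per round; the first keeps improving, the last is best at round 1.
        float[][] rounds = {
            {0.9f, 0.50f}, {0.8f, 0.40f}, {0.7f, 0.45f}, {0.6f, 0.41f}, {0.5f, 0.42f}
        };
        System.out.println(stopIteration(rounds, 2, false));  // 3
        System.out.println(stopIteration(rounds, 10, false)); // -1
    }
}
```

In the example, the first watch improves every round but is ignored; the last watch is best at iteration 1, so with a step budget of 2 the sketch stops at iteration 3.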