early stopping doesn't work in Spark package #6657

Closed
sammynammari opened this issue Jan 28, 2021 · 3 comments

@sammynammari

I believe that early stopping is broken in the Spark package. Here is a minimal example:

scala> val data = sc.parallelize(List(1.0 -> Vectors.dense(Array(1.0)))).toDF("label", "features")
data: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> val model = new XGBoostClassifier(Map[String, Any]()).setNumEarlyStoppingRounds(10).fit(data)
java.lang.IllegalArgumentException: custom_eval does not support early stopping
  at ml.dmlc.xgboost4j.scala.spark.XGBoostExecutionParamsFactory.overrideParams(XGBoost.scala:150)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostExecutionParamsFactory.<init>(XGBoost.scala:96)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:535)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:190)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:40)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
  ... 49 elided

However, I am not setting custom_eval in my params. Checking the logs, these are the parameters that XGBoost is running with (clipped for brevity):

21/01/28 21:04:57 INFO XGBoostSpark: Running XGBoost 1.1.2 with parameters:
num_early_stopping_rounds -> 10
custom_eval -> null

So it seems that, internally, XGBoost sets custom_eval to null by default and later checks whether the key exists. But the key always exists, precisely because it is pre-populated with that null default.

I believe the correct behavior would be either to not set a default value at all, or to modify the check so that a null custom_eval does not trigger the error.
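
As a rough illustration only (this is not the actual code in XGBoost.scala; the surrounding variable names such as overridedParams are assumed), the guard could test the value rather than mere key presence:

// Hypothetical fix sketch: only reject early stopping when a custom_eval
// was actually supplied, i.e. the stored value is non-null.
val hasCustomEval = overridedParams.get("custom_eval").exists(_ != null)
if (hasCustomEval && !overridedParams.contains("maximize_evaluation_metrics")) {
  throw new IllegalArgumentException("custom_eval does not support early stopping")
}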

I am using version 1.1.2 and Spark 2.4.0.

@r-luo
Copy link

r-luo commented Jan 29, 2021

I just encountered this issue yesterday and found a workaround:

This line means that if you have "maximize_evaluation_metrics" set, then it won't check whether the custom_eval key exists:
https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/XGBoost.scala#L151

So if you add .setMaximizeEvaluationMetrics(maximizeEvaluationMetrics) to your model, then it works. maximizeEvaluationMetrics should be a boolean value.
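
For example, applying the workaround to the minimal reproduction from the original report might look like the sketch below (whether to pass true or false depends on the eval metric you use; this is only a sketch of the workaround, not a guarantee of how early stopping then behaves):

// Workaround: set maximize_evaluation_metrics explicitly so the
// custom_eval existence check is skipped.
val model = new XGBoostClassifier(Map[String, Any]())
  .setNumEarlyStoppingRounds(10)
  .setMaximizeEvaluationMetrics(false) // false for error/logloss-style metrics, true for auc
  .fit(data)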

@sammynammari
Author

sammynammari commented Feb 5, 2021

Any response from the maintainers? Not being able to use early stopping is a significant bug.

@trivialfis
Member

Would you like to open a PR?
