early stopping doesn't work in Spark package #6657

Closed
sammynammari opened this issue Jan 28, 2021 · 3 comments

@sammynammari

I believe that early stopping is broken in the Spark package. Here is a minimal example:

scala> val data = sc.parallelize(List(1.0 -> Vectors.dense(Array(1.0)))).toDF("label", "features")
data: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> val model = new XGBoostClassifier(Map[String, Any]()).setNumEarlyStoppingRounds(10).fit(data)
java.lang.IllegalArgumentException: custom_eval does not support early stopping
  at ml.dmlc.xgboost4j.scala.spark.XGBoostExecutionParamsFactory.overrideParams(XGBoost.scala:150)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostExecutionParamsFactory.<init>(XGBoost.scala:96)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:535)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:190)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:40)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
  ... 49 elided

However, I am not setting custom_eval in my params. Checking the logs, these are the parameters that XGBoost is running with (clipped for brevity):

21/01/28 21:04:57 INFO XGBoostSpark: Running XGBoost 1.1.2 with parameters:
num_early_stopping_rounds -> 10
custom_eval -> null

So it seems that, internally, XGBoost sets custom_eval to null by default and later checks whether the key exists. But the key always exists, precisely because it is pre-populated with that null default.

I believe the correct behavior would be either to not set a default value at all, or to modify the check so that a null custom_eval does not trigger the error.
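
As a rough illustration only (this is not the actual code in XGBoost.scala; the surrounding variable names such as overridedParams are assumed), the guard could test the value rather than mere key presence:

// Hypothetical fix sketch: only reject early stopping when a custom_eval
// was actually supplied, i.e. the stored value is non-null.
val hasCustomEval = overridedParams.get("custom_eval").exists(_ != null)
if (hasCustomEval && !overridedParams.contains("maximize_evaluation_metrics")) {
  throw new IllegalArgumentException("custom_eval does not support early stopping")
}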

I am using version 1.1.2 and Spark 2.4.0.

@r-luo
Copy link

r-luo commented Jan 29, 2021

I just encountered this issue yesterday and found a workaround:

This line means that if you have "maximize_evaluation_metrics" set, then it won't check whether the custom_eval key exists:
https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/XGBoost.scala#L151

So if you add .setMaximizeEvaluationMetrics(maximizeEvaluationMetrics) to your model, then it works. maximizeEvaluationMetrics should be a boolean value.
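
For example, applying the workaround to the minimal reproduction from the original report might look like the sketch below (whether to pass true or false depends on the eval metric you use; this is only a sketch of the workaround, not a guarantee of how early stopping then behaves):

// Workaround: set maximize_evaluation_metrics explicitly so the
// custom_eval existence check is skipped.
val model = new XGBoostClassifier(Map[String, Any]())
  .setNumEarlyStoppingRounds(10)
  .setMaximizeEvaluationMetrics(false) // false for error/logloss-style metrics, true for auc
  .fit(data)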

@sammynammari
Author

sammynammari commented Feb 5, 2021

Any response from the maintainers? Not being able to use early stopping is a significant bug.

@trivialfis
Member

Would you like to open a PR?
