[jvm-packages] eval_set for xgboost4j-spark #3231

keanpantraw · 2018-04-09T17:16:43Z

There is no way to set a custom evaluation set for ml.dmlc.xgboost4j.scala.spark.XGBoost#trainDistributed. The code inside uses the private ml.dmlc.xgboost4j.scala.spark.Watches class, which just splits the training data by a predefined trainTestRatio and doesn't accept a custom eval set through params.
Is there a particular reason for this limitation, or is it just a stub that can be extended, for example with a DMatrix passed through params? Are there any complications caused by the fact that this is distributed XGBoost? How should such a dataset be stored in params: as a DMatrix, an RDD, or something else?

CodingCat · 2018-04-10T04:30:32Z

I think there was a comment about this when the code was brought in: #2710 (comment)

Would you like to give this requirement a shot?

hcho3 · 2018-07-04T23:13:00Z

All feature requests are now consolidated to #3439. This issue should be re-opened if someone decides to actively work on implementing this feature.

CodingCat · 2018-07-10T03:20:42Z

I will work on eval set this week

hcho3 · 2018-08-01T19:52:14Z

@CodingCat There is work in progress to implement a watchlist in the XGBoost4J Scala wrapper: #3544. Can we take advantage of it to implement a watchlist in XGBoost4J-Spark?

CodingCat · 2018-08-01T21:24:04Z

Spark's problem is that you have to find some way to pass in and join (or zip) multiple DataFrames,

then pass some part of each of them to each Spark task, create a DMatrix from each part, and use each DMatrix in each Spark task as a watch dataset.

That part is kind of complicated and requires refactoring the current Watches code. I think we can do it in the next version.
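
The zipping step described above can be sketched with Spark's RDD.zipPartitions, which hands one partition of each input to the same task. This is only an illustration under stated assumptions, not the code that XGBoost4J-Spark uses: the object name, the toy integer data, and the numWorkers value are hypothetical, and the place where a DMatrix would be built is marked with a comment rather than implemented.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: pair up partitions of a training set and an eval set
// so that each Spark task sees one slice of each.
object ZipEvalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zip-eval-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumed number of workers; zipPartitions requires both RDDs
    // to have the same number of partitions.
    val numWorkers = 2
    val train = sc.parallelize(1 to 100).repartition(numWorkers)
    val eval = sc.parallelize(101 to 120).repartition(numWorkers)

    val sizes = train.zipPartitions(eval) { (trainIter, evalIter) =>
      // In a real wrapper, each iterator would be converted into a DMatrix
      // here: one for training and one used as a watch dataset.
      Iterator((trainIter.size, evalIter.size))
    }.collect()

    sizes.zipWithIndex.foreach { case ((nTrain, nEval), task) =>
      println(s"task $task: $nTrain training rows, $nEval eval rows")
    }
    spark.stop()
  }
}
```

With several eval sets the same idea has to be repeated for each additional DataFrame, which is part of why the change touches the existing Watches logic.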

hcho3 · 2018-09-07T18:27:56Z

Consolidating to the feature request tracker #3439. Feel free to re-open this issue when anyone starts working on this.

CodingCat · 2019-01-28T18:11:45Z

the feature is implemented in #3910

eliyara · 2019-02-13T21:52:59Z

Hi @CodingCat,

I need to define a separate validation set for cross validation, using xgboost4j on Spark. I tried the approach here. It does not look like setting "eval_sets" -> Map("dev" -> dev_df) makes any difference. Should I expect the following setup to work the way cross validation does (using TrainValidationSplit)?

        val params = scala.collection.mutable.Map(
            "eta" -> 0.1,
            "objective" -> "binary:logistic",
            "eval_sets" -> Map("dev" -> dev_df))
        val booster = new XGBoostClassifier(params.toMap)
        booster.setFeaturesCol("features")
        booster.setLabelCol("label")
        booster.setMaxDepth(5)
        booster.setNumRound(150)
        booster.setNumWorkers(4)
        val xgb_model = booster.fit(train_df)
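
For comparison, a minimal sketch of the same setup that passes the eval set through a setter instead of the params map. It assumes an XGBoost4J-Spark release that includes the change from #3910 and exposes setEvalSets on XGBoostClassifier; the object and method names (EvalSetsSketch, fitWithDevSet) are hypothetical. Note that an eval set is a dataset evaluated during training, whereas Spark's TrainValidationSplit makes its own split of the input DataFrame, so the two are not interchangeable.

```scala
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}
import org.apache.spark.sql.DataFrame

object EvalSetsSketch {
  // trainDf and devDf are assumed to already contain a "features" vector
  // column and a "label" column.
  def fitWithDevSet(trainDf: DataFrame, devDf: DataFrame): XGBoostClassificationModel = {
    val params: Map[String, Any] = Map(
      "eta" -> 0.1,
      "objective" -> "binary:logistic")
    val booster = new XGBoostClassifier(params)
    booster.setFeaturesCol("features")
    booster.setLabelCol("label")
    booster.setMaxDepth(5)
    booster.setNumRound(150)
    booster.setNumWorkers(4)
    // Assumption: setEvalSets is available in the release in use.
    booster.setEvalSets(Map("dev" -> devDf))
    booster.fit(trainDf)
  }
}
```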

keanpantraw pushed a commit to keanpantraw/xgboost that referenced this issue Apr 9, 2018

Add ability to specify custom evaluation set

9003a1c

This fixes dmlc#3231

keanpantraw pushed a commit to keanpantraw/xgboost that referenced this issue Apr 10, 2018

Add ability to specify custom evaluation set

91687fc

This fixes dmlc#3231

hcho3 mentioned this issue Jul 4, 2018

Roadmap: feature requests #3439

hcho3 closed this as completed Jul 4, 2018

CodingCat reopened this Jul 10, 2018

CodingCat self-assigned this Jul 10, 2018

hcho3 closed this as completed Sep 7, 2018

hcho3 added the feature-request label Oct 23, 2018
