[jvm-packages] eval_set for xgboost4j-spark #3231
Comments
I think there was a comment about this when the code was brought in: #2710 (comment). Would you like to give this requirement a shot?
All feature requests are now consolidated to #3439. This issue should be re-opened if someone decides to actively work on implementing this feature.
I will work on the eval set this week.
@CodingCat There is work in progress to implement watchlist in the XGBoost4J Scala wrapper: #3544. Can we take advantage of this to implement watchlist in XGBoost4J-Spark?
The problem with Spark is that you have to find some way to pass in and join (or zip) multiple DataFrames, pass part of each of them to each Spark task, create a DMatrix from each part, and use those DMatrix objects in each Spark task as the watch datasets. That part is kind of complicated and requires refactoring the current
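The zipping step described above can be sketched without Spark at all. The following is only an illustration under stated assumptions: plain Scala collections stand in for partitions, and `Row`, `TaskInput`, and `zipPartitions` are hypothetical names, not part of XGBoost4J-Spark. It shows the constraint that makes this awkward: every evaluation set must be partitioned the same way as the training data so that task i can see slice i of each.

```scala
// Plain Scala collections stand in for Spark partitions; no Spark or XGBoost API is used.
object WatchZipSketch {
  // Hypothetical row type: a label plus dense features.
  final case class Row(label: Float, features: Vector[Float])

  // Hypothetical per-task input: the training rows plus named evaluation rows.
  final case class TaskInput(train: Seq[Row], watches: Map[String, Seq[Row]])

  // Pair partition i of the training data with partition i of every eval set.
  // All datasets must have the same number of partitions, which mirrors the
  // constraint of zipping partitioned datasets.
  def zipPartitions(
      train: Vector[Seq[Row]],
      evalSets: Map[String, Vector[Seq[Row]]]): Vector[TaskInput] = {
    evalSets.foreach { case (name, parts) =>
      require(parts.length == train.length,
        s"eval set '$name' has ${parts.length} partitions, expected ${train.length}")
    }
    train.zipWithIndex.map { case (rows, i) =>
      TaskInput(rows, evalSets.map { case (name, parts) => name -> parts(i) })
    }
  }
}
```

In a real job each `TaskInput` would then be turned into one training DMatrix and one DMatrix per watch inside the task; that conversion is left out here because it depends on the wrapper's internals.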
Consolidating to the feature request tracker #3439. Feel free to re-open this issue when anyone starts working on this.
The feature is implemented in #3910.
Hi @CodingCat, I need to define a separate validation set for cross-validation using XGBoost4J on Spark. I tried the approach here. It does not look like that setting
There is no way to set a custom evaluation set for ml.dmlc.xgboost4j.scala.spark.XGBoost#trainDistributed. The code inside uses the private ml.dmlc.xgboost4j.scala.spark.Watches class, which just splits the training data with a predefined trainTestRatio and doesn't accept a custom eval set through params.
Is there a particular reason for this limitation, or is it just a stub that could be extended, for example with a DMatrix passed through params? Are there any complications caused by the fact that this is distributed XGBoost? How should such a dataset be stored in params: as a DMatrix, an RDD, or something else?
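For readers unfamiliar with what a ratio-based split looks like, here is a minimal sketch. It assumes a per-row random assignment, which is a guess at the behaviour and not the actual Watches code; `RatioSplitSketch`, `split`, and the `seed` parameter are hypothetical names introduced for illustration.

```scala
import scala.util.Random

object RatioSplitSketch {
  // Send each row to the training side with probability trainTestRatio,
  // otherwise to the held-out side. A fixed seed keeps the split reproducible.
  // Assumes a strict collection (List, Vector) so the predicate runs once per row.
  def split[A](rows: Seq[A], trainTestRatio: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainTestRatio >= 0.0 && trainTestRatio <= 1.0, "ratio must be in [0, 1]")
    val rng = new Random(seed)
    rows.partition(_ => rng.nextDouble() < trainTestRatio)
  }
}
```

A split like this only ever evaluates on rows carved out of the training data, which is why it cannot stand in for a separately supplied validation set.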