
[jvm-packages]support multiple validation datasets in Spark #3910

Merged

21 commits merged into dmlc:master from multi_eval on Dec 18, 2018

Conversation

CodingCat
Member

@CodingCat CodingCat commented Nov 16, 2018

  • converge the current training/test split with multi validation datasets
  • add unit test for multiple validation set
  • add support for ranking training tasks
  • add unit test for ranking training
  • update tutorial
  • fix early stopping feature
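The changes listed above can be sketched from the user's side roughly as follows (a hypothetical usage sketch based on the API this PR adds; the parameter map, column names, and the `validationDf1`/`validationDf2`/`trainingDf` DataFrames are illustrative, not code from the PR):

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val classifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "num_round" -> 100
))
classifier.setFeaturesCol("features")
classifier.setLabelCol("label")
// Two named validation sets; the eval metric is reported on each
// of them at every boosting round.
classifier.setEvalSets(Map("eval1" -> validationDf1, "eval2" -> validationDf2))

val model = classifier.fit(trainingDf)
```

`setEvalSets` is called on its own line rather than chained, because (as a reviewer notes later in this thread) it returns `Unit` in this version of the code.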

@CodingCat CodingCat changed the title [WIP][jvm-packages]support multiple validation datasets in Spark [jvm-packages]support multiple validation datasets in Spark Nov 19, 2018
@CodingCat
Member Author

@yanboliang @weitian @superbobry can any of you review this?

@CodingCat
Member Author

[screenshot: example training output]

This is how an example run looks after the change.

@CodingCat CodingCat changed the title [jvm-packages]support multiple validation datasets in Spark [WIP][jvm-packages]support multiple validation datasets in Spark Nov 20, 2018
@CodingCat
Member Author

CodingCat commented Nov 20, 2018

Found one more thing to fix: early stopping.

@CodingCat CodingCat changed the title [WIP][jvm-packages]support multiple validation datasets in Spark [jvm-packages]support multiple validation datasets in Spark Nov 20, 2018
@CodingCat
Member Author

@yanboliang could you take a further look?

@codecov-io

codecov-io commented Nov 26, 2018

Codecov Report

Merging #3910 into master will increase coverage by 0.18%.
The diff coverage is 72.02%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3910      +/-   ##
============================================
+ Coverage     56.23%   56.41%   +0.18%     
- Complexity      205      210       +5     
============================================
  Files           185      186       +1     
  Lines         14702    14818     +116     
  Branches        498      527      +29     
============================================
+ Hits           8267     8359      +92     
- Misses         6196     6202       +6     
- Partials        239      257      +18
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...lc/xgboost4j/scala/spark/params/CustomParams.scala | 84.21% <ø> (+20.21%) | 0 <0> (ø) ⬇️ |
| .../xgboost4j/scala/example/spark/SparkTraining.scala | 0% <0%> (ø) | 0 <0> (ø) ⬇️ |
| .../src/main/java/ml/dmlc/xgboost4j/java/XGBoost.java | 85.33% <100%> (+0.29%) | 46 <0> (+3) ⬆️ |
| ...boost4j/scala/spark/params/NonParamVariables.scala | 100% <100%> (ø) | 0 <0> (?) |
| ...c/xgboost4j/scala/spark/params/GeneralParams.scala | 66.66% <100%> (ø) | 0 <0> (ø) ⬇️ |
| ...xgboost4j/scala/spark/XGBoostTrainingSummary.scala | 35.71% <11.11%> (-27.93%) | 2 <1> (ø) |
| ...cala/ml/dmlc/xgboost4j/scala/spark/DataUtils.scala | 42.1% <60%> (+19.88%) | 0 <0> (ø) ⬇️ |
| .../dmlc/xgboost4j/scala/spark/XGBoostRegressor.scala | 63.9% <66.66%> (+1.52%) | 18 <0> (+1) ⬆️ |
| ...dmlc/xgboost4j/scala/spark/XGBoostClassifier.scala | 65.06% <77.77%> (+0.77%) | 19 <0> (+1) ⬆️ |
| .../scala/ml/dmlc/xgboost4j/scala/spark/XGBoost.scala | 77.12% <79.52%> (-0.38%) | 0 <0> (ø) |

... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c4ff50...49d578f.

@CodingCat
Member Author

@yanboliang ping?

@CodingCat
Member Author

@yanboliang ping

@yanboliang
Contributor

Looks good to me overall, thanks.

@CodingCat
Member Author

thanks @yanboliang

@CodingCat CodingCat merged commit c055a32 into dmlc:master Dec 18, 2018
@CodingCat CodingCat deleted the multi_eval branch December 18, 2018 05:04
trait NonParamVariables {
  protected var evalSetsMap: Map[String, DataFrame] = Map.empty

  def setEvalSets(evalSets: Map[String, DataFrame]): Unit = {


I am not sure if this is the right place to comment; let me know if it is not.

I am an XGBoost user, running 0.82 for our production training. One thing I think we can improve: the return value of this function should be `this.type` rather than `Unit`, because as a Scala user I will write code like:

val xgb = new XGBoostClassifier(xgboostParam)
  .setFeaturesCol(MSDataSchema.FEATURE_VECTOR)
  .setLabelCol(MSDataSchema.RELEVANCE_LEVEL)
  .setEvalSets(Map("eval" -> testData))

`xgb` will be `Unit` (`()`) rather than a classifier in this case.

@CodingCat
Member Author

Oh, it’s a typo or an IDE autocompletion; feel free to file a PR addressing this!
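The `this.type` fix the commenter suggests can be shown in isolation. Below is a minimal self-contained sketch (the trait, class, and method names are hypothetical, not the actual xgboost4j API): a setter that returns `this.type` keeps the concrete subclass type through a chain of calls, whereas a `Unit`-returning setter makes the whole chained expression evaluate to `()`.

```scala
trait FluentParams {
  protected var evalNames: Seq[String] = Seq.empty

  // Returning `this.type` instead of Unit lets callers chain setters
  // and still get back the concrete subclass, not the trait.
  def setEvalNames(names: Seq[String]): this.type = {
    evalNames = names
    this
  }
}

class Classifier extends FluentParams {
  def describe(): String = s"evals: ${evalNames.mkString(",")}"
}

object Demo extends App {
  // The chained call type-checks as a Classifier thanks to this.type.
  val clf = new Classifier().setEvalNames(Seq("eval1", "eval2"))
  println(clf.describe()) // prints "evals: eval1,eval2"
}
```

This singleton-type trick is the standard Scala idiom for fluent builders shared across a class hierarchy, and it is what Spark ML's own `Params` setters use.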

@linghaogu

I will, thank you!

@lock lock bot locked as resolved and limited conversation to collaborators May 2, 2019

4 participants