[jvm-packages] xgboost4j-spark XGBoostClassifier cannot assign 0 to max_depth #4038

Closed
SpringHerald opened this issue Jan 3, 2019 · 7 comments


@SpringHerald

The xgboost4j-spark API documentation describes max_depth as "maximum depth of a tree, increase this value will make model more complex / likely to be overfitting. [default=6] range: [1, Int.MaxValue]".
However, the xgboost4j-spark tutorial says "In XGBoost4J-Spark, we support not only the default set of parameters but also the camel-case variant of these parameters to keep consistent with Spark’s MLLIB parameters."
And the XGBoost Parameters page states that max_depth=0 indicates no limit.
So would it be better to allow max_depth=0 in xgboost4j-spark? Passing 0 currently fails with:

Exception in thread "main" java.lang.IllegalArgumentException: xgbc_8554b906dd6c parameter maxDepth given invalid value 0.
	at org.apache.spark.ml.param.Param.validate(params.scala:77)
	at org.apache.spark.ml.param.ParamPair.<init>(params.scala:528)
	at org.apache.spark.ml.param.Param.$minus$greater(params.scala:87)
	at org.apache.spark.ml.param.Params$class.set(params.scala:609)
	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:42)
	at org.apache.spark.ml.param.Params$class.set(params.scala:616)
	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:42)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$$anonfun$XGBoostToMLlibParams$2.apply(GeneralParams.scala:240)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$$anonfun$XGBoostToMLlibParams$2.apply(GeneralParams.scala:225)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.immutable.Map$Map4.foreach(Map.scala:188)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$class.XGBoostToMLlibParams(GeneralParams.scala:225)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.XGBoostToMLlibParams(XGBoostClassifier.scala:48)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.<init>(XGBoostClassifier.scala:61)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.<init>(XGBoostClassifier.scala:58)
	at cn.com.bsfit.dm.components.algo.XGBoostTrial$.main(XGBoostTrial.scala:40)
	at cn.com.bsfit.dm.components.algo.XGBoostTrial.main(XGBoostTrial.scala)
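
For readers following the trace: the exception comes from the parameter's own validator, which runs as soon as the value is set, before any training starts. The sketch below is a simplified stand-in written for illustration, not the Spark or xgboost4j-spark source. `SketchParam`, `SketchDemo` and the uid `xgbc_sketch` are hypothetical names, and the `>= 1` rule is assumed from the documented range [1, Int.MaxValue].

```scala
// Simplified stand-in for a validated ML parameter; NOT the real Spark class.
final case class SketchParam[T](parent: String, name: String, isValid: T => Boolean) {
  // Reject the value up front, using the same message shape as in the trace above.
  def validate(value: T): Unit =
    if (!isValid(value))
      throw new IllegalArgumentException(
        s"$parent parameter $name given invalid value $value.")
}

object SketchDemo {
  // Assumed rule, taken from the documented range [1, Int.MaxValue].
  val maxDepth: SketchParam[Int] = SketchParam[Int]("xgbc_sketch", "maxDepth", _ >= 1)

  // True if the value passes validation, false if it is rejected.
  def accepts(value: Int): Boolean =
    try { maxDepth.validate(value); true }
    catch { case _: IllegalArgumentException => false }
}
```

Under that assumed rule, `SketchDemo.accepts(0)` is false while `SketchDemo.accepts(6)` is true, which matches the behavior reported here.
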
@CodingCat
Member

0 is not a valid value for the growing policy currently used in the Spark case, though it would become valid once fast histogram can be used there (see #4011).

@hcho3 any insight into why the doc says 0 means no limit? I think that only applies to fast histogram?

@CodingCat
Member

In future, usage discussions should go to https://discuss.xgboost.ai/

@hcho3
Collaborator

hcho3 commented Jan 6, 2019

@CodingCat The doc says

Note that limit is required when grow_policy is set of depthwise

@CodingCat
Member

Isn't this incorrect? https://github.com/dmlc/xgboost/blob/master/src/tree/updater_histmaker.cc#L136

For approx, setting it to zero will produce an empty tree...

The no-limit behavior looks like it only applies to loss-guided growing in fast histogram.

@hcho3
Collaborator

hcho3 commented Jan 6, 2019

Yes, it looks like the no-limit condition is only applicable to the fast histogram.

@rongou
Contributor

rongou commented Jan 9, 2019

You can also use no-limit and lossguide with gpu_hist, which I'm trying to add support for. Would it be possible to relax the check to >=0?
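
To make the options discussed in this thread concrete, here is a rough sketch of three possible checks: the current one that rejects 0, the relaxed `>= 0` check asked for above, and a cross-parameter check that allows 0 only where the thread says no-limit is meaningful (`hist` or `gpu_hist` together with `lossguide`). The object and method names are hypothetical and none of this is taken from the xgboost4j-spark code.

```scala
// Hypothetical sketch of the checks discussed in this thread;
// not the xgboost4j-spark implementation.
object MaxDepthCheck {
  // Behavior reported in this issue: 0 is rejected.
  def strict(maxDepth: Int): Boolean = maxDepth >= 1

  // Relaxed per-parameter check: 0 is accepted and means "no limit".
  def relaxed(maxDepth: Int): Boolean = maxDepth >= 0

  // Cross-parameter check: accept 0 only for hist/gpu_hist with lossguide,
  // the combination for which the thread says no-limit applies.
  def consistent(maxDepth: Int, treeMethod: String, growPolicy: String): Boolean = {
    val unlimitedOk =
      Set("hist", "gpu_hist").contains(treeMethod) && growPolicy == "lossguide"
    maxDepth >= 1 || (maxDepth == 0 && unlimitedOk)
  }
}
```

A per-parameter validator can only see its own value, so the relaxed check alone would also let 0 through for approx, where it reportedly yields an empty tree. Something like `consistent` would have to run once all parameters are known.
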

@CodingCat
Member

@rongou See here: https://github.com/dmlc/xgboost/pull/4011/files#diff-00dbd05cd0d409bdeac5777a2636afa6R53, which applies once fast histogram is supported in the distributed case.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 9, 2019