[XGBoost4J-Spark] Problem for (de)serialization of Spark implementation of custom objective and eval #7274
Thanks for the PR! @wbo4958 Could you please take a look when you are available?
Looking forward to getting some feedback.
Also, I'm just realizing that my implementation does not add type hints
when loading is the first thing that happens (now the type hints are added
when instantiating the custom obj/eval for the first time). I guess one
could make sure one always creates an instance before loading, but there
must be a better way. I would appreciate any suggestions. Thanks!
@nicovdijk What issue does your PR try to solve? Could you add more description?
@wbo4958: This PR is a solution to issue #7224 with the same name.
@nicovdijk I just figured out what your PR tries to fix. Overall, your PR can fix this issue. But it's not straightforward for users who would like to customize their obj/eval, because it requires them to set the type hint explicitly.
Could XGBoost do this itself, for example through a callback?
...packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala
import ml.dmlc.xgboost4j.scala.{DMatrix, ObjectiveTrait}
import ml.dmlc.xgboost4j.scala.spark.params.CustomObjParam._
import org.apache.commons.logging.LogFactory
import org.json4s.ShortTypeHints
This is not used; better to remove it.
@wbo4958, I would like to automate it, yes, but I'm not sure what the best way is.
Also:
PS The comments have been fixed (I don't seem to have a proper linter).
XGBoost2MLlibParams https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/GeneralParams.scala#L260 will check and set the param. Could we add some matches for CustomObjParam and CustomEvalParam and set the TypeHints here?
That works nicely, thanks for the suggestion. The only problem left is that we need to make sure the type hint also gets added before an instance is created for the first time (now it gets added when we first save). Any suggestions? Maybe one could create a dummy instance somewhere when defining the class itself. However, I don't really know how this should work for someone using XGBoost.
We could define an interface (trait) for this.
I'm not sure I understand. Ideally we should register the type hint before instantiation. However, it seems to me that we cannot do that for someone using XGBoost; they would have to do it themselves. We would like to do the registration when defining the class (maybe in the companion object?). In Python, for instance, you can run some code when importing a module, but in Scala I'm not sure how to accomplish this (objects are lazy). I don't know how a trait can help me with this. Can you explain more? The following code demonstrates the issue (I think):

import org.scalatest.FunSuite

class MyTest extends FunSuite {
  trait TestTrait {}

  object TestTrait {
    // Names of the classes for which a type hint has been registered so far.
    private var typeHintsAdded = Set[String]()

    // Registers the runtime class of `instance`; adding a name twice is a no-op.
    def addTypeHint(instance: Any): Unit = {
      val className = instance.getClass.getSimpleName
      if (!typeHintsAdded.contains(className)) {
        typeHintsAdded += className
      }
    }

    def printTypeHints(): Unit = {
      println(s"Type hints:\n$typeHintsAdded")
    }
  }

  // Registration runs in the constructor, so it only happens once an instance exists.
  class TestClass extends TestTrait {
    TestTrait.addTypeHint(this)
  }

  object TestClass extends TestTrait {
  }

  test("testing adding type hint while defining class") {
    println("Before creating instance\n")
    TestTrait.printTypeHints() // the set is still empty here
    val testClass = new TestClass
    println("After creating instance\n")
    TestTrait.printTypeHints() // now contains TestClass
  }
}
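The remark that Scala objects are lazy can be checked in isolation. The following is a minimal sketch, not code from the PR; `LazyCompanionDemo`, `CustomObjective` and `describe` are hypothetical names chosen for illustration. It records the order of events to show that constructing an instance does not run the companion object's initializer, which is why registration code placed in a companion object would not run at class-definition time.

```scala
import scala.collection.mutable.ListBuffer

object LazyCompanionDemo {
  // Records the order in which things happen.
  val events: ListBuffer[String] = ListBuffer.empty

  // Hypothetical class standing in for a user-defined custom objective.
  class CustomObjective

  object CustomObjective {
    // Initializer body: runs only when the companion object is first referenced.
    events += "companion initialized"
    def describe: String = "custom objective"
  }

  def run(): List[String] = {
    val instance = new CustomObjective    // does not touch the companion object
    events += "instance created"
    val text = CustomObjective.describe   // first reference: the initializer runs now
    events += "companion referenced"
    events.toList
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(" -> "))
}
```

Run once, this lists "instance created" before "companion initialized", so a companion object alone cannot guarantee that a hint is registered before the first load.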
If users customize their obj, they must extend ObjectiveTrait. So what I am thinking is whether we can add another trait, say ABC.scala, that defines some methods to add TypeHints, and let ObjectiveTrait extend ABC. Then, when users customize, they will be forced to implement the method that adds the TypeHint.
I would like a general TypeHint-adding trait. However, modifying ObjectiveTrait in that way would also force users who do not use Spark to implement that method. Is that a good idea? Also, I still don't see how that solves the problem of adding type hints before instantiation of the corresponding class. Sorry if I'm slow.
Please correct me. |
However, the downside of adding TypeHints in XGBoost2MLlibParams is that creating an instance of the CustomObj or CustomEval is no longer enough. One would need to use them as parameters in an XGBoostRegressor or XGBoostClassifier. I think it's better to execute the logic upon instantiation of a custom objective or eval.
Hi @nicovdijk, Personally, this still seems like a workaround instead of a solution. I will try to make a commit locally to see if my suggestion can work.
@wbo4958, I'm still not really sure what you mean, so I would like to see any suggestion you have, thanks! By the way, I did not include the type hint trait in the objective trait, because that's in the xgboost4j package which does not serialize the parameters, if I'm correct. I don't think it has the json4s dependency either. But that's definitely possible, if that's what you mean. Is it?
@nicovdijk Thanks for your feedback. Yeah, I just realized that ObjectiveTrait is only in XGBoost4j. BTW, have you tested the scenario where the model with a customized obj/eval is just loaded, without any training beforehand?
@wbo4958, I haven't tested it using XGBoost (the tests create temporary files; to do this test we would need a saved model as a resource), but the script I provided earlier demonstrates the issue. So I'm pretty sure that it would fail at the moment. The user would have to create temporary instances of all custom obj/evals when starting the program; then everything works. Unless we have some initialization logic where we could do this for the user.
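The failure mode and the workaround described above can be sketched without Spark or json4s. This is a simplified model under stated assumptions, not the PR's implementation: `TypeHintRegistryDemo`, `RegistersTypeHint`, `MyCustomObjective` and `load` are hypothetical, and a plain mutable map stands in for the serializer's type-hint table.

```scala
import scala.collection.mutable

object TypeHintRegistryDemo {
  // Hypothetical stand-in for a serializer's type-hint table: class name -> class.
  private val hints = mutable.Map.empty[String, Class[_]]

  // Hypothetical trait: registers the concrete class whenever an instance is constructed.
  trait RegistersTypeHint {
    hints(getClass.getName) = getClass
  }

  // A user-defined custom objective (hypothetical).
  class MyCustomObjective extends RegistersTypeHint

  // Stand-in for deserialization: succeeds only if the class name was registered before.
  def load(className: String): Option[Any] =
    hints.get(className).map((c: Class[_]) => c.getDeclaredConstructor().newInstance())

  def main(args: Array[String]): Unit = {
    val name = classOf[MyCustomObjective].getName
    println(load(name).isDefined)        // false: nothing has been instantiated yet
    val warmUp = new MyCustomObjective   // the workaround: create a throwaway instance first
    println(load(name).isDefined)        // true: the hint is now registered
  }
}
```

In this model, loading before any instance exists finds no hint, which mirrors the concern that a program whose first action is loading a saved model would fail unless it instantiates its custom obj/evals up front.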
OK, could you refine your PR? Then we can review it.
@wbo4958 Sorry, what refinements exactly would you like?
...kages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/AddTypeHints.scala
...kages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/CustomParams.scala
...packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala
Codecov Report
@@           Coverage Diff           @@
##           master    #7274   +/-   ##
=======================================
  Coverage   83.68%   83.68%
=======================================
  Files          13       13
  Lines        3885     3885
=======================================
  Hits         3251     3251
  Misses        634      634
@wbo4958, I refined 😄 the PR. There is one last thing I would like to add. We use the Spark API in a Java project. Using traits with implementations is a bit cumbersome then. Can we add an abstract class |
In that case, SparkCustomObjective will break the public API. Anyway, this PR looks good for now. We can merge it and refine it once we have figured out a better way to do this. Thanks @nicovdijk
@trivialfis Please help to review it, and please change the [XGBoost-spark] to [jvm-packages]
Just to be clear: I would add a Scala abstract class that would be easier to use in Java code, nothing more. On the other hand, I guess there is a work-around by manually using the |
You mean xgboost4j introduces the trait from xgboost4j-spark? |
No, I only want to add the following to xgboost4j-spark:
abstract class SparkCustomEval extends EvalTrait with TypeHintsTrait
abstract class SparkCustomObjective extends ObjectiveTrait with TypeHintsTrait
Then in Java, you use
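A rough sketch of how such an abstract base class could look and be used follows. The traits below are simplified stand-ins and do not reproduce the real xgboost4j or xgboost4j-spark definitions; the method signature, the `typeHint` member and `SquaredError` are assumptions made only for illustration.

```scala
object AbstractBaseSketch {
  // Simplified stand-ins; NOT the real xgboost4j / xgboost4j-spark traits.
  trait ObjectiveTrait {
    def getGradient(predictions: Array[Float], labels: Array[Float]): Array[Float]
  }

  trait TypeHintsTrait {
    // A concrete member in a trait: the kind of thing that is awkward to mix in from Java.
    val typeHint: String = getClass.getName
  }

  // The proposed convenience base class: a single type to extend.
  abstract class SparkCustomObjective extends ObjectiveTrait with TypeHintsTrait

  // A subclass only has to implement the abstract method (hypothetical example).
  class SquaredError extends SparkCustomObjective {
    override def getGradient(predictions: Array[Float], labels: Array[Float]): Array[Float] =
      predictions.zip(labels).map { case (p, y) => p - y }
  }

  def main(args: Array[String]): Unit = {
    val objective = new SquaredError
    println(objective.typeHint)
    println(objective.getGradient(Array(1.0f, 2.0f), Array(0.5f, 2.5f)).mkString(", "))
  }
}
```

The idea is that the abstract class bundles the trait implementations on the Scala side, so a subclass (whether written in Scala or Java) only extends one class and implements the abstract methods.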
@wbo4958, I think I did what you wanted. It was the first time I used rebasing, so please check if the result is correct, thanks. |
jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/Utils.scala
LGTM
@trivialfis Could you help to review it |
Looks good to me. Thanks for the nice work!
@trivialfis @nicovdijk Hi, I'm a user of XGBoost and have been stuck on this issue for a long time. Just as @wbo4958 said, is it possible for XGBoost to solve it so that XGBoostModel can be integrated as a |
@abugh, I believe this merged PR does exactly that. Is this not the case?
@abugh, Yeah, please try the newest release.
@nicovdijk But I see that in the How about if we transfer XGBoostClassificationModel into a stage of Spark PipelineModel, just like this |
Now I'm using XGBoost-Spark 1.3.1, with scala 2.1.2. When I use XGBoostRegressionModel with customized objectives, I can't even load the model with XGBoostRegressionModel.load() (I used XGBoostRegressionModel.save() to save the model).
And I see that in the latest version (1.5.1), the unit test for customized objectives has been replaced by one that uses an integrated objective.
@abugh, this PR is not included in the 1.5.0 release or the patch release 1.5.1. It is included in the master branch. I do not know why it's not included yet nor when it will be. I have even tested saving and loading in a pipeline in master - it works (which is expected, since it's just using the save and load implementation of the xgboost stage). |
@nicovdijk Thanks a lot for your reply. So do you mean that loading in a pipeline is the same as loading with XGBoostRegressionModel? In other words, if loading with XGBoostRegressionModel works well, loading in a pipeline should also work well?
Oh, good news! I switched to the 1.5.1 patch release, and loading with XGBoostRegressionModel seems to work now (loading in a PipelineModel isn't tested yet, but I guess it works). I don't know the reason, since you mentioned that the 1.5.1 patch release doesn't include this PR. I'm just sharing this information with you.
@trivialfis, do you know if/when the code in this PR gets released? |
Also, could you tell me which Spark version is needed when using xgboost4j-spark 1.5.1 with Scala 2.12?
Unfortunately, it seems that the successful run was a coincidence. I can't reproduce it even under the same conditions, and I don't know the reason. It would be nice if you could tell me the plan/schedule for including this PR in a patch release. I will wait for it.
Suggested solution to the issue in the title.