[FEA] CUDA serialization of cuML estimators for UCX transport #1732
This seems to be very doable in the `Base` class, which would combine well with changing our memory structures to cuML Array, which has the serializable bits as well. That would cover most classes except for a few like RandomForest that have some extra state (the treelite model). Thoughts?
We may have already talked about this offline, so apologies if I'm just repeating myself, but one option here would be to use `register_generic`. I recall a similar discussion about how to serialize RandomForest in scikit-learn maybe 2 years ago (could be misremembering though). Not sure if the same decisions apply here. If so, maybe @TomAugspurger or @mrocklin would be able to weigh in 😉
@jakirkham, indeed I did miss the reference to `register_generic` in our earlier conversation. That approach is 100% what I've been referring to. That works out perfectly, then.
No worries. We covered a lot of ground. I've also lost track 😄 We might need to tweak `register_generic` a bit. Looks like this may actually be the answer to serializing CuPy sparse matrices as well 😉
Is this a serializer in dask/distributed? Generally, we shouldn't be doing copies to host; if we are, this is probably a bug. Also note that if you're using UCX but not enabling NVLink explicitly, then transfers will indeed incur a copy to host.
The point is that cuML's models themselves are not currently serializable with Dask, so Dask fell back to the pickle serializer. This isn't really a bug, as there is no way for Dask to know how to serialize cuML models a priori (no serializers are registered for them). Hopefully that makes more sense 🙂
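As a rough illustration of the fallback behavior described above, here is a toy dispatch table, not Dask's actual implementation (all names below are hypothetical):

```python
import pickle

# Toy model of Dask's serializer dispatch: a class-specific serializer is
# used when one has been registered; otherwise we fall back to pickle,
# which produces host bytes (hence the copy to host for GPU-backed objects).
registry = {}

def serialize(obj):
    fn = registry.get(type(obj))
    if fn is not None:
        return fn(obj)  # custom path: frames could stay on device
    return {"serializer": "pickle"}, [pickle.dumps(obj)]  # fallback path

class CumlModelStandIn:
    """Hypothetical stand-in for a cuML estimator with no registered serializer."""
    def __init__(self, coef):
        self.coef = coef

header, frames = serialize(CumlModelStandIn([1.0, 2.0]))
assert header["serializer"] == "pickle"
```

Registering a custom function for the class would route it down the first branch instead, which is the whole point of `register_generic`.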
It does make more sense, thanks @jakirkham for the details!
Have added PR ( dask/distributed#3536 ), which extends `register_generic`. Edit: can follow up with a subsequent PR to apply this to CuPy sparse matrices. Should also be a useful template to follow for cuML models. 😉
@jakirkham or @cjnolet can you elaborate on why building a custom serializer on the cuML side for estimators is hard? Why can't cuML models be serialized a priori?
Do you mean adding `serialize`/`deserialize` methods like cuDF has? My guess is it is not hard, but it will be a lot of boilerplate (both to write and maintain). It also likely won't be too different from just using `register_generic`.
Yes, exactly like the `serialize` method in cuDF. I think the addition in dask/distributed#3536 makes sense; mostly I wanted to make sure I understood things and help anyone else following along.
Sparse matrix serialization is now being handled in PR ( dask/distributed#3545 ), which should make sparse matrix serialization more efficient for cuML. It should also serve as a template for serializing cuML models using `register_generic`.
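For anyone following along, the header/frames pattern these PRs rely on can be sketched without Dask at all. Everything below (`ToyModel`, `toy_serialize`, `toy_deserialize`) is hypothetical; real code would register such functions with `dask_serialize` / `dask_deserialize` from `distributed.protocol`:

```python
class ToyModel:
    """Stand-in for a cuML estimator: small metadata plus a large buffer."""
    def __init__(self, n_features, coef):
        self.n_features = n_features
        self.coef = coef  # imagine this is a device buffer

def toy_serialize(model):
    # Split the object into a small picklable header and raw frames.
    # With real device memory the frames could stay on the GPU, letting
    # UCX send them over NVLink without a round trip through host memory.
    header = {"n_features": model.n_features}
    frames = [bytes(model.coef)]
    return header, frames

def toy_deserialize(header, frames):
    return ToyModel(header["n_features"], bytearray(frames[0]))

m = ToyModel(3, bytearray(b"\x01\x02\x03"))
header, frames = toy_serialize(m)
m2 = toy_deserialize(header, frames)
assert m2.n_features == 3 and bytes(m2.coef) == b"\x01\x02\x03"
```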
I spent some time today benchmarking some performance bottlenecks in our distributed algorithms when UCX is enabled in Dask. I was originally taken aback by the results: train/predict can be much slower when using UCX than with TCP. After some further digging, I found that the transfers of estimators between the client and workers were the main source of the bottleneck.
What I've since realized is that the cuML estimators will be serialized using pickle for UCX transport, which appears to guarantee a copy to host (@jakirkham please correct me if this is incorrect).
A common pattern that we are using more and more for cuML model estimators is to train a set of parameters on the workers and bring those parameters back to the client to construct a single-GPU estimator which can then be broadcast to the relevant workers for embarrassingly parallel prediction. This pattern greatly simplifies the design and maintenance of our code, but there are other options. I think it's worth having a discussion about it so we can maintain consistency while keeping the code clean and maintainable.
I see a couple of options. Please feel free to propose more:

1. We can find a good way to handle serialization of estimators in the abstract, perhaps by introspecting the object's `__dict__` and imposing some conventions on which types of instance variables should be serialized.
2. Rather than passing the models around in the Dask layer, we just pass around the relevant parameters / instance variables as dictionaries or tuples.
My personal choice would be option #1. #2 seems like a means to achieving #1, and I believe it would still require the workers to transfer enough of the model's state to re-construct the estimator for prediction.
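Option #1 might look something like the following sketch. The trailing-underscore convention (borrowed from scikit-learn's fitted-attribute naming) and every name here are hypothetical, just to make the idea concrete:

```python
import pickle

class Estimator:
    """Hypothetical estimator. Convention: attributes ending in "_" hold
    fitted state and are serialized as frames; everything else is treated
    as a constructor parameter."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self):
        self.coef_ = [0.5, 0.25]
        return self

def generic_serialize(est):
    # Introspect __dict__ and split it by the naming convention.
    params = {k: v for k, v in est.__dict__.items() if not k.endswith("_")}
    fitted = {k: v for k, v in est.__dict__.items() if k.endswith("_")}
    return {"params": pickle.dumps(params)}, [pickle.dumps(fitted)]

def generic_deserialize(header, frames):
    est = Estimator(**pickle.loads(header["params"]))
    est.__dict__.update(pickle.loads(frames[0]))
    return est

est = Estimator(alpha=0.1).fit()
clone = generic_deserialize(*generic_serialize(est))
assert clone.alpha == 0.1 and clone.coef_ == [0.5, 0.25]
```

In a real implementation the fitted attributes would be cuML Array / device buffers rather than pickled lists, so the frames could stay on the GPU.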
cc. @JohnZed @dantegd