[FEA] CUDA serialization of cuML estimators for UCX transport #1732

Closed

cjnolet opened this issue Feb 21, 2020 · 12 comments

Labels: ? - Needs Triage · feature request


@cjnolet (Member) commented Feb 21, 2020

I spent some time today benchmarking performance bottlenecks in our distributed algorithms when UCX is enabled in Dask. I was originally taken aback by the results: train/predict can be much slower when using UCX than with TCP. After some further digging, I found that the transfers of estimators between the client and workers were the main source of the bottleneck.

What I've since realized is that the cuML estimators will be serialized using pickle for UCX transport, which appears to guarantee a copy to host (@jakirkham please correct me if this is incorrect).

A common pattern we are using more and more for cuML estimators is to train a set of parameters on the workers, bring those parameters back to the client to construct a single-GPU estimator, and then broadcast that estimator to the relevant workers for embarrassingly parallel prediction. This pattern greatly simplifies the design and maintenance of our code, but there are other options. I think it's worth having a discussion about it so we can maintain consistency while keeping the code clean and maintainable.

I see a couple of options. Please feel free to propose more:

  1. We can find a good way to handle serialization of estimators in the abstract, perhaps by introspecting the object's __dict__ and imposing some conventions on which types of instance variables should be serialized.

  2. Rather than passing the models around in the Dask layer, we just pass around the relevant parameters / instance variables as dictionaries or tuples.

My personal choice would be option 1. Option 2 seems like a means to achieving option 1, and I believe it would still require the workers to transfer enough of the model's state to reconstruct the estimator for prediction.
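To make option 1 concrete, here's a rough sketch of what the __dict__ introspection could look like. Everything below (the function names, the CuPy-only convention for device state) is hypothetical and just illustrates the shape of the idea:

```python
# Hypothetical sketch of option 1: split an estimator's __dict__ into
# device arrays (left on-GPU so UCX can move them directly) and plain
# host state (pickled as usual).
import pickle

import cupy as cp


def serialize_estimator(est):
    device_state, host_state = {}, {}
    for name, value in est.__dict__.items():
        if isinstance(value, cp.ndarray):
            device_state[name] = value  # stays on device
        else:
            host_state[name] = value    # ordinary Python state
    return type(est), pickle.dumps(host_state), device_state


def deserialize_estimator(cls, host_bytes, device_state):
    est = cls.__new__(cls)  # bypass __init__ and restore raw state
    est.__dict__.update(pickle.loads(host_bytes))
    est.__dict__.update(device_state)
    return est
```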

cc. @JohnZed @dantegd

@cjnolet added the feature request and ? - Needs Triage labels on Feb 21, 2020
@dantegd (Member) commented Feb 22, 2020

> We can find a good way to handle serialization of estimators in the abstract, perhaps by introspecting the object's __dict__ and imposing some conventions on which types of instance variables should be serialized.

This seems very doable in the Base class, and it would combine well with changing our memory structures to CumlArray, which has the serializable bits as well. That would cover most classes, except for a few like RandomForest that carry extra state (the Treelite model). Thoughts?

@jakirkham (Member) commented

We may have already talked about this offline, so apologies if I'm just repeating myself, but one option here would be to use register_generic from Distributed to walk through __dict__ and serialize/deserialize everything it finds. This could work well for the case you have described 🙂
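A minimal sketch of what that registration could look like (the cuML import path is an assumption on my part and may vary by version):

```python
# Minimal sketch: register cuML's estimator base class with Distributed's
# generic serializer, which recurses through __dict__ on (de)serialization.
from distributed.protocol.serialize import register_generic

from cuml.common.base import Base  # assumed import path for cuML's Base

register_generic(Base)  # subclasses of Base inherit the registration
```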

I recall a similar discussion about how to serialize RandomForest in scikit-learn maybe 2 years ago (could be misremembering though). Not sure if the same decisions apply here. If so, maybe @TomAugspurger or @mrocklin would be able to weigh in 😉

@cjnolet (Member, Author) commented Feb 22, 2020

@jakirkham, indeed I did miss the reference to “register_generic” in our earlier conversation. That approach is 100% what I’ve been referring to. That works out perfectly, then.

@jakirkham (Member) commented

No worries. We covered a lot of ground. I've also lost track 😄

We might need to tweak register_generic a bit (it needs some logic to handle CUDA objects), but after looking at the code, I think this is actually pretty manageable.

Looks like this may actually be the answer to serializing CuPy sparse matrices as well 😉

@pentschev (Member) commented

> What I've since realized is that the cuML estimators will be serialized using pickle for UCX transport, which appears to guarantee a copy to host (@jakirkham please correct me if this is incorrect).

Is this a serializer in dask/distributed? Generally we shouldn't be doing copies to host; if we are, this is probably a bug.

Also note that if you're using UCX but not enabling NVLink explicitly, then transfers will indeed incur copying to host.
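For example, with dask-cuda, something along these lines (the flags shown are my assumption of the relevant knobs):

```python
# Illustrative dask-cuda setup: enable NVLink alongside UCX so that
# GPU-to-GPU transfers don't have to stage through host memory.
from dask_cuda import LocalCUDACluster
from distributed import Client

cluster = LocalCUDACluster(
    protocol="ucx",
    enable_tcp_over_ucx=True,
    enable_nvlink=True,  # without this, UCX transfers may bounce via host
)
client = Client(cluster)
```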

@jakirkham (Member) commented

The point is that cuML's models themselves are not currently serializable with Dask. Dask tries to use the "cuda" and/or "dask" serializers, which fails, and then falls back to "pickle" since there is nothing better to do.

This isn't really a bug, as there is no way to know how to serialize cuML models a priori ("pickle" is a reasonable choice of last resort). Really we need cuML to fill in some of these details for us. The rest of the discussion above is just about how we should do this.
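To illustrate the fallback chain (using a stand-in object in place of a real estimator):

```python
# Demonstrate the fallback: Dask tries each serializer in order and
# lands on "pickle" when the object has no better registration.
from distributed.protocol import serialize

model = object()  # stand-in for an unregistered cuML estimator
header, frames = serialize(model, serializers=["cuda", "dask", "pickle"])
print(header["serializer"])  # -> "pickle", the last resort
```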

Hopefully that makes more sense 🙂

@pentschev (Member) commented

It does make more sense, thanks @jakirkham for the details!

@jakirkham (Member) commented Mar 1, 2020

Have added PR (dask/distributed#3536), which extends register_generic to allow the "cuda" protocol to be used on objects that need it.

Edit: I can follow up with a subsequent PR to apply this to CuPy sparse matrices. It should also be a useful template to follow for cuML models. 😉
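With that PR, registering a class under the "cuda" protocol should look roughly like this (the class name is a placeholder, and the dispatch import paths are from memory, so treat this as a sketch):

```python
# Rough sketch of the extended register_generic from dask/distributed#3536:
# register a GPU-backed class under the "cuda" protocol so its device
# buffers are kept on the GPU during transport.
from distributed.protocol.cuda import cuda_deserialize, cuda_serialize
from distributed.protocol.serialize import register_generic

register_generic(
    SomeCudaClass,  # placeholder for a GPU-backed class, e.g. a cuML model
    serializer_name="cuda",
    serialize_func=cuda_serialize,
    deserialize_func=cuda_deserialize,
)
```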

@quasiben (Member) commented Mar 2, 2020

@jakirkham or @cjnolet, can you elaborate on why building a custom serializer on the cuML side for estimators is hard? Why can't cuML models be serialized a priori?

@jakirkham (Member) commented

Do you mean adding .serialize(...) methods like in cuDF?

My guess is it is not hard, but it would be a lot of boilerplate (both to write and to maintain). It also likely won't be too different from just using register_generic on cuML's Base. We may even decide to go back through cuDF and use the same solution for simplicity.
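For contrast, the explicit cuDF-style approach means every class hand-writes a pair like the following (the method names follow cuDF's convention; the attributes are made up for illustration):

```python
# Illustrative cuDF-style boilerplate: each class carries its own
# serialize()/deserialize() pair, splitting picklable metadata (header)
# from device buffers (frames).
class SomeEstimator:
    def serialize(self):
        header = {"params": self.params}  # small, picklable metadata
        frames = [self.coef_]             # device buffers shipped as frames
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        est = cls.__new__(cls)            # bypass __init__
        est.params = header["params"]
        est.coef_ = frames[0]
        return est
```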

FWIW it seems that register_generic came about to solve a similar problem for scikit-learn (as evidenced by the docs).

@quasiben (Member) commented Mar 2, 2020

Yes, exactly like the serialize method in cuDF. I think adding dask/distributed#3536 makes sense -- mostly I wanted to make sure I understood things and to help anyone else following along.

@jakirkham (Member) commented

Sparse matrix serialization is now being handled in PR (dask/distributed#3545). This should make sparse matrix serialization more efficient for cuML, and it should also serve as a template for serializing cuML models with register_generic.
