[FEA] CUDA serialization of cuML estimators for UCX transport #1732

Closed

cjnolet opened this issue Feb 21, 2020 · 12 comments

Labels: ? - Needs Triage · feature request


@cjnolet (Member) commented Feb 21, 2020

I spent some time today benchmarking performance bottlenecks in our distributed algorithms when UCX is enabled in Dask. I was originally taken aback by the results: train/predict can be much slower when using UCX than with TCP. After some further digging, I found that the transfers of estimators between the client and workers were the main source of the bottleneck.

What I've since realized is that the cuML estimators will be serialized using pickle for UCX transport, which appears to guarantee a copy to host (@jakirkham please correct me if this is incorrect).

A common pattern we are using more and more for cuML estimators is to train a set of parameters on the workers, bring those parameters back to the client to construct a single-GPU estimator, and then broadcast that estimator to the relevant workers for embarrassingly parallel prediction. This pattern greatly simplifies the design and maintenance of our code, but there are other options. I think it's worth having a discussion about it so we can maintain consistency while keeping the code clean and maintainable.

I see a couple of options. Please feel free to propose more:

  1. We can find a good way to handle serialization of estimators in the abstract, perhaps by introspecting the object's __dict__ and imposing some conventions on which types of instance variables should be serialized.

  2. Rather than passing the models around in the Dask layer, we just pass around the relevant parameters / instance variables as dictionaries or tuples.

My personal choice would be option 1. Option 2 seems like a means to achieving option 1, and I believe it would still require the workers to transfer enough of the model's state to reconstruct the estimator for prediction.
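To make option 1 concrete, here's a rough sketch of what the __dict__ introspection could look like. Everything below (the function names, the CuPy-only convention for device state) is hypothetical and just illustrates the shape of the idea:

```python
# Hypothetical sketch of option 1: split an estimator's __dict__ into
# device arrays (left on-GPU so UCX can move them directly) and plain
# host state (pickled as usual).
import pickle

import cupy as cp


def serialize_estimator(est):
    device_state, host_state = {}, {}
    for name, value in est.__dict__.items():
        if isinstance(value, cp.ndarray):
            device_state[name] = value  # stays on device
        else:
            host_state[name] = value    # ordinary Python state
    return type(est), pickle.dumps(host_state), device_state


def deserialize_estimator(cls, host_bytes, device_state):
    est = cls.__new__(cls)  # bypass __init__ and restore raw state
    est.__dict__.update(pickle.loads(host_bytes))
    est.__dict__.update(device_state)
    return est
```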

cc. @JohnZed @dantegd

@cjnolet added the feature request and ? - Needs Triage labels on Feb 21, 2020
@dantegd (Member) commented Feb 22, 2020

> We can find a good way to handle serialization of estimators in the abstract, perhaps by introspecting the object's __dict__ and imposing some conventions on which types of instance variables should be serialized.

This seems very doable in the Base class, and it would combine well with changing our memory structures to CumlArray, which has the serializable bits as well. That would cover most classes, except for a few like RandomForest that carry extra state (the Treelite model). Thoughts?

@jakirkham (Member) commented

We may have already talked about this offline, so apologies if I'm just repeating myself, but one option here would be to use register_generic from Distributed to walk through __dict__ and serialize/deserialize everything it finds. This could work well for the case you have described 🙂
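A minimal sketch of what that registration could look like (the cuML import path is an assumption on my part and may vary by version):

```python
# Minimal sketch: register cuML's estimator base class with Distributed's
# generic serializer, which recurses through __dict__ on (de)serialization.
from distributed.protocol.serialize import register_generic

from cuml.common.base import Base  # assumed import path for cuML's Base

register_generic(Base)  # subclasses of Base inherit the registration
```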

I recall a similar discussion about how to serialize RandomForest in scikit-learn maybe 2 years ago (could be misremembering though). Not sure if the same decisions apply here. If so, maybe @TomAugspurger or @mrocklin would be able to weigh in 😉

@cjnolet (Member, Author) commented Feb 22, 2020

@jakirkham, indeed I did miss the reference to “register_generic” in our earlier conversation. That approach is 100% what I’ve been referring to. That works out perfectly, then.

@jakirkham (Member) commented

No worries. We covered a lot of ground. I've also lost track 😄

We might need to tweak register_generic a bit (it needs some logic to handle CUDA objects), but after looking at the code, I think this is actually pretty manageable.

Looks like this may actually be the answer to serializing CuPy sparse matrices as well 😉

@pentschev (Member) commented

> What I've since realized is that the cuML estimators will be serialized using pickle for UCX transport, which appears to guarantee a copy to host (@jakirkham please correct me if this is incorrect).

Is this a serializer in dask/distributed? Generally we shouldn't be doing copies to host; if we are, this is probably a bug.

Also note that if you're using UCX but not enabling NVLink explicitly, then transfers will indeed incur copying to host.
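For example, with dask-cuda, something along these lines (the flags shown are my assumption of the relevant knobs):

```python
# Illustrative dask-cuda setup: enable NVLink alongside UCX so that
# GPU-to-GPU transfers don't have to stage through host memory.
from dask_cuda import LocalCUDACluster
from distributed import Client

cluster = LocalCUDACluster(
    protocol="ucx",
    enable_tcp_over_ucx=True,
    enable_nvlink=True,  # without this, UCX transfers may bounce via host
)
client = Client(cluster)
```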

@jakirkham (Member) commented

The point is that cuML's models themselves are not currently serializable with Dask. Dask tries to use the "cuda" and/or "dask" serializers, which fails, and then falls back to "pickle" since there is nothing better to do.

This isn't really a bug, as there is no way to know how to serialize cuML models a priori ("pickle" is a reasonable choice of last resort). Really we need cuML to fill in some of these details for us. The rest of the discussion above is just about how we should do this.
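To illustrate the fallback chain (using a stand-in object in place of a real estimator):

```python
# Demonstrate the fallback: Dask tries each serializer in order and
# lands on "pickle" when the object has no better registration.
from distributed.protocol import serialize

model = object()  # stand-in for an unregistered cuML estimator
header, frames = serialize(model, serializers=["cuda", "dask", "pickle"])
print(header["serializer"])  # -> "pickle", the last resort
```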

Hopefully that makes more sense 🙂

@pentschev (Member) commented

It does make more sense, thanks @jakirkham for the details!

@jakirkham (Member) commented Mar 1, 2020

Have added PR (dask/distributed#3536), which extends register_generic to allow the "cuda" protocol to be used on objects that need it.

Edit: I can follow up with a subsequent PR to apply this to CuPy sparse matrices. It should also be a useful template to follow for cuML models. 😉
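With that PR, registering a class under the "cuda" protocol should look roughly like this (the class name is a placeholder, and the dispatch import paths are from memory, so treat this as a sketch):

```python
# Rough sketch of the extended register_generic from dask/distributed#3536:
# register a GPU-backed class under the "cuda" protocol so its device
# buffers are kept on the GPU during transport.
from distributed.protocol.cuda import cuda_deserialize, cuda_serialize
from distributed.protocol.serialize import register_generic

register_generic(
    SomeCudaClass,  # placeholder for a GPU-backed class, e.g. a cuML model
    serializer_name="cuda",
    serialize_func=cuda_serialize,
    deserialize_func=cuda_deserialize,
)
```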

@quasiben (Member) commented Mar 2, 2020

@jakirkham or @cjnolet, can you elaborate on why building a custom serializer on the cuML side for estimators is hard? Why can't cuML models be serialized a priori?

@jakirkham (Member) commented

Do you mean adding .serialize(...) methods like in cuDF?

My guess is it is not hard, but it would be a lot of boilerplate (both to write and to maintain). It also likely won't be too different from just using register_generic on cuML's Base. We may even decide to go back through cuDF and use the same solution for simplicity.
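For contrast, the explicit cuDF-style approach means every class hand-writes a pair like the following (the method names follow cuDF's convention; the attributes are made up for illustration):

```python
# Illustrative cuDF-style boilerplate: each class carries its own
# serialize()/deserialize() pair, splitting picklable metadata (header)
# from device buffers (frames).
class SomeEstimator:
    def serialize(self):
        header = {"params": self.params}  # small, picklable metadata
        frames = [self.coef_]             # device buffers shipped as frames
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        est = cls.__new__(cls)            # bypass __init__
        est.params = header["params"]
        est.coef_ = frames[0]
        return est
```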

FWIW it seems that register_generic came about to solve a similar problem for scikit-learn (as evidenced by the docs).

@quasiben (Member) commented Mar 2, 2020

Yes, exactly like the serialize method in cuDF. I think adding dask/distributed#3536 makes sense -- mostly I wanted to make sure I understood things and to help anyone else following along.

@jakirkham (Member) commented

Sparse matrix serialization is now being handled in PR (dask/distributed#3545). This should make sparse matrix serialization more efficient for cuML, and it should also serve as a template for serializing cuML models with register_generic.
