
[FEA] pack/unpack functions to merge/split (multiple) device_buffer(s) #9726

Open

jakirkham opened this issue Mar 1, 2020 · 14 comments

Labels: 0 - Backlog (In queue waiting for assignment), feature request (New feature or request), libcudf (Affects libcudf C++/CUDA code)

Comments

@jakirkham
Member

Is your feature request related to a problem? Please describe.

It would be useful to have a pack function to merge multiple device_buffers into a single device_buffer. This is helpful in situations where reading from one large device_buffer is more performant, but the data ultimately consists of many smaller segments that would need to be merged together. Example use cases include sending data with UCX and spilling data from device to host.

Similarly, it would be useful to have an unpack function to split a device_buffer into multiple device_buffers. This is helpful in situations where writing into one large device_buffer is more performant, but the data ultimately consists of many smaller segments that may need to be freed at different times. Example use cases include receiving data with UCX and unspilling data from host to device.

Describe the solution you'd like

For pack, it would be nice if it simply took several device_buffers in vector form and returned a single one. Additionally, it would be nice if pack could recognize when the device_buffers are contiguous in memory and avoid a copy, though admittedly this last part is tricky (maybe less so if unpack is used regularly?). If we allow pack to change the order (to benefit from contiguous memory, for example), we may want additional information about where the data segments live in the larger device_buffer.

For unpack, it would be nice if it took a single device_buffer and a vector of size_ts describing where to split it, and returned a vector of device_buffers. Additionally, it would be nice if unpack did not perform any copies. Hopefully that is straightforward, but there may be things I'm not understanding.
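
Roughly, the kind of interface I have in mind (just a sketch; the names, the packed_layout struct, and the exact signatures are placeholders, not an existing RMM API):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical metadata recording where each original segment lives inside
// the packed buffer, so the buffer can be unpacked later.
struct packed_layout {
  std::vector<std::size_t> offsets;  // start of each segment, in bytes
  std::vector<std::size_t> sizes;    // length of each segment, in bytes
};

// Merge several device_buffers into one contiguous device_buffer and return
// the layout needed to split it back apart.
std::pair<rmm::device_buffer, packed_layout> pack(
  std::vector<rmm::device_buffer> const& buffers, rmm::cuda_stream_view stream);

// Split a packed device_buffer back into individual device_buffers according
// to the layout produced by pack().
std::vector<rmm::device_buffer> unpack(
  rmm::device_buffer const& packed, packed_layout const& layout,
  rmm::cuda_stream_view stream);
```

Note that a truly copy-free unpack would require the returned buffers to share ownership of the packed allocation, which device_buffer doesn't currently do, so a sketch like the one above would likely still copy on unpack.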

Describe alternatives you've considered

One might consider using variadics in C++ for the arguments. While nice at the C++ level, this seems tricky to use from the Cython and Python levels. Hence the suggestion to just use vector.

pack itself could be implemented by a user simply allocating a larger buffer and copying the segments over. It would be nice to avoid the extra allocation when possible, though (which may require knowledge that RMM has about the allocations).
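
For concreteness, the naive user-side version would look roughly like this (illustrative only; one extra allocation plus a device-to-device copy per segment, error checking omitted):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <vector>

// Naive pack: allocate one large buffer and copy each segment into it.
rmm::device_buffer naive_pack(std::vector<rmm::device_buffer> const& buffers,
                              rmm::cuda_stream_view stream)
{
  std::size_t total = 0;
  for (auto const& b : buffers) { total += b.size(); }

  rmm::device_buffer packed(total, stream);  // the extra allocation

  std::size_t offset = 0;
  for (auto const& b : buffers) {
    cudaMemcpyAsync(static_cast<char*>(packed.data()) + offset, b.data(),
                    b.size(), cudaMemcpyDeviceToDevice, stream.value());
    offset += b.size();
  }
  return packed;
}
```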

Additional context

Having unpack in particular would be helpful for aggregated receives. A natural extension of this would be to have pack for aggregated sends. All in all, this should allow transmitting a larger amount of data at once with UCX, thus benefiting from the larger transfers UCX is better tuned for. PR ( dask/distributed#3453 ) provides a WIP implementation of aggregated receives for context.

Also having pack would be useful when spilling several device_buffers from device to host as it would allow us to pack them into one device_buffer before transferring ( rapidsai/dask-cuda#250 ). Having unpack would help us break up the allocation whenever the object is unspilled.

This need has also come up in downstream contexts ( #3793 ). Maybe they would benefit from an upstream solution as well?

@jakirkham
Member Author

As to pack, maybe these CUDA Virtual Memory Management APIs (pointed out by Cory in a related context) would be useful?
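
To spell out the idea (a condensed sketch of the driver-API sequence; it assumes each segment's physical memory was created with cuMemCreate so its allocation handle is available, that sizes are multiples of the allocation granularity, and error handling is omitted):

```cpp
#include <cuda.h>

#include <cstddef>
#include <vector>

// Map a set of physical allocations (created with cuMemCreate) into one
// contiguous virtual address range, so the "packed" buffer needs no copy.
CUdeviceptr map_contiguously(std::vector<CUmemGenericAllocationHandle> const& handles,
                             std::vector<std::size_t> const& sizes,
                             int device)
{
  std::size_t total = 0;
  for (auto s : sizes) { total += s; }

  // Reserve one virtual address range large enough for all segments.
  CUdeviceptr base{};
  cuMemAddressReserve(&base, total, 0, 0, 0);

  // Map each physical allocation at the next offset in the range.
  std::size_t offset = 0;
  for (std::size_t i = 0; i < handles.size(); ++i) {
    cuMemMap(base + offset, sizes[i], 0, handles[i], 0);
    offset += sizes[i];
  }

  // Grant the device read/write access over the whole range.
  CUmemAccessDesc access{};
  access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  access.location.id   = device;
  access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
  cuMemSetAccess(base, total, &access, 1);

  return base;
}
```

The catch is that the existing segments would have had to be allocated this way in the first place, which is where RMM's knowledge of its allocations would come in.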

@jrhemstad
Contributor

jrhemstad commented Mar 2, 2020

This functionality is kind of outside the scope of RMM. The direction we'd like to go with RMM is to keep it as simple as possible: a set of resources, the tools to get/set the default resource, and containers like device_buffer/device_vector that work with those resources.

This kind of pack or concatenate functionality is really more in the wheelhouse of a consumer of RMM like cuDF.

In cuDF, this is just a concatenate. #4224
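
For reference, at the column level that's roughly the following (a minimal sketch assuming the cudf::concatenate overload that takes column_views, with the default stream and memory resource):

```cpp
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/concatenate.hpp>

#include <memory>
#include <vector>

// Concatenate several columns into one contiguous column: libcudf allocates a
// single output column and copies each input into it.
std::unique_ptr<cudf::column> merge_columns(
  std::vector<cudf::column_view> const& columns)
{
  return cudf::concatenate(columns);
}
```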

@kkraus14
Collaborator

kkraus14 commented Mar 2, 2020

In cuDF, this is just a concatenate. #4224

That imposes a lot of challenges: you have to construct your data in a way that allows using concatenate, and then still be able to unpack it cleanly.

Maybe it's a new API that could live in cuDF, but really we're looking for some API that takes vector<device_buffer> and returns us a device_buffer along with some form of metadata to then be able to "unpack" it back into a vector<device_buffer>.

This is along the same lines as #3793 but is more generalized than cudf::table_view, as this could be reused in a lot of non-cuDF places.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@harrism
Member

harrism commented Feb 16, 2021

Seems like this issue should be moved to libcudf.

@kkraus14
Collaborator

Seems like this issue should be moved to libcudf.

I think we had this in RMM as opposed to libcudf because we wanted a place more general purpose than libcudf. For example, Dask/Distributed would possibly be interested in using this for packing/unpacking buffers in communication, but cuDF is way too bulky of a dependency for them.

@jrhemstad
Contributor

Seems like this issue should be moved to libcudf.

I think we had this in RMM as opposed to libcudf because we wanted a place more general purpose than libcudf. For example, Dask/Distributed would possibly be interested in using this for packing/unpacking buffers in communication, but cuDF is way too bulky of a dependency for them.

Sure, but RMM isn't a catch-all for stuff we don't want to put into libcudf. It muddies the single responsibility principle to start putting random kernels into an allocator and container library (which currently has no kernels).

@kkraus14
Collaborator

Yup, agreed. This felt like a gray area, somewhat related to memory management, so the issue was raised here. Happy to defer it to somewhere else, but cuDF is too large of a dependency, unfortunately.

@jakirkham
Member Author

I think cupy.concatenate would also work here. Not sure if that is too large of a dependency for this use case (or if we are looking for a C++ operation too).

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@harrism transferred this issue from rapidsai/rmm Nov 18, 2021
@jrhemstad
Contributor

NVIDIA/cub#359 would be the right way to do this now.
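
For context, that PR adds cub::DeviceMemcpy::Batched, which performs many independent copies in a single kernel launch. A rough sketch of using it for the pack step (the pointer/size arrays and packed destination offsets would be computed per segment beforehand; error checking omitted, and the exact signature is my reading of the PR):

```cpp
#include <cub/device/device_memcpy.cuh>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <cstdint>

// Copy `num_buffers` independent device buffers into one packed destination
// with a single batched-memcpy launch. `d_srcs`, `d_dsts`, and `d_sizes` are
// device arrays of source pointers, destination pointers (offsets into the
// packed buffer), and per-buffer byte counts, respectively.
void batched_pack(void** d_srcs, void** d_dsts, std::size_t* d_sizes,
                  std::uint32_t num_buffers, cudaStream_t stream)
{
  // First call queries the temporary-storage requirement.
  std::size_t temp_bytes = 0;
  cub::DeviceMemcpy::Batched(nullptr, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);

  void* d_temp = nullptr;
  cudaMallocAsync(&d_temp, temp_bytes, stream);

  // Second call performs all copies in one launch.
  cub::DeviceMemcpy::Batched(d_temp, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);

  cudaFreeAsync(d_temp, stream);
}
```

The appeal is avoiding one launch or memcpy call per segment, which is the main cost of packing many small buffers.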

@GregoryKimball added the feature request (New feature or request), 0 - Backlog (In queue waiting for assignment), and libcudf (Affects libcudf C++/CUDA code) labels Nov 21, 2022
@jakirkham
Member Author

FYI, PR ( NVIDIA/cub#359 ) has landed. Looks like this will be part of CUB 2.1.0.
