
[FEA] pack/unpack functions to merge/split (multiple) device_buffer(s) #9726

Open

jakirkham opened this issue Mar 1, 2020 · 14 comments

Labels: 0 - Backlog (In queue waiting for assignment), feature request (New feature or request), libcudf (Affects libcudf C++/CUDA code)

Comments

@jakirkham
Member

Is your feature request related to a problem? Please describe.

It would be useful to have a pack function to merge multiple device_buffers into a single device_buffer. This is helpful in situations where reading from one large device_buffer is more performant, but the data ultimately consists of many smaller segments that would need to be merged together. Example use cases include sending data with UCX and spilling data from device to host.

Similarly, it would be useful to have an unpack function to split a device_buffer into multiple device_buffers. This is helpful in situations where writing into one large device_buffer is more performant, but the data ultimately consists of many smaller segments that may need to be freed at different times. Example use cases include receiving data with UCX and unspilling data from host to device.

Describe the solution you'd like

For pack, it would be nice if it simply took several device_buffers in vector form and returned a single one. Additionally, it would be nice if pack could recognize when the device_buffers are contiguous in memory and avoid a copy, though admittedly this last part is tricky (maybe less so if unpack is used regularly?). If we allow pack to change the order (to benefit from contiguous memory, for example), we may want additional information about where the data segments live in the larger device_buffer.

For unpack, it would be nice if it took a single device_buffer and a vector of size_ts describing where to split it, and returned a vector of device_buffers. Additionally, it would be nice if unpack did not perform any copies. Hopefully that is straightforward, but there may be things I'm not understanding.
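
Roughly, the kind of interface I have in mind (just a sketch; the names, the packed_layout struct, and the exact signatures are placeholders, not an existing RMM API):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical metadata recording where each original segment lives inside
// the packed buffer, so the buffer can be unpacked later.
struct packed_layout {
  std::vector<std::size_t> offsets;  // start of each segment, in bytes
  std::vector<std::size_t> sizes;    // length of each segment, in bytes
};

// Merge several device_buffers into one contiguous device_buffer and return
// the layout needed to split it back apart.
std::pair<rmm::device_buffer, packed_layout> pack(
  std::vector<rmm::device_buffer> const& buffers, rmm::cuda_stream_view stream);

// Split a packed device_buffer back into individual device_buffers according
// to the layout produced by pack().
std::vector<rmm::device_buffer> unpack(
  rmm::device_buffer const& packed, packed_layout const& layout,
  rmm::cuda_stream_view stream);
```

Note that a truly copy-free unpack would require the returned buffers to share ownership of the packed allocation, which device_buffer doesn't currently do, so a sketch like the one above would likely still copy on unpack.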

Describe alternatives you've considered

One might consider using variadics in C++ for the arguments. While nice at the C++ level, this seems tricky to use from the Cython and Python levels. Hence the suggestion to just use vector.

pack itself could be implemented by a user simply allocating a larger buffer and copying the segments over. It would be nice to avoid the extra allocation when possible, though (which may require knowledge that RMM has about the allocations).
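
For concreteness, the naive user-side version would look roughly like this (illustrative only; one extra allocation plus a device-to-device copy per segment, error checking omitted):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <vector>

// Naive pack: allocate one large buffer and copy each segment into it.
rmm::device_buffer naive_pack(std::vector<rmm::device_buffer> const& buffers,
                              rmm::cuda_stream_view stream)
{
  std::size_t total = 0;
  for (auto const& b : buffers) { total += b.size(); }

  rmm::device_buffer packed(total, stream);  // the extra allocation

  std::size_t offset = 0;
  for (auto const& b : buffers) {
    cudaMemcpyAsync(static_cast<char*>(packed.data()) + offset, b.data(),
                    b.size(), cudaMemcpyDeviceToDevice, stream.value());
    offset += b.size();
  }
  return packed;
}
```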

Additional context

Having unpack in particular would be helpful for aggregated receives. A natural extension of this would be to have pack for aggregated sends. All in all, this should allow transmitting a larger amount of data at once with UCX, thus benefiting from the larger transfers UCX is better tuned for. PR ( dask/distributed#3453 ) provides a WIP implementation of aggregated receives for context.

Also having pack would be useful when spilling several device_buffers from device to host as it would allow us to pack them into one device_buffer before transferring ( rapidsai/dask-cuda#250 ). Having unpack would help us break up the allocation whenever the object is unspilled.

This need has also come up in downstream contexts ( #3793 ). Maybe they would benefit from an upstream solution as well?

@jakirkham
Member Author

As to pack, maybe these CUDA Virtual Memory Management APIs (pointed out by Cory in a related context) would be useful?
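
To spell out the idea (a condensed sketch of the driver-API sequence; it assumes each segment's physical memory was created with cuMemCreate so its allocation handle is available, that sizes are multiples of the allocation granularity, and error handling is omitted):

```cpp
#include <cuda.h>

#include <cstddef>
#include <vector>

// Map a set of physical allocations (created with cuMemCreate) into one
// contiguous virtual address range, so the "packed" buffer needs no copy.
CUdeviceptr map_contiguously(std::vector<CUmemGenericAllocationHandle> const& handles,
                             std::vector<std::size_t> const& sizes,
                             int device)
{
  std::size_t total = 0;
  for (auto s : sizes) { total += s; }

  // Reserve one virtual address range large enough for all segments.
  CUdeviceptr base{};
  cuMemAddressReserve(&base, total, 0, 0, 0);

  // Map each physical allocation at the next offset in the range.
  std::size_t offset = 0;
  for (std::size_t i = 0; i < handles.size(); ++i) {
    cuMemMap(base + offset, sizes[i], 0, handles[i], 0);
    offset += sizes[i];
  }

  // Grant the device read/write access over the whole range.
  CUmemAccessDesc access{};
  access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  access.location.id   = device;
  access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
  cuMemSetAccess(base, total, &access, 1);

  return base;
}
```

The catch is that the existing segments would have had to be allocated this way in the first place, which is where RMM's knowledge of its allocations would come in.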

@jrhemstad
Contributor

jrhemstad commented Mar 2, 2020

This functionality is kind of outside the scope of RMM. The direction we'd like to go with RMM is to keep it as simple as possible: a set of resources, the tools to get/set the default resource, and containers like device_buffer/device_vector that work with those resources.

This kind of pack or concatenate functionality is really more in the wheelhouse of a consumer of RMM like cuDF.

In cuDF, this is just a concatenate. #4224
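
For reference, at the column level that's roughly the following (a minimal sketch assuming the cudf::concatenate overload that takes column_views, with the default stream and memory resource):

```cpp
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/concatenate.hpp>

#include <memory>
#include <vector>

// Concatenate several columns into one contiguous column: libcudf allocates a
// single output column and copies each input into it.
std::unique_ptr<cudf::column> merge_columns(
  std::vector<cudf::column_view> const& columns)
{
  return cudf::concatenate(columns);
}
```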

@kkraus14
Collaborator

kkraus14 commented Mar 2, 2020

In cuDF, this is just a concatenate. #4224

That imposes a lot of challenges: you have to construct your data in a way that allows using concatenate, and then still be able to unpack it cleanly.

Maybe it's a new API that could live in cuDF, but really we're looking for some API that takes vector<device_buffer> and returns us a device_buffer along with some form of metadata to then be able to "unpack" it back into a vector<device_buffer>.

This is along the same lines as #3793 but is more generalized than cudf::table_view, as this could be reused in a lot of non-cuDF places.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@harrism
Member

harrism commented Feb 16, 2021

Seems like this issue should be moved to libcudf.

@kkraus14
Collaborator

Seems like this issue should be moved to libcudf.

I think we had this in RMM as opposed to libcudf because we wanted a place more general purpose than libcudf. For example, Dask/Distributed would possibly be interested in using this for packing/unpacking buffers in communication, but cuDF is way too bulky of a dependency for them.

@jrhemstad
Contributor

Seems like this issue should be moved to libcudf.

I think we had this in RMM as opposed to libcudf because we wanted a place more general purpose than libcudf. For example, Dask/Distributed would possibly be interested in using this for packing/unpacking buffers in communication, but cuDF is way too bulky of a dependency for them.

Sure, but RMM isn't a catch-all for stuff we don't want to put into libcudf. It muddies the single responsibility principle to start putting random kernels into an allocator and container library (which currently has no kernels).

@kkraus14
Collaborator

Yup, agreed. This felt like a gray area, somewhat related to memory management, so the issue was raised here. Happy to defer it to somewhere else, but cuDF is too large of a dependency, unfortunately.

@jakirkham
Member Author

I think cupy.concatenate would also work here. Not sure if that is too large of a dependency for this use case (or if we are looking for a C++ operation too).

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@harrism transferred this issue from rapidsai/rmm Nov 18, 2021
@jrhemstad
Contributor

NVIDIA/cub#359 would be the right way to do this now.
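
For context, that PR adds cub::DeviceMemcpy::Batched, which performs many independent copies in a single kernel launch. A rough sketch of using it for the pack step (the pointer/size arrays and packed destination offsets would be computed per segment beforehand; error checking omitted, and the exact signature is my reading of the PR):

```cpp
#include <cub/device/device_memcpy.cuh>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <cstdint>

// Copy `num_buffers` independent device buffers into one packed destination
// with a single batched-memcpy launch. `d_srcs`, `d_dsts`, and `d_sizes` are
// device arrays of source pointers, destination pointers (offsets into the
// packed buffer), and per-buffer byte counts, respectively.
void batched_pack(void** d_srcs, void** d_dsts, std::size_t* d_sizes,
                  std::uint32_t num_buffers, cudaStream_t stream)
{
  // First call queries the temporary-storage requirement.
  std::size_t temp_bytes = 0;
  cub::DeviceMemcpy::Batched(nullptr, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);

  void* d_temp = nullptr;
  cudaMallocAsync(&d_temp, temp_bytes, stream);

  // Second call performs all copies in one launch.
  cub::DeviceMemcpy::Batched(d_temp, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);

  cudaFreeAsync(d_temp, stream);
}
```

The appeal is avoiding one launch or memcpy call per segment, which is the main cost of packing many small buffers.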

@GregoryKimball added the feature request (New feature or request), 0 - Backlog (In queue waiting for assignment), and libcudf (Affects libcudf C++/CUDA code) labels Nov 21, 2022
@jakirkham
Member Author

FYI, PR ( NVIDIA/cub#359 ) has landed. Looks like this will be part of CUB 2.1.0.
