Extensible call back function for DMatrix. #5571

Closed
trivialfis opened this issue Apr 21, 2020 · 0 comments

Comments

@trivialfis
Member

This issue tracks data initialization in distributed environments. On both the Spark side and the Dask side, we need to concatenate partitions into a single block of data to initialize the DMatrix. Even with the new DeviceQuantileDMatrix, peak memory usage is still twice the size of the actual data.

One solution is to implement some sort of callback function that lets the internal data adapter process these partitions in place, so the data never has to be concatenated.
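
To make the idea concrete, here is a rough sketch of what such a callback could look like. It is plain Python and does not use any XGBoost API: `PartitionIterator`, `ColumnRangeSketch`, and `build_from_partitions` are hypothetical names, and the per-column min/max summary is only a stand-in for whatever the internal data adapter would actually compute. The point is that each partition is handed to the consumer one at a time, so no concatenated copy is ever created.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: one partition is a list of rows, one row is a list of floats.
Partition = List[List[float]]


class PartitionIterator:
    """Hypothetical iterator that hands partitions to a callback one at a time."""

    def __init__(self, partitions: Sequence[Partition]) -> None:
        self._partitions = partitions
        self._pos = 0

    def reset(self) -> None:
        # Rewind so the consumer can make another pass over the data.
        self._pos = 0

    def next(self, consume: Callable[[Partition], None]) -> bool:
        # Pass the next partition to the callback; return False when exhausted.
        if self._pos == len(self._partitions):
            return False
        consume(self._partitions[self._pos])
        self._pos += 1
        return True


class ColumnRangeSketch:
    """Stand-in for the internal data adapter: tracks per-column min and max."""

    def __init__(self) -> None:
        self.n_rows = 0
        self.mins: List[float] = []
        self.maxs: List[float] = []

    def consume(self, partition: Partition) -> None:
        # Update the summary from one partition without copying its rows.
        for row in partition:
            if self.n_rows == 0:
                self.mins = list(row)
                self.maxs = list(row)
            else:
                self.mins = [min(a, b) for a, b in zip(self.mins, row)]
                self.maxs = [max(a, b) for a, b in zip(self.maxs, row)]
            self.n_rows += 1


def build_from_partitions(it: PartitionIterator) -> ColumnRangeSketch:
    # Drive the iterator until every partition has been consumed.
    sketch = ColumnRangeSketch()
    it.reset()
    while it.next(sketch.consume):
        pass
    return sketch
```

With this shape, the Spark or Dask side would only need to supply an iterator over its local partitions, and the extra memory held by the consumer is bounded by its own summary rather than by a second copy of the data.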
