This is an issue for tracking initialization of data in a distributed environment. On both the Spark side and the Dask side we need to concatenate partitions into a single block of data to initialize the `DMatrix`. Even with the new `DeviceQuantileDMatrix`, the peak memory usage is still twice the size of the actual data.

One solution is to implement some sort of callback function for the internal data adapter to process these partitions in place, avoiding the concatenation, as sketched below.
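For illustration, here is a minimal sketch of what such a callback-based adapter could look like, modeled on XGBoost's `DataIter` interface. The class name `PartitionIter` is hypothetical, and the exact `DataIter` location and method signatures are assumptions that vary across XGBoost versions. The sketch assumes a GPU and `cupy`, since `DeviceQuantileDMatrix` expects device data:

```python
import cupy as cp  # device arrays; DeviceQuantileDMatrix expects GPU data
import xgboost as xgb

class PartitionIter(xgb.DataIter):
    """Hypothetical adapter: feed partitions to XGBoost one at a time
    instead of concatenating them into a single block first."""

    def __init__(self, partitions, labels):
        self._partitions = partitions  # list of per-partition feature blocks
        self._labels = labels          # matching list of label blocks
        self._it = 0
        super().__init__()

    def next(self, input_data):
        # `input_data` is a callback supplied by XGBoost; passing one
        # partition per call keeps peak memory near one partition's size.
        if self._it == len(self._partitions):
            return 0  # no partitions left, stop iteration
        input_data(data=self._partitions[self._it],
                   label=self._labels[self._it])
        self._it += 1
        return 1  # more partitions remain

    def reset(self):
        # Called by XGBoost before each new pass over the data.
        self._it = 0

# Usage sketch: build the quantile DMatrix directly from the iterator,
# so no single concatenated copy of the data is ever materialized.
parts = [cp.random.rand(1000, 10) for _ in range(4)]
ys = [cp.random.rand(1000) for _ in range(4)]
dmatrix = xgb.DeviceQuantileDMatrix(PartitionIter(parts, ys), max_bin=256)
```

The design point is that each call to `input_data` hands XGBoost exactly one partition, so quantile sketching can consume the data incrementally rather than from one concatenated block.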