Extensible call back function for DMatrix. #5571

trivialfis · 2020-04-21T03:07:09Z

This issue tracks data initialization in distributed environments. On both the Spark side and the Dask side, we need to concatenate partitions into a single block of data to initialize the DMatrix. Even with the new DeviceQuantileDMatrix, peak memory usage is still twice the size of the actual data.

One solution is to implement some sort of callback function that lets the internal data adapter process these partitions in place, which avoids concatenating the data.
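
A rough sketch of the callback idea, to make the proposal concrete. This is not XGBoost code: `PartitionIter`, `ColumnRangeBuilder`, and `build_from_iter` are hypothetical names, and the per-feature min/max summary is only a stand-in for whatever the internal data adapter would actually compute. The point is that the consumer pulls one partition at a time through a callback, so the partitions never have to be concatenated.

```python
from typing import Callable, List, Optional, Sequence

# A partition is a list of rows; each row is a list of feature values.
Partition = Sequence[Sequence[float]]


class PartitionIter:
    """Hypothetical iterator that hands partitions to a consumer one by one."""

    def __init__(self, partitions: Sequence[Partition]) -> None:
        self._partitions = list(partitions)
        self._pos = 0

    def reset(self) -> None:
        """Rewind so the consumer can make another pass over the data."""
        self._pos = 0

    def next(self, input_data: Callable[[Partition], None]) -> bool:
        """Pass the next partition to `input_data`; return False when done."""
        if self._pos == len(self._partitions):
            return False
        input_data(self._partitions[self._pos])
        self._pos += 1
        return True


class ColumnRangeBuilder:
    """Stand-in for an internal adapter (assumption, not the real one).

    It keeps only a running summary (row count, per-feature min and max),
    so memory use does not grow with the number of partitions consumed.
    """

    def __init__(self) -> None:
        self.n_rows = 0
        self.mins: Optional[List[float]] = None
        self.maxs: Optional[List[float]] = None

    def consume(self, part: Partition) -> None:
        for row in part:
            values = list(row)
            if self.mins is None or self.maxs is None:
                self.mins = values[:]
                self.maxs = values[:]
            else:
                self.mins = [min(a, b) for a, b in zip(self.mins, values)]
                self.maxs = [max(a, b) for a, b in zip(self.maxs, values)]
            self.n_rows += 1


def build_from_iter(it: PartitionIter) -> ColumnRangeBuilder:
    """Drive the iterator until it is exhausted, one partition at a time."""
    builder = ColumnRangeBuilder()
    it.reset()
    while it.next(builder.consume):
        pass
    return builder


# Usage: two partitions, as they might arrive from two workers.
partitions = [
    [[1.0, 5.0], [2.0, 3.0]],
    [[0.5, 9.0]],
]
stats = build_from_iter(PartitionIter(partitions))
print(stats.n_rows, stats.mins, stats.maxs)
```

The `reset` method is there because a consumer may need more than one pass over the data (for example, one pass to build a summary and another to fill the final structure); whether the real adapter needs that is an assumption here.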

trivialfis added the type: roadmap label Apr 21, 2020

trivialfis mentioned this issue Apr 22, 2020

[FEA][Python] Multi-GPU training with DeviceQuantileDMatrix #5583

Closed

trivialfis mentioned this issue Apr 30, 2020

Move device dmatrix construction code into ellpack. #5623

Merged

trivialfis mentioned this issue May 7, 2020

Use DMatrix Proxy for implementing data callback. #5629

Closed

trivialfis closed this as completed Jul 17, 2020
