Add option to disable ELLPACK format/index arrays for GPU training #10981

terraflops1048576 · 2024-11-04T22:06:34Z

Currently, XGBoost has a GHistIndexMatrix which stores data in CSR format; however, if the data is completely dense (no missing values such as NaNs), it optimizes this away by not reserving the extra space for the index arrays. EllpackPageImpl does not do this: it allocates the space and creates the indices regardless of whether is_dense is set.

We currently use XGBoost to build trees on large amounts of almost completely dense data, which can be quantized to 1 byte per float. As a result, the sparsity optimizations actually balloon the memory required: GPU memory usage is almost exactly 3 bytes per float, as one would expect when storing dense data in Ellpack. The same effect can be observed with GHistIndexMatrix alone, where memory consumption increases by 3-4x when a single NaN is added to the data, making is_dense false.
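
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The dataset shape and the helper `quantized_bytes` are hypothetical, and the per-value byte counts are simply the figures quoted above, not values read from XGBoost internals.

```python
# Back-of-the-envelope estimate only. The byte counts are the figures quoted
# in this issue; nothing here inspects XGBoost's actual allocations.

def quantized_bytes(n_rows: int, n_cols: int, bytes_per_value: int) -> int:
    """Memory needed to hold n_rows * n_cols quantized entries."""
    return n_rows * n_cols * bytes_per_value

# Hypothetical dataset shape, chosen purely for illustration.
n_rows, n_cols = 10_000_000, 100

dense = quantized_bytes(n_rows, n_cols, 1)    # 1 byte per value (bin index only)
ellpack = quantized_bytes(n_rows, n_cols, 3)  # ~3 bytes per value, as observed

print(f"dense:   {dense / 2**30:.2f} GiB")
print(f"ellpack: {ellpack / 2**30:.2f} GiB")
print(f"ratio:   {ellpack / dense:.1f}x")
```

At this scale the difference can decide whether the quantized data fits on a single GPU at all.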

The feature request is simply to allow the user to turn off the CSR/Ellpack format and store the data densely, via a manual config option in QuantileDMatrix.

I've tried reading the code, and I don't understand it fully, so I can't be of much help with coming up with a roadmap.

trivialfis · 2024-11-05T04:33:29Z

For GPU ellpack, it's already done in the master branch. Dense data will use a smaller amount of memory.

terraflops1048576 · 2024-11-05T07:06:23Z

Oh, that's great! Just to confirm, #10870 is the PR that implements this functionality?

trivialfis · 2024-11-05T07:50:58Z

Yes. For relatively dense data (as defined in the PR description), XGBoost should be much faster and use less memory.

terraflops1048576 closed this as completed Nov 5, 2024
