
GPU support on ppc64le #2937

Closed

everton1984 opened this issue Mar 23, 2020 · 19 comments

everton1984 commented Mar 23, 2020

Currently there is no way to run LightGBM with GPU acceleration on ppc64le, since LightGBM only supports OpenCL. What would be the correct approach for adding CUDA code alongside the OpenCL code in LightGBM? If I understand correctly, adding CUDA kernels to the tree learner would be enough, right?
Other people are interested in this as well, for example #1905 and #1129.

ceseo commented Mar 23, 2020

cc @ceseo

ceseo commented Mar 30, 2020

Our plan is to contribute the CUDA kernels (histogram16/64/256) to the tree learner and create a common API for OpenCL and CUDA (a common template class), so that the changes needed in the current code base are minimized.

At first, CUDA support would be restricted to ppc64le/Linux only.
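For illustration, a minimal sketch of how such a platform gate could look at compile time (USE_CUDA is an assumed build flag here, not necessarily the name the final patch will use):

/* Hypothetical compile-time gate: reject CUDA builds outside ppc64le/Linux.
   __linux__ and __powerpc64__ are the standard GCC macros for that platform. */
#if defined(USE_CUDA) && !(defined(__linux__) && defined(__powerpc64__))
#error "CUDA support is initially restricted to ppc64le/Linux"
#endif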

Does that look reasonable?

guolinke (Collaborator) commented Apr 1, 2020

Thanks @ceseo and @everton1984, the contribution is very welcome!
Pinging @huanzhang12 for the detailed discussion.

everton1984 (Author) commented:

@huanzhang12 As @ceseo mentioned, we want to add a new set of histogram kernels written in CUDA, use the current GPUTreeLearner as a base class, and add specializations with the non-common code for OpenCL and CUDA. I think this would require minimal changes to code outside GPUTreeLearner. Would that be acceptable? Are you aware of any GPU code outside that part that we might have missed?

huanzhang12 (Contributor) commented:

@everton1984 It will be a great contribution if you can add CUDA-equivalent kernels to LightGBM! Yes, you can inherit from the current GPUTreeLearner as the base class, but many functions in the GPU tree learner currently involve OpenCL memory management (e.g., enqueue_write_buffer, enqueue_unmap_buffer, etc.), so most of the methods cannot be reused directly. I suggest making an abstraction layer for memory management (e.g., use a wrapper to handle memory-related function calls and forward each call to the OpenCL or CUDA backend accordingly), so that many parts of the existing GPUTreeLearner can be reused.
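As an illustration of the suggested abstraction layer, here is a sketch only, with hypothetical class and method names: the CUDA branch uses the real CUDA runtime calls, while the OpenCL sibling would forward to the Boost.Compute enqueue_* calls mentioned above.

#include <cstddef>
#include <cuda_runtime.h>  // cudaMalloc / cudaMemcpy / cudaFree

/* Hypothetical abstraction layer: tree learner code calls these methods
   and never touches clEnqueue* / cuda* APIs directly. */
class DeviceMemory {
public:
    virtual ~DeviceMemory() = default;
    virtual void* Allocate(std::size_t bytes) = 0;
    virtual void CopyToDevice(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void CopyFromDevice(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void Free(void* ptr) = 0;
};

class CUDAMemory : public DeviceMemory {
public:
    void* Allocate(std::size_t bytes) override {
        void* ptr = nullptr;
        cudaMalloc(&ptr, bytes);  // error handling omitted in this sketch
        return ptr;
    }
    void CopyToDevice(void* dst, const void* src, std::size_t bytes) override {
        cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    }
    void CopyFromDevice(void* dst, const void* src, std::size_t bytes) override {
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost);
    }
    void Free(void* ptr) override {
        cudaFree(ptr);
    }
};

/* An OCLMemory sibling would forward to enqueue_write_buffer /
   enqueue_read_buffer on the Boost.Compute command queue instead,
   so GPUTreeLearner itself never touches a backend API directly. */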

everton1984 (Author) commented:

@huanzhang12 My idea was to implement something like

#include <iostream>
#include <memory>

class GPUTreeLearner
{
public:
    GPUTreeLearner() = default;

    /* virtual destructor: objects are deleted through base-class pointers */
    virtual ~GPUTreeLearner() = default;

    /* backend-specific code, overridden by each backend */
    virtual void doStuff() {
    }

    /* common GPU code */
    void A() {
        std::cout << "GPU common code\n";
    }
};

class OCLTreeLearner : public GPUTreeLearner
{
public:
    /* OpenCL-specific code */
    void doStuff() override {
        std::cout << "OCL only code\n";
    }
};

class CUDATreeLearner : public GPUTreeLearner
{
public:
    /* CUDA-specific code */
    void doStuff() override {
        std::cout << "CUDA only code\n";
    }
};

class GPUTreeLearnerFactory
{
public:
    static std::unique_ptr<GPUTreeLearner> GetGPU(bool is_cuda)
    {
        if (is_cuda) {
            return std::make_unique<CUDATreeLearner>();
        }
        return std::make_unique<OCLTreeLearner>();
    }
};

int main() {
    std::unique_ptr<GPUTreeLearner> ocl = GPUTreeLearnerFactory::GetGPU(false);
    std::unique_ptr<GPUTreeLearner> cuda = GPUTreeLearnerFactory::GetGPU(true);
    ocl->A();
    ocl->doStuff();
    cuda->A();
    cuda->doStuff();

    return 0;
}

so GPUTreeLearner would keep the common code. I think this approach is easily maintainable and changes very little code outside GPUTreeLearner.

ChipKerchner (Contributor) commented Apr 2, 2020

One current limitation of our CUDA code is that it cannot coexist (i.e., be compiled at the same time) with the GPU/OCL code, similar to the existing constraint that the CPU and GPU/OCL code paths cannot coexist. Do you see a need for both simultaneously in the same compilation? If so, hopefully we can overcome this.

We also assume that parity between GPU/OCL and CUDA is a given. To us that means single-precision 16-, 64-, and 256-bin histogram support. Our initial CUDA version will "probably" also have double-precision histograms, so it would be nice if someone with GPU/OCL expertise added that functionality to the project.

huanzhang12 (Contributor) commented:

@everton1984 The implementation skeleton looks great! I suggested an abstraction layer for memory management because it may help reduce code duplication, but you can determine the best approach for this.

@ChipKerchner The OpenCL implementation does support double precision; it can be enabled with the flag gpu_use_dp=true. Double precision can be very slow on NVIDIA GPUs, though, unless you use the very expensive "datacenter" GPUs (P100, V100, etc.).

ceseo commented Apr 6, 2020

@huanzhang12 The original motivation for this port was exactly running on Tesla V100s in POWER9 systems. Do you have any objections to starting the port with double-precision support only, at first? Most ppc64le systems support only server-grade GPUs anyway.

bordaw commented Apr 8, 2020

@huanzhang12 We have also encountered the need for histograms with bin sizes >256, and we plan to support histograms with larger bins. I think the OpenCL implementation would benefit from this capability as well.

everton1984 (Author) commented:

> @huanzhang12 we have also encountered the need for histograms with bin size >256. We have plans for supporting Histograms for larger bins. I think OpenCL implementation would benefit from this capability as well.

@bordaw This is not relevant to the current issue; we're discussing enabling GPU support on ppc64le. Please create another issue to discuss new features.

huanzhang12 (Contributor) commented Apr 8, 2020

@ceseo @everton1984 Don't worry about single vs. double precision too much at this time; as long as we have the double-precision version working, it is not difficult to add single-precision support. If you look at the current GPU code, there are only a few places with different code paths for handling single and double precision, and the differences between the two are not that big. But it is better to keep single precision in mind when designing the code flow, to ease further extensions.
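For instance, one common pattern (a sketch only, assuming a hypothetical GPU_USE_DP compile-time flag that mirrors the run-time gpu_use_dp parameter) is to isolate the precision choice in a single typedef, so the rest of the kernel code stays precision-agnostic:

/* Hypothetical compile-time precision switch; GPU_USE_DP is an assumed
   flag, not necessarily the macro LightGBM uses internally. */
#ifdef GPU_USE_DP
typedef double hist_t;  /* double-precision accumulators */
#else
typedef float hist_t;   /* single-precision accumulators */
#endif

/* Histogram bins accumulate gradient/hessian sums in the chosen precision;
   only the code that reads the sums back needs to know which was picked. */
struct HistogramBin {
    hist_t sum_gradients;
    hist_t sum_hessians;
};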

@bordaw It is possible to support large bins, but performance can be much worse: when the bin size is at most 256, the histograms can fit in the GPU's local memory, which has much higher bandwidth. If you would like to work on this, you are very welcome to open a new issue.
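To make the bandwidth argument concrete, here is an illustrative CUDA kernel (not LightGBM's actual code) that keeps a per-block histogram in fast on-chip __shared__ memory; this only works while the histogram is small enough (e.g., at most 256 bins) to fit there, which is exactly what larger bin sizes break:

/* Illustrative only: per-block gradient histogram in __shared__ memory.
   Launch with dynamic shared memory of bin_count * sizeof(float), e.g.:
   SharedMemHistogram<<<blocks, threads, bin_count * sizeof(float)>>>(...); */
__global__ void SharedMemHistogram(const unsigned char* bin_idx,
                                   const float* grad, int n,
                                   float* out, int bin_count) {
    extern __shared__ float local_hist[];  // bin_count floats per block
    for (int i = threadIdx.x; i < bin_count; i += blockDim.x)
        local_hist[i] = 0.0f;
    __syncthreads();
    // Accumulate into fast shared memory instead of global memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local_hist[bin_idx[i]], grad[i]);
    __syncthreads();
    // Merge each block's partial histogram into the global result.
    for (int i = threadIdx.x; i < bin_count; i += blockDim.x)
        atomicAdd(&out[i], local_hist[i]);
}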

bordaw commented Apr 8, 2020

@huanzhang12 Yes, I will create another issue. Also, I agree that the code changes required for single/double-precision histograms are not significant.

StrikerRUS mentioned this issue on May 11, 2020
ChipKerchner (Contributor) commented:

We are almost ready to push the initial candidate for the CUDA branch.

ChipKerchner (Contributor) commented Jul 1, 2020

The pull request has been active for almost 3 weeks, and we're not getting much feedback.

I'd like some assistance with the continuous integration systems. They report problems, but my testing on an AMD64 system with a GPU (OCL) shows no issues with the unit tests, and I am unable to set up an environment that lets me run the automated tests. Please help!
@huanzhang12 @guolinke @StrikerRUS

guolinke (Collaborator) commented Jul 2, 2020

Sorry @ChipKerchner, I will look at this today.

ChipKerchner (Contributor) commented Aug 26, 2020

@huanzhang12 Our resources are starting to be moved off this project, so we would like to get this feature approved and integrated into master as soon as possible. Please let us know about any outstanding issues.

StrikerRUS (Collaborator) commented:

Implemented in #3160. Many thanks to all the people involved for the amazing work! We are looking forward to further CUDA improvements, e.g. Windows support.

github-actions bot commented:

This issue has been automatically locked since there has not been any recent activity after it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues, including a reference to this one.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Aug 23, 2023