
Add proof of concept #2

Merged: 64 commits merged into master from CI on Jul 24, 2024

Conversation

@mborland (Member)

Closes: #1
Based off of @jzmaddock's original PR here: boostorg/math#127

The rules seem to be more relaxed than in the original PR: assert and static_assert are allowed. Static variables are still not allowed, but I think that can be worked around by making global variables C++17 inline variables. The runner uses an NVIDIA Tesla T4 (arch 70), and the CI provides version 12.5.0 of the NVIDIA toolkit, which is the latest available. Locally I am using RHEL 9.4 with an RTX 3060 (arch 85) and the same version of the NVIDIA toolkit.
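
For illustration, a minimal sketch of the inline-variable workaround (the names below are hypothetical, not code from this PR): a constant that would otherwise be a function-local static is hoisted to a C++17 inline variable at namespace scope, which device code can use directly.

namespace detail {
// C++17 inline variable: one definition across translation units, no static local needed
inline constexpr double root_two_pi = 2.50662827463100050242;
}

__host__ __device__ inline double scale_by_root_two_pi(double x)
{
    return x / detail::root_two_pi;
}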

CC: @NAThompson, @ckormanyos

@@ -272,7 +271,7 @@ void test_spots(RealType)
    BOOST_CHECK_EQUAL(kurtosis_excess(arcsine_01), -1.5); // 3/2
    BOOST_CHECK_EQUAL(support(arcsine_01).first, 0); //
    BOOST_CHECK_EQUAL(range(arcsine_01).first, 0); //
-   BOOST_MATH_CHECK_THROW(mode(arcsine_01), std::domain_error); // Two modes at x_min and x_max, so throw instead.
+   BOOST_CHECK_THROW(mode(arcsine_01), std::domain_error); // Two modes at x_min and x_max, so throw instead.
Contributor:

Just checking... do we need these changes to the tests, or is it just a side effect of merging in an old branch?

Member Author:

I was having compilation issues with BOOST_MATH_CHECK_THROW, but the version from Boost.Test worked just fine. Since the tests already use Boost.Test for everything else, I figured it wasn't a big change.

@jzmaddock (Contributor)

Thanks for this Matt!

Some random thoughts that have occurred to me:

  • It's probably possible to build with b2 if we wanted to (though for a handful of CUDA tests it makes little difference). If we assume the compiler is command-line compatible with, say, gcc, then a user-config.jam that looks like:

using gcc : some-cuda-name :  "some-path/nvcc"  : <cxxflags>-fspecial-cuda-option ;

would do the trick; same for msvc.

@mborland (Member Author)

> Thanks for this Matt!
>
> Some random thoughts that have occurred to me:
>
>   • It's probably possible to build with b2 if we wanted to (though for a handful of CUDA tests it makes little difference). If we assume the compiler is command-line compatible with, say, gcc, then a user-config.jam that looks like:
>
> using gcc : some-cuda-name :  "some-path/nvcc"  : <cxxflags>-fspecial-cuda-option ;
>
> would do the trick; same for msvc.

Could be worth a try. Peter also pointed me to: https://github.com/tee3/boost-build-nvcc.

I didn't realize you could do that without the hardware installed.

I have this installed on my local machine as well. ChatGPT says I should just be able to replace __host__ __device__ with SYCL_EXTERNAL when SYCL_LANGUAGE_VERSION is defined. I'll have to pull out the Intel docs at some point to validate whether that's true. Boost.Compute was OpenCL, but since its last commit was over five years ago it only supports obsolete versions, which is unfortunate.
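
As a rough sketch of what that macro switch could look like (the macro name below is made up for illustration, and the SYCL_EXTERNAL substitution is only ChatGPT's suggestion until the Intel docs confirm it):

#if defined(__CUDACC__)
#  define GPU_ENABLED __host__ __device__   // CUDA: callable from host and device
#elif defined(SYCL_LANGUAGE_VERSION)
#  define GPU_ENABLED SYCL_EXTERNAL         // SYCL: mark as callable from device code
#else
#  define GPU_ENABLED                       // plain host build: expands to nothing
#endif

GPU_ENABLED double erf_impl(double x);      // hypothetical declaration shared by all backends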

>   • Thinking out loud here, on reflection, asserts aren't really that useful in a device context - I would presume folks would want everything optimised anyway, plus what does an assert on the GPU actually do?

The CUDA threads are similar to CPU fibers, so on an assertion only the thread gets terminated, not the entire process. Since all of the assertions go through our own macro, it would be easy to replace them with nothing when __CUDACC__ is defined.
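
A minimal sketch of that substitution, assuming an internal assertion macro of our own (the macro name below is illustrative):

#if defined(__CUDACC__)
#  define MATH_ASSERT(expr)                 // device builds: assertions compile away to nothing
#else
#  include <cassert>
#  define MATH_ASSERT(expr) assert(expr)    // host builds: forward to the standard assert
#endif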

@mborland (Member Author)

I have the one test running fine locally with CUDA:

1: Test command: /home/mborland/Documents/boost/build/stage/bin/boost_cuda_math-test_arcsine_cdf_double
1: Working Directory: /home/mborland/Documents/boost/build/libs/cuda-math/test
1: Test timeout computed to be: 1500
1: [Vector operation on 50000 elements]
1: CUDA kernel launch with 98 blocks of 512 threads
1: CUDA kernal done in 0.020311s
1: Test PASSED with calculation time: 0.000759265s
1: Done
1: Failed to deinitialize the device! error=driver shutting down
1/1 Test #1: run-boost_cuda_math-test_arcsine_cdf_double ...   Passed    0.32 sec
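
For reference, the "98 blocks of 512 threads" line follows the usual round-up launch arithmetic (a sketch, not code from this PR):

int num_elements = 50000;
int threads_per_block = 512;
int blocks_per_grid = (num_elements + threads_per_block - 1) / threads_per_block; // = 98
// kernel<<<blocks_per_grid, threads_per_block>>>(...);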

I'll see if changing some macros around enables support for SYCL.

@mborland (Member Author)

The beta function is now green with both SYCL and CUDA.

@mborland (Member Author)

Merging this since it's generally synchronized with the linked PR in math.

mborland merged commit d77a111 into master on Jul 24, 2024 (2 checks passed)
mborland deleted the CI branch on July 24, 2024 at 13:22

Successfully merging this pull request may close these issues: Add cmake based testing