Half factorization #1712

yhmtsai · 2024-10-25T17:17:50Z

This PR adds half-precision support to the factorizations.

HIP does not currently support atomic operations on 16-bit types.

TODO:

Add the fix for the triangular solver with half precision.

MarcelKoch

Generally LGTM. I have a question regarding atomics and hip. The latest ROCm shows support for fp16 atomic operations: https://rocm.docs.amd.com/en/latest/reference/precision-support.html#atomic-operations-support, but TBH I can't figure out what operations exactly they mean with that. Did you try anything in that regard?
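
For context on the atomics question, a common fallback when a platform lacks native 16-bit atomics is a compare-and-swap loop on the enclosing 32-bit word. The following is only a host-side sketch of that idea and is not taken from this PR. The helper name `atomic_add_u16` is hypothetical, and it uses a 16-bit integer payload; a half-precision variant would convert the lane to float, add, and convert back inside the loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically adds `value` to one 16-bit lane of a
// 32-bit word using a compare-and-swap loop on the whole word.
// `lane` is 0 for the low 16 bits and 1 for the high 16 bits.
inline void atomic_add_u16(std::atomic<std::uint32_t>& word, int lane,
                           std::uint16_t value)
{
    assert(lane == 0 || lane == 1);
    const int shift = lane * 16;
    const std::uint32_t mask = std::uint32_t{0xFFFF} << shift;
    std::uint32_t old_word = word.load();
    std::uint32_t new_word;
    do {
        // Extract the lane, update it, and splice it back into the word.
        const auto old_lane =
            static_cast<std::uint16_t>((old_word & mask) >> shift);
        const auto new_lane = static_cast<std::uint16_t>(old_lane + value);
        new_word = (old_word & ~mask) |
                   (static_cast<std::uint32_t>(new_lane) << shift);
        // On failure, compare_exchange_weak refreshes old_word for the retry.
    } while (!word.compare_exchange_weak(old_word, new_word));
}
```

The neighbouring lane is preserved because the whole word is only replaced if it has not changed since it was read.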

MarcelKoch · 2024-11-11T11:37:35Z

test/factorization/par_ilut_kernels.cpp

                 PairTypenameNameGenerator);


 TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
 {
+    using value_type = typename TestFixture::value_type;


Many of the tests here are missing SKIP_HALF when compiling for HIP.

We do not support compute_l_u_factors in HIP, but the other kernels still work with half precision in HIP.

I see what you mean now.

MarcelKoch · 2024-11-11T14:45:16Z

cuda/solver/common_trs_kernels.cuh

@@ -212,13 +212,15 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

        size_type work_size{};

+        // TODO: In nullptr is considered nullptr_t not casted to const
+        // it does not work in cuda110/100 images


nit:

Suggested change

// it does not work in cuda110/100 images

// Explicitly cast `nullptr` to `const ValueType*` to prevent compiler issues with cuda 10/11

I think it is more of a host-compiler issue, because the call goes through our binding first with a specific type.
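
The thread does not spell out the exact failure, so the following is only a sketch of one way a bare `nullptr` can break when a call goes through type-specific bindings. The functions `element_size` and `query_element_size` are hypothetical stand-ins, not Ginkgo or CUDA APIs.

```cpp
#include <cstddef>

// Hypothetical type-specific bindings, overloaded on the pointer type
// in the same spirit as a wrapper around a vendor library.
constexpr std::size_t element_size(const float*) { return sizeof(float); }
constexpr std::size_t element_size(const double*) { return sizeof(double); }

template <typename ValueType>
constexpr std::size_t query_element_size()
{
    // element_size(nullptr) would not compile here: nullptr has type
    // std::nullptr_t, which converts equally well to const float* and
    // const double*, so the overload set is ambiguous.
    return element_size(static_cast<const ValueType*>(nullptr));
}
```

With the explicit cast, the overload is chosen by `ValueType` alone, independent of how a particular host compiler treats `std::nullptr_t`.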

cuda/solver/common_trs_kernels.cuh

hip/components/memory.hip.hpp

reference/factorization/par_ilut_kernels.cpp

test/factorization/lu_kernels.cpp

MarcelKoch · 2024-11-12T10:01:28Z

cuda/solver/common_trs_kernels.cuh

+        using shared_value_type = std::conditional_t<
+            std::is_same<remove_complex<ValueType>, gko::half>::value, float,
+            ValueType>;
+        sptrsv_naive_caching_kernel<is_upper, device_type<shared_value_type>>


Now this will also be float when using std::complex<gko::half>. That doesn't seem correct. You will lose any imaginary part that might be in the matrix or vectors.

You are right.
Sorry, I did not pay enough attention when changing it.
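
One possible way to avoid the problem is a small trait that widens the real and complex half types separately. This is a sketch under assumptions, not the fix that was merged: `half_t` is a hypothetical stand-in for `gko::half`, and `shared_storage` is an illustrative name.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical stand-in for gko::half so the sketch compiles on its own.
struct half_t {};

// Maps a value type to the type used for shared-memory storage:
// unchanged by default.
template <typename T>
struct shared_storage {
    using type = T;
};

// half is widened to float.
template <>
struct shared_storage<half_t> {
    using type = float;
};

// complex<half> is widened to complex<float>, keeping the imaginary part.
template <>
struct shared_storage<std::complex<half_t>> {
    using type = std::complex<float>;
};

template <typename T>
using shared_storage_t = typename shared_storage<T>::type;

static_assert(std::is_same<shared_storage_t<half_t>, float>::value, "");
```

Unlike a single `std::conditional_t` on `remove_complex<ValueType>`, the complex case here never collapses to a real type.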

yhmtsai added the 1:ST:WIP This PR is a work in progress. Not ready for review. label Oct 25, 2024

yhmtsai self-assigned this Oct 25, 2024

yhmtsai mentioned this pull request Oct 25, 2024

Half preconditioner, multigrid, log, and reorder #1713

Open

yhmtsai force-pushed the half_factorization branch from 3db59fd to cd9677a Compare October 28, 2024 16:12

yhmtsai force-pushed the half_solver branch from e962cb2 to 9a15695 Compare October 28, 2024 16:12

yhmtsai force-pushed the half_factorization branch from cd9677a to 5e5cd03 Compare October 28, 2024 17:19

yhmtsai force-pushed the half_solver branch from 9a15695 to 1d7f1d1 Compare October 28, 2024 17:19

yhmtsai force-pushed the half_factorization branch from 5e5cd03 to c276034 Compare October 29, 2024 09:17

yhmtsai force-pushed the half_solver branch from 1d7f1d1 to 1038d78 Compare October 29, 2024 09:17

yhmtsai force-pushed the half_factorization branch from c276034 to bbefde6 Compare October 29, 2024 18:21

yhmtsai force-pushed the half_solver branch from 1038d78 to 1959026 Compare October 29, 2024 18:21

yhmtsai mentioned this pull request Oct 30, 2024

Half precision support #1257

Open

12 tasks

yhmtsai added this to the Ginkgo 1.9.0 milestone Oct 30, 2024

yhmtsai force-pushed the half_solver branch from 1959026 to ac679c2 Compare November 4, 2024 14:24

yhmtsai force-pushed the half_factorization branch from bbefde6 to 72d9d50 Compare November 4, 2024 14:24

yhmtsai force-pushed the half_solver branch from ac679c2 to eda6a77 Compare November 4, 2024 18:15

yhmtsai force-pushed the half_factorization branch from 72d9d50 to 88967e6 Compare November 4, 2024 18:15

yhmtsai added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Nov 5, 2024

yhmtsai force-pushed the half_factorization branch from 88967e6 to e667ec0 Compare November 5, 2024 18:03

yhmtsai force-pushed the half_solver branch 2 times, most recently from 50ae4c1 to bba40e0 Compare November 7, 2024 14:40

yhmtsai force-pushed the half_factorization branch from e667ec0 to c32201d Compare November 7, 2024 14:40

MarcelKoch self-requested a review November 11, 2024 11:25

MarcelKoch requested changes Nov 11, 2024

yhmtsai force-pushed the half_solver branch from bba40e0 to ffb5612 Compare November 12, 2024 09:43

yhmtsai force-pushed the half_factorization branch from c32201d to 257585d Compare November 12, 2024 09:43

MarcelKoch requested changes Nov 12, 2024

yhmtsai force-pushed the half_factorization branch from 257585d to eb14467 Compare November 12, 2024 16:02

yhmtsai force-pushed the half_solver branch from ffb5612 to 4d26712 Compare November 12, 2024 16:02

yhmtsai requested a review from MarcelKoch November 13, 2024 12:57

MarcelKoch approved these changes Nov 13, 2024

MarcelKoch requested a review from upsj November 13, 2024 14:16

yhmtsai force-pushed the half_factorization branch 2 times, most recently from 7568854 to d68a589 Compare November 14, 2024 10:08

yhmtsai force-pushed the half_solver branch from 4d26712 to d64417c Compare November 18, 2024 11:16

yhmtsai force-pushed the half_factorization branch from d68a589 to e1a3b3d Compare November 18, 2024 11:16

yhmtsai force-pushed the half_solver branch from d64417c to 18139fd Compare November 18, 2024 12:46

yhmtsai force-pushed the half_factorization branch 2 times, most recently from bea709e to e4973cb Compare November 18, 2024 13:43

yhmtsai force-pushed the half_solver branch from 18139fd to 88c19f5 Compare November 18, 2024 13:43

yhmtsai added 9 commits November 19, 2024 10:20

triangular and direct solver

648e3b3

workaround for half precision of load/store by using single precision…

39ed33d

… in shared memory

delete the current unusable half memory op on shared memory

0755571

direct and tri config dispatch

48a02eb

factorization

f69f459

factorization config dispatch

534cca4

cmake cuda test with cuda arch and fix is_finite

606ea8e

figure out factorization test

a47186e

change the diagonal to reduce random on parilut/parict

f6291e6

yhmtsai force-pushed the half_solver branch from 88c19f5 to 5993a90 Compare November 19, 2024 09:20

yhmtsai force-pushed the half_factorization branch from e4973cb to f6291e6 Compare November 19, 2024 09:20
