Half factorization #1712
base: half_solver
Generally LGTM. I have a question regarding atomics and HIP. The latest ROCm documentation lists support for fp16 atomic operations: https://rocm.docs.amd.com/en/latest/reference/precision-support.html#atomic-operations-support, but TBH I can't figure out exactly which operations they mean. Did you try anything in that regard?
PairTypenameNameGenerator);


TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
{
    using value_type = typename TestFixture::value_type;
Many of the tests here are missing SKIP_HALF if compiling for HIP.
We do not support compute_l_u_factors in HIP, but the other kernels still work with half precision in HIP.
I see what you mean now.
cuda/solver/common_trs_kernels.cuh
Outdated
@@ -212,13 +212,15 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

    size_type work_size{};

    // TODO: In nullptr is considered nullptr_t not casted to const
    // it does not work in cuda110/100 images
nit:
-    // it does not work in cuda110/100 images
+    // Explicitly cast `nullptr` to `const ValueType*` to prevent compiler issues with cuda 10/11
I think it is more of a host-compiler issue, because the call goes through our binding first with a specific type.
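For readers unfamiliar with the `nullptr` problem mentioned in this thread, here is a minimal plain C++ sketch of one common way it arises. It is an assumption that this matches what happens in the cuda 10/11 images; the thread does not confirm the exact mechanism. `buffer_is_null` and `call_with_null_buffer` are hypothetical names, not part of the code under review.

```cpp
// Hypothetical stand-in for a binding that is templated on the value type.
template <typename ValueType>
constexpr bool buffer_is_null(const ValueType* ptr)
{
    return ptr == nullptr;
}

// buffer_is_null(nullptr);
// The call above does not compile: a bare nullptr has type std::nullptr_t,
// so ValueType cannot be deduced from it.

// Casting nullptr to the concrete pointer type makes the deduction work.
template <typename ValueType>
constexpr bool call_with_null_buffer()
{
    return buffer_is_null(static_cast<const ValueType*>(nullptr));
}

static_assert(call_with_null_buffer<float>(),
              "the typed null pointer is accepted and compares equal to nullptr");
```

The explicit cast costs nothing at run time and removes the dependence on how a particular compiler treats an untyped `nullptr` argument.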
cuda/solver/common_trs_kernels.cuh
Outdated
using shared_value_type = std::conditional_t<
    std::is_same<remove_complex<ValueType>, gko::half>::value, float,
    ValueType>;
sptrsv_naive_caching_kernel<is_upper, device_type<shared_value_type>>
Now this will also be float when using std::complex<gko::half>. That doesn't seem correct. You will lose any imaginary part that might be in the matrix or vectors.
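One possible way to avoid dropping the imaginary part is to map each half type to its own wider counterpart. The following is only a sketch under stated assumptions, not the fix that was actually pushed: `half_t` is a hypothetical stand-in for `gko::half`, and `shared_type` is a hypothetical helper trait.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical stand-in for gko::half so that the sketch compiles on its own.
struct half_t {
    unsigned short bits;
};

// Primary template: every type other than half keeps its own type.
template <typename ValueType>
struct shared_type {
    using type = ValueType;
};

// Real half is widened to float.
template <>
struct shared_type<half_t> {
    using type = float;
};

// Complex half is widened to complex float, so the imaginary part survives.
template <>
struct shared_type<std::complex<half_t>> {
    using type = std::complex<float>;
};

template <typename ValueType>
using shared_value_type = typename shared_type<ValueType>::type;

static_assert(std::is_same<shared_value_type<half_t>, float>::value,
              "half maps to float");
static_assert(std::is_same<shared_value_type<std::complex<half_t>>,
                           std::complex<float>>::value,
              "complex half maps to complex float");
static_assert(std::is_same<shared_value_type<double>, double>::value,
              "other types are unchanged");
```

Compared with the `std::conditional_t` on `remove_complex<ValueType>`, the explicit specializations keep the real and complex cases separate, so a complex input can never collapse to a real shared-memory type.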
You are right. Sorry, I did not pay enough attention when changing it.
This PR adds half-precision support to the factorizations.
HIP does not currently support atomic operations on 16-bit types.
TODO: