
Half matrix and components #1708

Open · wants to merge 12 commits into base: half_type

Conversation

@yhmtsai yhmtsai (Member) commented Oct 24, 2024

This PR adds half precision support to the matrices and components (such as array/device_matrix_data).

Also, to avoid touching files that are unrelated to this PR, I additionally introduce several type lists with half.
For example, RealValueTypes -> RealValueTypesWithHalf and next_precision -> next_precision_with_half.
(We will add bfloat16 in the future, so maybe we should not name it Half.)

For the friend declaration and the corresponding conversion function:

friend class <prev<value_type>>    // the class with the previous precision can access this class
function(class <next<value_type>>) // this class can access the class with the next precision, because it is the class with the previous precision of that class

If we only used next in both the friend declaration and the function, we would need:

friend class <next<next<value_type>>>
function(class <next<value_type>>)

Moreover, the second variant does not work when we fall back from next_precision_with_half to next_precision by disabling half, because next<next<value_type>> is then value_type itself. The first variant always works.

TODO:

@yhmtsai yhmtsai requested review from a team October 24, 2024 15:30
@yhmtsai yhmtsai self-assigned this Oct 24, 2024
@ginkgo-bot ginkgo-bot added the labels reg:build, reg:testing, type:solver, type:matrix-format, type:factorization, type:reordering, and mod:all on Oct 24, 2024
@yhmtsai yhmtsai mentioned this pull request Oct 24, 2024
@yhmtsai yhmtsai force-pushed the half_matrix branch 2 times, most recently from 8f3a17d to b7d4a15 Compare October 25, 2024 08:20
@yhmtsai yhmtsai added this to the Ginkgo 1.9.0 milestone Oct 25, 2024
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Oct 25, 2024
@MarcelKoch MarcelKoch self-requested a review October 25, 2024 15:54
@MarcelKoch MarcelKoch (Member) left a comment

Mostly looks good. There are still some places where the new half-enabled types are missing.

include/ginkgo/core/base/segmented_array.hpp (resolved)
include/ginkgo/core/base/precision_dispatch.hpp (outdated, resolved)
include/ginkgo/core/matrix/row_gatherer.hpp (outdated, resolved)
core/test/utils.hpp (resolved)
core/test/utils.hpp (resolved)
hip/components/cooperative_groups.hip.hpp (outdated, resolved)
static_assert(sizeof(ValueType) == sizeof(ResultType),
"The type to reinterpret to must be of the same size as the "
"original type.");
return reinterpret_cast<ResultType&>(val);
Member:
maybe just use memcpy here directly.

}
#else
// UB?
uint16_t* address_as_converter = reinterpret_cast<uint16_t*>(&out);
Member:
I guess this can't be done without either UB, or falling back to omp critical.

reference/matrix/ell_kernels.cpp (outdated, resolved)
include/ginkgo/core/base/math.hpp (resolved)
@thoasm thoasm (Member) left a comment

Part 1 / 2 of my review. So far, I only have small comments.

option(GINKGO_ENABLE_HALF "Enable the use of half precision" ON)
# We do not support MSVC. SYCL will come later
if(MSVC OR GINKGO_BUILD_SYCL)
message(STATUS "HALF is not supported in MSVC, and later support in SYCL")
Member:
This needs to be rephrased since I really don't know what you mean by "and later support in SYCL".
Do you mean that SYCL does support half-precision in a later version?

@yhmtsai yhmtsai (Member, Author) replied Nov 5, 2024:

Yes, we will enable the support in #1710.
As half is now trivially copyable again, we might not need the device_type mapping, though.

struct device_numeric_limits<__half> {
// from __half documentation, it accepts unsigned short
// __half does not have constexpr
static GKO_ATTRIBUTES GKO_INLINE auto inf()
Member:
Will this also work for host code?

yhmtsai (Author):
Yes, the constructors are available on both sides.
Side note: the operations are not.

Comment on lines +36 to +38
static constexpr auto inf() { return std::numeric_limits<T>::infinity(); }
static constexpr auto max() { return std::numeric_limits<T>::max(); }
static constexpr auto min() { return std::numeric_limits<T>::min(); }
Member:
Is there a reason you made these into functions?

yhmtsai (Author):
__half does not have a constexpr constructor, so I cannot put them into static constexpr data members here.
I have not tried keeping them as data yet. If we made them data members instead of functions, the definition would live on the host. Could CUDA still use them in device code? I doubt it, because we do not pass them through the kernel call.

Thus, I made them into functions.

}


// Dircetly call float versrion from here?
Member:
typos:

Suggested change
// Dircetly call float versrion from here?
// Directly call float version from here?

Also, why is abs specialized, while sqrt above is a separate function? Because that's how thrust does it?

yhmtsai (Author):
I think sqrt can also be specialized.

Comment on lines 119 to 120
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host
// compiler.
Member:
I don't quite get the meaning, maybe:

Suggested change
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host
// compiler.
// It is required by NVHPC 23.3, `isnan` is undefined when NVHPC is only used as a host
// compiler.

yhmtsai (Author):

If I recall correctly, CUDA compiles the code twice: once for device and once for the rest.
NVCC does not complain, but NVHPC complains that isnan is not defined.
TBH, I forgot whether I put __device__ on it when I encountered this issue.
I will check again.

THRUST_HALF_FRIEND_OPERATOR(+, +=)
THRUST_HALF_FRIEND_OPERATOR(-, -=)
THRUST_HALF_FRIEND_OPERATOR(*, *=)
THRUST_HALF_FRIEND_OPERATOR(/, /=)
Member:
Do you need this macro afterward? If not, maybe just undef the macro.

exec,
[] GKO_KERNEL(auto idx, auto array) {
if constexpr (std::is_same_v<remove_complex<ValueType>, half>) {
// __half can not be from int64_t
Member:
What do you mean by that?
That half can't be converted to int64_t?

yhmtsai (Author):

No, __half cannot be converted from int64_t.
CUDA only provides conversions from short, int, long long, and the corresponding unsigned versions.
Unfortunately, it does not accept int64_t, even though long long and int64_t are technically the same.

@ginkgo-bot:
Error: The following files need to be formatted:

common/cuda_hip/components/atomic.hpp

You can find a formatting patch under Artifacts here, or run format! if you have write access to Ginkgo.

Labels: 1:ST:ready-for-review, mod:all, reg:build, reg:testing, type:factorization, type:matrix-format, type:reordering, type:solver
Projects: None yet
5 participants