-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Half matrix and components #1708
base: half_type
Are you sure you want to change the base?
Conversation
8f3a17d
to
b7d4a15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. There are still some places where the new half-enabled types are missing.
static_assert(sizeof(ValueType) == sizeof(ResultType), | ||
"The type to reinterpret to must be of the same size as the " | ||
"original type."); | ||
return reinterpret_cast<ResultType&>(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just use memcpy
here directly.
} | ||
#else | ||
// UB? | ||
uint16_t* address_as_converter = reinterpret_cast<uint16_t*>(&out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can't be done without either UB, or falling back to omp critical
.
ee4be45
to
3037d52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part 1 / 2 of my review. So far, I only have small comments.
option(GINKGO_ENABLE_HALF "Enable the use of half precision" ON) | ||
# We do not support MSVC. SYCL will come later | ||
if(MSVC OR GINKGO_BUILD_SYCL) | ||
message(STATUS "HALF is not supported in MSVC, and later support in SYCL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be rephrased since I really don't know what you mean by "and later support in SYCL".
Do you mean that SYCL does support half-precision in a later version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will enable the support from #1710
As the half is trivial copy again now, we might not need the device_type mapping though.
struct device_numeric_limits<__half> { | ||
// from __half documentation, it accepts unsigned short | ||
// __half does not have constexpr | ||
static GKO_ATTRIBUTES GKO_INLINE auto inf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this also work for host code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the constructor are available on both side.
side note, the operations are not.
static constexpr auto inf() { return std::numeric_limits<T>::infinity(); } | ||
static constexpr auto max() { return std::numeric_limits<T>::max(); } | ||
static constexpr auto min() { return std::numeric_limits<T>::min(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you made these into functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__half
does not have constexpr constructor such that I can not put them into static constexpr data here.
I do not try the following such that keep them as data yet.
If we still make it as the data not functions, the definition is on the host.
Can cuda still use them in device code? I doubt not because we do not pass it through the kernel call.
Thus, I made them into functions.
common/cuda_hip/base/math.hpp
Outdated
} | ||
|
||
|
||
// Dircetly call float versrion from here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos:
// Dircetly call float versrion from here? | |
// Directly call float version from here? |
Also, why is abs
specialized, while sqrt
above is a separate function? Because that's how thrust
does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sqrt can also be specialized
common/cuda_hip/base/math.hpp
Outdated
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host | ||
// compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get the meaning, maybe:
// It is required by NVHPC 23.3, isnan is undefined when NVHPC are only as host | |
// compiler. | |
// It is required by NVHPC 23.3, `isnan` is undefined when NVHPC is only used as a host | |
// compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly,
I think cuda will go through the code twice. one for device and the other for the rest.
NVCC does not complain anything, but NVHPC will complain isnan is not defined.
TBH, I forgot whether I put __device__
or not when I encounter this issue.
I will check again
THRUST_HALF_FRIEND_OPERATOR(+, +=) | ||
THRUST_HALF_FRIEND_OPERATOR(-, -=) | ||
THRUST_HALF_FRIEND_OPERATOR(*, *=) | ||
THRUST_HALF_FRIEND_OPERATOR(/, /=) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this macro afterward? If not, maybe just undef
the macro.
exec, | ||
[] GKO_KERNEL(auto idx, auto array) { | ||
if constexpr (std::is_same_v<remove_complex<ValueType>, half>) { | ||
// __half can not be from int64_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by that?
That half can't be converted to int64_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, __half
can not be converted from int64_t
.
cuda only writes the conversion from short
, int
, long long
and the corresponding unsigned
version.
Unfortuntately, it does not accepts int64_t even if long long
and int64_t
are the same technically.
Error: The following files need to be formatted:
You can find a formatting patch under Artifacts here or run |
This PR adds the matrix and components (like arrary/device_matrix_data) with half precision support
Also, to avoid touch the files, which are not related to this PR, I add several type list with half additionally.
For example, RealValueTypes -> RealValueTypesWithHalf and next_precision -> next_precision_with_half
(We will add bfloat16 in the future, so maybe do not use Half)
for the friend and corresponding function
If we only use next in friend and function
Moreover, the second one does not work when we fallback the next_precision_with_half to next_precision by disabling half because next<next<value_type>> is value_type without half now. However, the first one always work.
TODO:
add as_device_type to sycl for gko::half <-> sycl::halfmove to Sycl device_type mapping #1710