[Hexagon] Add HVX quant conv2d implementation #13256
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
@csullivan @cconvey Gentle ping for a review. This is an initial quant implementation of conv2d, similar to the fp16 version that I wrote earlier.
@@ -0,0 +1,262 @@
# Licensed to the Apache Software Foundation (ASF) under one
This file will have lint issues once #13271 is merged. Please fix them in the meantime.
I've fixed the lint issue, thanks.
@tvm-bot rerun
CI is finished now, so it should be good. I've been seeing some similar queueing issues (probably Jenkins wasn't able to spin up a machine to run the jobs for some reason). I opened #13312 and will be looking into this today.
Great, thanks a lot for the help.
ref_out_q = ref_out_q.reshape(ref_out.shape)
final_scale = act_scale * wgt_scale / out_scale
fixed_final_scale, scale_factor = get_fixed_point_value(final_scale)
Hi, @quic-sanirudh! Thank you for this PR, rather interesting work. Just one small question: as I see it, for the Hexagon target we use the int16 dtype to represent fixed-point values (the dtype param in get_fixed_point_value). But in TVM we use the int32 dtype for that (for example, for the scale parameter in requantize). Can this somehow affect the accuracy of real-life quantized models?
Hi @ibsidorenko, thanks for the review. The int16 dtype was chosen so that the arithmetic for re-quantization can happen in int32, which reduces the number of instructions, but yes, the accuracy could be affected. I haven't tested this on real-world models yet, which is why I set very tight rtol/atol values for the assertions in the test case.

I also tried to break the accuracy of the int16 fixed-point computation by initializing the random inputs to extreme ranges and getting scale values on the order of 0.0001 to 1000 (well beyond any scale values I saw in real-life models), and the test still passed with the expected accuracy.

I plan to verify this on real-world models to see how the accuracy is affected (if at all), and if needed I can update the patch to use int32 fixed-point values instead.
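For readers following along, the idea behind the int16 fixed-point scheme discussed above can be sketched as follows. This is a simplified illustration, not TVM's actual `get_fixed_point_value` (which lives in the Hexagon TOPI utils); the helper names, the rounding choice, and the `requantize` wrapper here are all assumptions made for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Toy sketch: represent a positive float scale as (mantissa, shift) so that
// scale ~= mantissa * 2^(-shift), with the mantissa held in int16. Keeping
// the mantissa in 16 bits is what lets the requantization multiply stay
// within int32/int64 with fewer instructions, at some cost in precision.
std::pair<int16_t, int> to_fixed_point(float scale) {
  assert(scale > 0.0f);
  int exp = 0;
  // frexp returns f in [0.5, 1) with scale == f * 2^exp
  float f = std::frexp(scale, &exp);
  // Spread the fraction across 15 mantissa bits of an int16.
  // (A real implementation would guard against f rounding up to 1.0.)
  int16_t mantissa = static_cast<int16_t>(std::lround(f * (1 << 15)));
  int shift = 15 - exp;  // scale ~= mantissa * 2^(-shift)
  return {mantissa, shift};
}

// Apply the fixed-point scale to an int32 accumulator:
// out ~= acc * scale, computed entirely in integer arithmetic.
int32_t requantize(int32_t acc, int16_t mantissa, int shift) {
  int64_t prod = static_cast<int64_t>(acc) * mantissa;
  return static_cast<int32_t>(prod >> shift);
}
```

With scale = 0.5, for example, the pair comes out as (16384, 15), and requantizing an accumulator of 100 yields 50, as expected.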
@csullivan @cconvey When you get a chance, could you review this PR? It's a quantized conv2d similar to the fp16 conv2d that I wrote earlier.
LGTM! I'll wait for @cconvey and @csullivan to take a look
@csullivan @cconvey Could you please help in reviewing this patch when you get a chance? Thanks.
@csullivan @cconvey Could you please review this PR or suggest someone who could be the right person to review it? Thanks.
inline constexpr int yxc_to_sm_8b(int y, int x, int c) {
  // Map y,x,c coordinates within a block to the offset (in 8-bit elements)
  // from the beginning of the block in spatial-major layout.
  // 11-bit spatial mask: yyyxxxccccc
Add a check to make sure only the bits we expect are set in the inputs: for y and x, only the lowest 3 bits, and for c, only the lowest 5 bits.
I consciously avoided the checks here because these functions are used for indexing within the innermost loops and need to be really fast. I was actually planning to remove the check from the yxc_to_sm_16b function above as well.

The idea I had was to add the assert statements and then disable them for release builds with #define NDEBUG. Not sure if there's a better solution.
I'm pretty sure we can rely on assert being disabled when CMAKE_BUILD_TYPE=Release. See https://stackoverflow.com/questions/34302265/does-cmake-build-type-release-imply-dndebug.
Another option is to check the loop bounds in the caller to make sure y, x, and c can't get bigger than can be expressed. (And put a comment here to that effect: that it is the caller's responsibility to check in release builds.)
I've added the asserts directly inside the index functions; they will be disabled in Release builds.

I thought about adding the checks in the outer loops as you suggested, but that is already guaranteed by the current code, since block_height/block_width/block_depth are expected to be the block sizes, and for other uses of the index functions it is the caller's responsibility anyway.
Agreed, this is much safer!
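As a concrete picture of the approach agreed on here, a debug-only bounds check on the 8-bit index helper could look like the sketch below. This is not the code in the PR; the function name is made up, and it assumes the 8x8x32 activation block shape discussed elsewhere in the thread:

```cpp
#include <cassert>

// Sketch of an index helper with debug-only bounds checks. With
// CMAKE_BUILD_TYPE=Release, CMake adds -DNDEBUG, so assert() compiles
// to nothing and the hot indexing path pays no cost in release builds.
inline constexpr int yxc_to_sm_8b_checked(int y, int x, int c) {
  assert((y & ~0x7) == 0);   // y must fit in the lowest 3 bits (block height 8)
  assert((x & ~0x7) == 0);   // x must fit in the lowest 3 bits (block width 8)
  assert((c & ~0x1f) == 0);  // c must fit in the lowest 5 bits (block depth 32)
  // Spatial-major offset within an 8x8x32 block: yyyxxxccccc
  return y << 8 | x << 5 | c;
}
```

In a debug build, out-of-range inputs trip the asserts immediately; in a release build the function reduces to the same three-instruction shift/or sequence as the unchecked version.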
// beginning of the chunk in spatial-major layout.
// Spatial mask: p..piiioooooii, where p..p are position bits.
int p = y * width + (width - 1 - x);
return p << 10 | (i & 0x1c) << 5 | o << 2 | (i & 3);
Suggest similar bounds checking here.
Same comment as above. I can probably add asserts if we can disable them for release builds later
int xi, int ci, const DLTensor& block) {
  auto block_ptr =
      tvm::runtime::hexagon::conv_utils::nhwc_at(block, 0, block_out_y, block_out_x, block_out_c);
  auto block_offset = yi * 256 + xi * 32 + ci;
Suggest defining consts for these. Are they derived from the supported shape?
The same comment applies to all constants below as well.
I'll do that @janetsc, thanks. Right now they assume the activation blocks to be 8x8x32.
I meant to say "Approve" earlier today; my suggestions are only cosmetic. (Although the bounds checking would be a good idea if you do another iteration.)
This patch adds a new HVX intrinsic implementation to perform quantized convolution. It assumes that the qnn.conv2d relay op is not canonicalized and that all the quantization parameters (scales and zero points) are passed into the intrinsic implementation. It also uses the fixed-point computation function defined in the Hexagon TOPI utils to compute a combined fixed-point scale, which is used to perform the final requantization before returning the quantized output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor suggestions.
set_source_files_properties(
  "${TVMRT_SOURCE_DIR}/hexagon/ops/conv2d_quant_hvx.cc"
  PROPERTIES COMPILE_FLAGS "-mhvx"
Are we confident that -mhvx is supported by all of the compilers that might build this code? I'm assuming that typically the clang provided by the Hexagon Toolchain will be used, but I'm a little fuzzy about the intended level of support for other compilers, e.g. a user-supplied build of Clang/LLVM.
Would it make sense to update src/runtime/hexagon/README.md to clarify the version(s) of LLVM that support flags like -mhvx? Or alternatively, use CMake's CheckCXXCompilerFlag function to see if -mhvx is supported, and only use the flag if it is?
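For reference, the CheckCXXCompilerFlag variant suggested here would look roughly like the sketch below. CheckCXXCompilerFlag is a standard CMake module; the COMPILER_SUPPORTS_MHVX cache variable name is made up for the example:

```cmake
include(CheckCXXCompilerFlag)

# Probe whether the active C++ compiler accepts -mhvx, and only apply
# the flag to the HVX source when the probe succeeds.
check_cxx_compiler_flag("-mhvx" COMPILER_SUPPORTS_MHVX)
if(COMPILER_SUPPORTS_MHVX)
  set_source_files_properties(
    "${TVMRT_SOURCE_DIR}/hexagon/ops/conv2d_quant_hvx.cc"
    PROPERTIES COMPILE_FLAGS "-mhvx")
endif()
```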
Thanks for the review @cconvey.

I can add the details to the README or add a CMake check, but the -mhvx flag was added to clang back in 2017, in the LLVM 6.0 release if not earlier, which predates the entire TVM project, so we can probably safely assume that the -mhvx flag will be available for practically anyone building TVM now.

If you think it's still better to add the check or the README change, please let me know which one makes more sense and I can make that change. Thanks.
That makes total sense; I didn't realize -mhvx support went back that far. I agree that there's no need for any additional documentation or checking.
@@ -133,7 +155,48 @@ inline uintptr_t hwio_at(const DLTensor& f, int y, int x, int i, int o) {
  * @param width
  * @param depth
  */
-void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int depth);
+template <typename T, int block_height, int block_width, int block_depth>
+void blockize_hwc(void* out, void* inp_flat, int height, int width, int depth) {
Would it make sense for inp_flat's type to be const void* rather than void*? This is probably a bit of a stylistic choice; I just figured I'd ask.
I agree with both; I'll add the asserts and use const void* for the arguments, thanks.
@@ -144,7 +207,42 @@ void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int dept
  * @param width
  * @param depth
  */
-void deblockize_hwc_16b(void* out_flat, void* inp, int height, int width, int depth);
+template <typename T, int block_height, int block_width, int block_depth>
+void deblockize_hwc(void* out_flat, void* inp, int height, int width, int depth) {
Would it make sense for the type of inp to be const void*?
I'll add the const void*; it makes sense, thanks.
Added a question / comment about use of "inline".
src/runtime/hexagon/ops/conv2d.h (Outdated)
@@ -75,15 +77,31 @@
 inline void* to_ptr(uintptr_t v) { return reinterpret_cast<void*>(v); }
 inline uintptr_t to_uint(void* ptr) { return reinterpret_cast<uintptr_t>(ptr); }
-constexpr int xyc_to_sm_16b(int y, int x, int c) {
+inline constexpr int yxc_to_sm_16b(int y, int x, int c) {
Is the addition of inline here (and elsewhere in the PR) necessary? From https://en.cppreference.com/w/cpp/language/constexpr: "A constexpr specifier used in a function or static data member (since C++17) declaration implies inline."
Ah okay, I did not realize this change was made. It looks like an addition that crept in while working with the downstream repo, where this inline probably exists. I'll remove it; that makes sense, thanks.
LGTM!