
[Hexagon] Add HVX quant conv2d implementation #13256

Merged: 2 commits merged into apache:main on Dec 1, 2022

Conversation

quic-sanirudh (Contributor):

This patch adds a new HVX intrinsic implementation to perform quantized convolution.

It assumes that the qnn.conv2d relay op is not canonicalized and that all the quantization parameters (scales and zero points) are passed into the intrinsic implementation.

It also uses the fixed-point computation function defined in the Hexagon topi utils to compute a fixed-point (combined) scale, which is used to perform the final requantization before returning the quantized output.
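For context, this is the standard quantized-convolution identity the scheme follows (written out for reference; the symbol names match the test snippet further down, not the patch itself):

    q_out ≈ zp_out + round((act_scale * wgt_scale / out_scale) * sum_k (q_act[k] - zp_act) * (q_wgt[k] - zp_wgt))

The combined term act_scale * wgt_scale / out_scale is the single floating-point scale that gets converted to fixed point for the final requantization.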

tvm-bot (Collaborator) commented Nov 1, 2022:

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

quic-sanirudh force-pushed the conv-hvx-quant branch 4 times, most recently from 7048369 to 64314c4, on November 4, 2022 09:05.
quic-sanirudh (Contributor, Author):

@csullivan @cconvey Gentle ping for a review. This is an initial quant implementation of conv2d, similar to the fp16 version that I wrote earlier.

@@ -0,0 +1,262 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member:

This file will have lint issues once #13271 is merged. Please fix them in the meantime.

quic-sanirudh (Contributor, Author):

I've fixed the lint issues, thanks.

quic-sanirudh (Contributor, Author):

@tvm-bot rerun

quic-sanirudh (Contributor, Author):

@driazati @mehrdadh The CI is stuck. Could you please let me know if there's any way to restart the CI? Would pinging tvm-bot work?

driazati (Member) commented Nov 7, 2022:

> @driazati @mehrdadh The CI is stuck. Could you please let me know if there's any way to restart the CI? Would pinging tvm-bot work?

CI is finished now, so it should be good. I've been seeing some similar queueing issues (probably Jenkins wasn't able to spin up a machine to run the jobs for some reason); I opened #13312 and will be looking into this today.


quic-sanirudh (Contributor, Author):

> CI is finished now, so it should be good.

Great, thanks a lot for the help.

ref_out_q = ref_out_q.reshape(ref_out.shape)

final_scale = act_scale * wgt_scale / out_scale
fixed_final_scale, scale_factor = get_fixed_point_value(final_scale)
ibsidorenko (Contributor):

Hi @quic-sanirudh! Thank you for this PR, rather interesting work. Just one small question:
As I see it, for the Hexagon target we use the int16 dtype to represent fixed-point values (the dtype param in get_fixed_point_value), but in TVM we use the int32 dtype for that (for example, for the scale parameter in requantize). Can this somehow affect the accuracy of real-life quantized models?

quic-sanirudh (Contributor, Author):

Hi @ibsidorenko, thanks for the review. The int16 dtype was chosen so that the arithmetic for requantization can happen in int32, which reduces the number of instructions, but yes, the accuracy could be affected. I haven't tested this on real-world models yet, but that was the reason for setting very tight rtol/atol values for the assertion in the test case.

I also tried to break the accuracy of the int16 fixed-point computation by initializing the random inputs to extreme ranges and getting scale values on the order of 0.0001 to 1000 (well beyond any scale values I have seen in real-world models), and the test still passed with the expected accuracy.

I plan to verify this on real-world models to see how the accuracy is affected (if at all), and if needed I can update the patch to use int32 fixed-point values instead.
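To make the int16-vs-int32 trade-off concrete, here is a minimal C++ sketch of the idea (to_fixed_point_i16 and requantize are hypothetical names, not TVM's get_fixed_point_value, and the real rounding logic may differ): with an int16 multiplier, the whole requantization multiply fits in int32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a positive float scale into an int16 Q15 mantissa plus a
// power-of-two shift, so that scale ~= mult * 2^shift.
inline void to_fixed_point_i16(float scale, int16_t* mult, int* shift) {
  assert(scale > 0.0f);
  int exp = 0;
  float frac = std::frexp(scale, &exp);    // scale == frac * 2^exp, 0.5 <= frac < 1
  long q = std::lround(frac * (1 << 15));  // round the mantissa to Q15
  if (q == (1 << 15)) {                    // rounding can spill over to 2^15
    q >>= 1;
    ++exp;
  }
  *mult = static_cast<int16_t>(q);
  *shift = exp - 15;
}

// Requantize a 16-bit value: an int16 x int16 product is at most 2^30 in
// magnitude, so it always fits in int32 and needs no 64-bit intermediates.
inline int32_t requantize(int16_t acc, int16_t mult, int shift) {
  int32_t prod = static_cast<int32_t>(acc) * static_cast<int32_t>(mult);
  // Assumes arithmetic right shift for negative values (true on Hexagon/clang).
  return shift >= 0 ? prod * (1 << shift) : (prod >> -shift);
}
```

With an int32 multiplier, as TVM's requantize op uses, the analogous product would need int64 intermediates, which is the extra-instruction cost mentioned above.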

quic-sanirudh (Contributor, Author):

@csullivan @cconvey When you get a chance, could you review this PR? This is a quantized conv2d similar to the fp16 conv2d that I wrote earlier.

mehrdadh (Member) left a comment:

LGTM! I'll wait for @cconvey and @csullivan to take a look

quic-sanirudh (Contributor, Author):

@csullivan @cconvey Could you please help review this patch when you get a chance? Thanks.

quic-sanirudh (Contributor, Author):

@csullivan @cconvey Could you please review this PR or suggest someone who could be the right person to review it? Thanks.

inline constexpr int yxc_to_sm_8b(int y, int x, int c) {
// Map y,x,c coordinates within a block to the offset (in 8-bit elements)
// from the beginning of the block in spatial-major layout.
// 11-bit spatial mask: yyyxxxccccc
janetsc (Collaborator):

Add a check to make sure only the bits we expect are set in the inputs: for y and x, only the lowest 3 bits; for c, only 5 bits.

quic-sanirudh (Contributor, Author):

I consciously avoided the checks here because these functions are used for indexing within the innermost loops and need to be really fast. I was actually planning to remove the check from the yxc_to_sm_16b function above as well.

The thought I had was to add the assert statements and then disable them for release builds with #define NDEBUG. Not sure if there's a better solution.

cconvey (Contributor):

I'm pretty sure we can rely on assert being disabled when CMAKE_BUILD_TYPE=Release. See https://stackoverflow.com/questions/34302265/does-cmake-build-type-release-imply-dndebug.

janetsc (Collaborator) commented Nov 29, 2022:

Another option is to check the loop bounds in the caller to make sure y, x, and c can't get bigger than can be expressed. (And put a comment here to that effect: that it is the caller's responsibility to check in release builds.)

quic-sanirudh (Contributor, Author):

I've added the asserts directly inside the index functions; they will be disabled in Release builds.

I thought about adding the checks in the outer loops as you suggested, but that is already guaranteed by the current code, since block_height/block_width/block_depth are expected to be the block sizes; for other uses of the index functions, it is the caller's responsibility anyway.

janetsc (Collaborator):

Agreed - this is much safer!
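To make the resolution concrete, a minimal sketch of the agreed pattern (the function body is inferred from the yyyxxxccccc mask in the quoted comment, so treat it as an assumption rather than the patch's verbatim code):

```cpp
#include <cassert>

// Spatial-major offset within an 8x8x32 activation block. The asserts
// compile away when NDEBUG is defined (CMake defines it for
// CMAKE_BUILD_TYPE=Release), so the hot indexing path pays no cost.
constexpr int yxc_to_sm_8b(int y, int x, int c) {
  assert(y >= 0 && y < 8);   // only the lowest 3 bits of y may be set
  assert(x >= 0 && x < 8);   // only the lowest 3 bits of x may be set
  assert(c >= 0 && c < 32);  // only the lowest 5 bits of c may be set
  return y << 8 | x << 5 | c;  // 11-bit offset: yyyxxxccccc
}
```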

// beginning of the chunk in spatial-major layout.
// Spatial mask: p..piiioooooii, where p..p are position bits.
int p = y * width + (width - 1 - x);
return p << 10 | (i & 0x1c) << 5 | o << 2 | (i & 3);
janetsc (Collaborator):

Suggest similar bounds checking here.

quic-sanirudh (Contributor, Author):

Same comment as above; I can probably add asserts if we can disable them for release builds.

int xi, int ci, const DLTensor& block) {
auto block_ptr =
tvm::runtime::hexagon::conv_utils::nhwc_at(block, 0, block_out_y, block_out_x, block_out_c);
auto block_offset = yi * 256 + xi * 32 + ci;
janetsc (Collaborator):

Suggest defining consts for these. Are they derived from the supported shape?

janetsc (Collaborator):

The same comment applies to all constants below as well.

quic-sanirudh (Contributor, Author):

I'll do that, @janetsc, thanks. Right now they assume the activation blocks are 8x8x32.
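Presumably something along these lines (the constant names here are hypothetical, chosen for illustration):

```cpp
// Activation blocks are 8x8x32 (height x width x channels).
constexpr int kBlockWidth = 8;
constexpr int kBlockDepth = 32;
constexpr int kBlockRowStride = kBlockWidth * kBlockDepth;  // 256 elements per y step

// Offset of element (yi, xi, ci) inside one activation block.
constexpr int block_offset(int yi, int xi, int ci) {
  return yi * kBlockRowStride + xi * kBlockDepth + ci;
}
```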

janetsc (Collaborator) left a comment:

I meant to say "Approve" earlier today - my suggestions are only cosmetic. (Although the bounds checking would be a good idea if you do another iteration.)

csullivan requested a review from mehrdadh on November 29, 2022 05:22.
cconvey (Contributor) left a comment:

Just a few minor suggestions.


set_source_files_properties(
"${TVMRT_SOURCE_DIR}/hexagon/ops/conv2d_quant_hvx.cc"
PROPERTIES COMPILE_FLAGS "-mhvx"
cconvey (Contributor):

Are we confident that -mhvx is supported by all of the compilers that might build this code?

I'm assuming that typically the clang provided by Hexagon Toolchain will be used. But I'm a little fuzzy about the intended level of support for other compilers, e.g. a user-supplied build of Clang/LLVM.

cconvey (Contributor):

Would it make sense to update src/runtime/hexagon/README.md to clarify the version(s) of LLVM that support flags like -mhvx?

Or alternatively, use CMake's CheckCXXCompilerFlag function to see if -mhvx is supported, and only use that flag if it is?

quic-sanirudh (Contributor, Author):

Thanks for the review @cconvey.

I can add the details in the README or add a CMake check, but the -mhvx flag was added to clang all the way back in 2017, in the LLVM 6.0 release if not earlier, which predates the entire TVM project, so we can probably safely assume that the -mhvx flag will be available to practically anyone building TVM now.

If you think it might still be better to add the check or the README change, please let me know which one you think makes more sense and I can make that change. Thanks.

cconvey (Contributor):

That makes total sense, I didn't realize -mhvx support went back that far. I agree that there's no need for any additional documentation or checking.


@@ -133,7 +155,48 @@ inline uintptr_t hwio_at(const DLTensor& f, int y, int x, int i, int o) {
* @param width
* @param depth
*/
void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int depth);
template <typename T, int block_height, int block_width, int block_depth>
void blockize_hwc(void* out, void* inp_flat, int height, int width, int depth) {
cconvey (Contributor):

Would it make sense for inp_flat's type to be const void* rather than void*?

This is probably a bit of a stylistic choice; I just figured I'd ask.

quic-sanirudh (Contributor, Author):

I agree with both; I'll add the asserts and the const void* for the arguments, thanks.

@@ -144,7 +207,42 @@ void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int dept
* @param width
* @param depth
*/
void deblockize_hwc_16b(void* out_flat, void* inp, int height, int width, int depth);
template <typename T, int block_height, int block_width, int block_depth>
void deblockize_hwc(void* out_flat, void* inp, int height, int width, int depth) {
cconvey (Contributor):

Would it make sense for the type of inp to be const void*?

quic-sanirudh (Contributor, Author):

I'll add the const void*; it makes sense, thanks.
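Putting the two threads together, the const-qualified signatures would look roughly like this (a sketch of the declarations only; bodies as in the patch):

```cpp
// Copy a flat HWC tensor into the blocked layout; the source is read-only.
template <typename T, int block_height, int block_width, int block_depth>
void blockize_hwc(void* out, const void* inp_flat, int height, int width, int depth);

// Copy a blocked tensor back into flat HWC layout; the source is read-only.
template <typename T, int block_height, int block_width, int block_depth>
void deblockize_hwc(void* out_flat, const void* inp, int height, int width, int depth);
```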

cconvey (Contributor) left a comment:

Added a question / comment about use of "inline".

@@ -75,15 +77,31 @@ inline void* to_ptr(uintptr_t v) { return reinterpret_cast<void*>(v); }

inline uintptr_t to_uint(void* ptr) { return reinterpret_cast<uintptr_t>(ptr); }

constexpr int xyc_to_sm_16b(int y, int x, int c) {
inline constexpr int yxc_to_sm_16b(int y, int x, int c) {
cconvey (Contributor):

Is the addition of inline here (and elsewhere in the PR) necessary?

From https://en.cppreference.com/w/cpp/language/constexpr:

A constexpr specifier used in a function or static data member (since C++17) declaration implies inline.

quic-sanirudh (Contributor, Author):

Ah okay, I did not realize this change was made. It looks like an addition that was inserted while working with the downstream repo, where this inline probably exists. I'll remove it; it makes sense, thanks.
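For reference, a tiny standalone example of the rule: constexpr on a function already implies inline, so the explicit keyword adds nothing.

```cpp
// A constexpr function is implicitly inline, so it can be defined in a
// header included by multiple translation units without violating the
// one-definition rule; writing 'inline constexpr' is redundant.
constexpr int square(int v) { return v * v; }

static_assert(square(3) == 9, "usable in constant expressions");
```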

cconvey (Contributor) left a comment:

LGTM!

mehrdadh merged commit bf16b42 into apache:main on Dec 1, 2022.