
[Hexagon] Add HVX quant conv2d implementation #13256

Merged: 2 commits merged into apache:main on Dec 1, 2022

Conversation

quic-sanirudh (Contributor):

This patch adds a new HVX intrinsic implementation to perform quantized convolution.

It assumes that the qnn.conv2d relay op is not canonicalized and that all the quantization parameters (scales and zero points) are passed into the intrinsic implementation.

It also uses the fixed-point computation function defined in the Hexagon topi utils to compute a fixed-point (combined) scale, which is used to perform the final requantization before returning the quantized output.
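For context, this is the standard quantized-convolution identity the scheme follows (written out for reference; the symbol names match the test snippet further down, not the patch itself):

    q_out ≈ zp_out + round((act_scale * wgt_scale / out_scale) * sum_k (q_act[k] - zp_act) * (q_wgt[k] - zp_wgt))

The combined term act_scale * wgt_scale / out_scale is the single floating-point scale that gets converted to fixed point for the final requantization.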

tvm-bot (Collaborator) commented Nov 1, 2022:

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

quic-sanirudh force-pushed the conv-hvx-quant branch 4 times, most recently from 7048369 to 64314c4, on November 4, 2022 09:05.
quic-sanirudh (Contributor, Author):

@csullivan @cconvey Gentle ping for a review. This is an initial quant implementation of conv2d, similar to the fp16 version that I wrote earlier.

@@ -0,0 +1,262 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member:

This file will have lint issues once #13271 is merged. Please fix them in the meantime.

quic-sanirudh (Contributor, Author):

I've fixed the lint issues, thanks.

quic-sanirudh (Contributor, Author):

@tvm-bot rerun

quic-sanirudh (Contributor, Author):

@driazati @mehrdadh The CI is stuck. Could you please let me know if there's any way to restart the CI? Would pinging tvm-bot work?

driazati (Member) commented Nov 7, 2022:

> @driazati @mehrdadh The CI is stuck. Could you please let me know if there's any way to restart the CI? Would pinging tvm-bot work?

CI is finished now, so it should be good. I've been seeing some similar queueing issues (probably Jenkins wasn't able to spin up a machine to run the jobs for some reason); I opened #13312 and will be looking into this today.


quic-sanirudh (Contributor, Author):

> CI is finished now, so it should be good.

Great, thanks a lot for the help.

ref_out_q = ref_out_q.reshape(ref_out.shape)

final_scale = act_scale * wgt_scale / out_scale
fixed_final_scale, scale_factor = get_fixed_point_value(final_scale)
ibsidorenko (Contributor):

Hi @quic-sanirudh! Thank you for this PR, rather interesting work. Just one small question:
As I see it, for the Hexagon target we use the int16 dtype to represent fixed-point values (the dtype param in get_fixed_point_value), but in TVM we use the int32 dtype for that (for example, for the scale parameter in requantize). Can this somehow affect the accuracy of real-life quantized models?

quic-sanirudh (Contributor, Author):

Hi @ibsidorenko, thanks for the review. The int16 dtype was chosen so that the arithmetic for requantization can happen in int32, which reduces the number of instructions, but yes, the accuracy could be affected. I haven't tested this on real-world models yet, but that was the reason for setting very tight rtol/atol values for the assertion in the test case.

I also tried to break the accuracy of the int16 fixed-point computation by initializing the random inputs to extreme ranges and getting scale values on the order of 0.0001 to 1000 (well beyond any scale values I have seen in real-world models), and the test still passed with the expected accuracy.

I plan to verify this on real-world models to see how the accuracy is affected (if at all), and if needed I can update the patch to use int32 fixed-point values instead.
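To make the int16-vs-int32 trade-off concrete, here is a minimal C++ sketch of the idea (to_fixed_point_i16 and requantize are hypothetical names, not TVM's get_fixed_point_value, and the real rounding logic may differ): with an int16 multiplier, the whole requantization multiply fits in int32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a positive float scale into an int16 Q15 mantissa plus a
// power-of-two shift, so that scale ~= mult * 2^shift.
inline void to_fixed_point_i16(float scale, int16_t* mult, int* shift) {
  assert(scale > 0.0f);
  int exp = 0;
  float frac = std::frexp(scale, &exp);    // scale == frac * 2^exp, 0.5 <= frac < 1
  long q = std::lround(frac * (1 << 15));  // round the mantissa to Q15
  if (q == (1 << 15)) {                    // rounding can spill over to 2^15
    q >>= 1;
    ++exp;
  }
  *mult = static_cast<int16_t>(q);
  *shift = exp - 15;
}

// Requantize a 16-bit value: an int16 x int16 product is at most 2^30 in
// magnitude, so it always fits in int32 and needs no 64-bit intermediates.
inline int32_t requantize(int16_t acc, int16_t mult, int shift) {
  int32_t prod = static_cast<int32_t>(acc) * static_cast<int32_t>(mult);
  // Assumes arithmetic right shift for negative values (true on Hexagon/clang).
  return shift >= 0 ? prod * (1 << shift) : (prod >> -shift);
}
```

With an int32 multiplier, as TVM's requantize op uses, the analogous product would need int64 intermediates, which is the extra-instruction cost mentioned above.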

quic-sanirudh (Contributor, Author):

@csullivan @cconvey When you get a chance, could you review this PR? This is a quantized conv2d similar to the fp16 conv2d that I wrote earlier.

mehrdadh (Member) left a comment:

LGTM! I'll wait for @cconvey and @csullivan to take a look

quic-sanirudh (Contributor, Author):

@csullivan @cconvey Could you please help review this patch when you get a chance? Thanks.

quic-sanirudh (Contributor, Author):

@csullivan @cconvey Could you please review this PR or suggest someone who could be the right person to review it? Thanks.

inline constexpr int yxc_to_sm_8b(int y, int x, int c) {
// Map y,x,c coordinates within a block to the offset (in 8-bit elements)
// from the beginning of the block in spatial-major layout.
// 11-bit spatial mask: yyyxxxccccc
janetsc (Collaborator):

Add a check to make sure only the bits we expect are set in the inputs: for y and x, only the lowest 3 bits; for c, only 5 bits.

quic-sanirudh (Contributor, Author):

I consciously avoided the checks here because these functions are used for indexing within the innermost loops and need to be really fast. I was actually planning to remove the check from the yxc_to_sm_16b function above as well.

The thought I had was to add the assert statements and then disable them for release builds with #define NDEBUG. Not sure if there's a better solution.

cconvey (Contributor):

I'm pretty sure we can rely on assert being disabled when CMAKE_BUILD_TYPE=Release. See https://stackoverflow.com/questions/34302265/does-cmake-build-type-release-imply-dndebug.

janetsc (Collaborator) commented Nov 29, 2022:

Another option is to check the loop bounds in the caller to make sure y, x, and c can't get bigger than can be expressed. (And put a comment here to that effect: that it is the caller's responsibility to check in release builds.)

quic-sanirudh (Contributor, Author):

I've added the asserts directly inside the index functions; they will be disabled in Release builds.

I thought about adding the checks in the outer loops as you suggested, but that is already guaranteed by the current code, since block_height/block_width/block_depth are expected to be the block sizes; for other uses of the index functions, it is the caller's responsibility anyway.

janetsc (Collaborator):

Agreed - this is much safer!
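To make the resolution concrete, a minimal sketch of the agreed pattern (the function body is inferred from the yyyxxxccccc mask in the quoted comment, so treat it as an assumption rather than the patch's verbatim code):

```cpp
#include <cassert>

// Spatial-major offset within an 8x8x32 activation block. The asserts
// compile away when NDEBUG is defined (CMake defines it for
// CMAKE_BUILD_TYPE=Release), so the hot indexing path pays no cost.
constexpr int yxc_to_sm_8b(int y, int x, int c) {
  assert(y >= 0 && y < 8);   // only the lowest 3 bits of y may be set
  assert(x >= 0 && x < 8);   // only the lowest 3 bits of x may be set
  assert(c >= 0 && c < 32);  // only the lowest 5 bits of c may be set
  return y << 8 | x << 5 | c;  // 11-bit offset: yyyxxxccccc
}
```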

// beginning of the chunk in spatial-major layout.
// Spatial mask: p..piiioooooii, where p..p are position bits.
int p = y * width + (width - 1 - x);
return p << 10 | (i & 0x1c) << 5 | o << 2 | (i & 3);
janetsc (Collaborator):

Suggest similar bounds checking here.

quic-sanirudh (Contributor, Author):

Same comment as above; I can probably add asserts if we can disable them for release builds.

int xi, int ci, const DLTensor& block) {
auto block_ptr =
tvm::runtime::hexagon::conv_utils::nhwc_at(block, 0, block_out_y, block_out_x, block_out_c);
auto block_offset = yi * 256 + xi * 32 + ci;
janetsc (Collaborator):

Suggest defining consts for these. Are they derived from the supported shape?

janetsc (Collaborator):

The same comment applies to all constants below as well.

quic-sanirudh (Contributor, Author):

I'll do that, @janetsc, thanks. Right now they assume the activation blocks are 8x8x32.
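Presumably something along these lines (the constant names here are hypothetical, chosen for illustration):

```cpp
// Activation blocks are 8x8x32 (height x width x channels).
constexpr int kBlockWidth = 8;
constexpr int kBlockDepth = 32;
constexpr int kBlockRowStride = kBlockWidth * kBlockDepth;  // 256 elements per y step

// Offset of element (yi, xi, ci) inside one activation block.
constexpr int block_offset(int yi, int xi, int ci) {
  return yi * kBlockRowStride + xi * kBlockDepth + ci;
}
```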

janetsc (Collaborator) left a comment:

I meant to say "Approve" earlier today - my suggestions are only cosmetic. (Although the bounds checking would be a good idea if you do another iteration.)

csullivan requested a review from mehrdadh on November 29, 2022 05:22.
cconvey (Contributor) left a comment:

Just a few minor suggestions.


set_source_files_properties(
"${TVMRT_SOURCE_DIR}/hexagon/ops/conv2d_quant_hvx.cc"
PROPERTIES COMPILE_FLAGS "-mhvx"
cconvey (Contributor):

Are we confident that -mhvx is supported by all of the compilers that might build this code?

I'm assuming that typically the clang provided by Hexagon Toolchain will be used. But I'm a little fuzzy about the intended level of support for other compilers, e.g. a user-supplied build of Clang/LLVM.

cconvey (Contributor):

Would it make sense to update src/runtime/hexagon/README.md to clarify the version(s) of LLVM that support flags like -mhvx?

Or alternatively, use CMake's CheckCXXCompilerFlag function to see if -mhvx is supported, and only use that flag if it is?

quic-sanirudh (Contributor, Author):

Thanks for the review @cconvey.

I can add the details in the README or add a CMake check, but the -mhvx flag was added to clang all the way back in 2017, in the LLVM 6.0 release if not earlier, which predates the entire TVM project, so we can probably safely assume that the -mhvx flag will be available to practically anyone building TVM now.

If you think it might still be better to add the check or the README change, please let me know which one you think makes more sense and I can make that change. Thanks.

cconvey (Contributor):

That makes total sense, I didn't realize -mhvx support went back that far. I agree that there's no need for any additional documentation or checking.


@@ -133,7 +155,48 @@ inline uintptr_t hwio_at(const DLTensor& f, int y, int x, int i, int o) {
* @param width
* @param depth
*/
void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int depth);
template <typename T, int block_height, int block_width, int block_depth>
void blockize_hwc(void* out, void* inp_flat, int height, int width, int depth) {
cconvey (Contributor):

Would it make sense for inp_flat's type to be const void* rather than void*?

This is probably a bit of a stylistic choice; I just figured I'd ask.

quic-sanirudh (Contributor, Author):

I agree with both; I'll add the asserts and the const void* for the arguments, thanks.

@@ -144,7 +207,42 @@ void blockize_hwc_16b(void* out, void* inp_flat, int height, int width, int dept
* @param width
* @param depth
*/
void deblockize_hwc_16b(void* out_flat, void* inp, int height, int width, int depth);
template <typename T, int block_height, int block_width, int block_depth>
void deblockize_hwc(void* out_flat, void* inp, int height, int width, int depth) {
cconvey (Contributor):

Would it make sense for the type of inp to be const void*?

quic-sanirudh (Contributor, Author):

I'll add the const void*; it makes sense, thanks.
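Putting the two threads together, the const-qualified signatures would look roughly like this (a sketch of the declarations only; bodies as in the patch):

```cpp
// Copy a flat HWC tensor into the blocked layout; the source is read-only.
template <typename T, int block_height, int block_width, int block_depth>
void blockize_hwc(void* out, const void* inp_flat, int height, int width, int depth);

// Copy a blocked tensor back into flat HWC layout; the source is read-only.
template <typename T, int block_height, int block_width, int block_depth>
void deblockize_hwc(void* out_flat, const void* inp, int height, int width, int depth);
```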

cconvey (Contributor) left a comment:

Added a question / comment about use of "inline".

@@ -75,15 +77,31 @@ inline void* to_ptr(uintptr_t v) { return reinterpret_cast<void*>(v); }

inline uintptr_t to_uint(void* ptr) { return reinterpret_cast<uintptr_t>(ptr); }

constexpr int xyc_to_sm_16b(int y, int x, int c) {
inline constexpr int yxc_to_sm_16b(int y, int x, int c) {
cconvey (Contributor):

Is the addition of inline here (and elsewhere in the PR) necessary?

From https://en.cppreference.com/w/cpp/language/constexpr:

A constexpr specifier used in a function or static data member (since C++17) declaration implies inline.

quic-sanirudh (Contributor, Author):

Ah okay, I did not realize this change was made. It looks like an addition that was inserted while working with the downstream repo, where this inline probably exists. I'll remove it; it makes sense, thanks.
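For reference, a tiny standalone example of the rule: constexpr on a function already implies inline, so the explicit keyword adds nothing.

```cpp
// A constexpr function is implicitly inline, so it can be defined in a
// header included by multiple translation units without violating the
// one-definition rule; writing 'inline constexpr' is redundant.
constexpr int square(int v) { return v * v; }

static_assert(square(3) == 9, "usable in constant expressions");
```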

cconvey (Contributor) left a comment:

LGTM!

mehrdadh merged commit bf16b42 into apache:main on Dec 1, 2022.