Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hexagon] Implement fixed_point_multiply op through intrinsics. #12659

Merged

Conversation

ibsidorenko
Copy link
Contributor

@ibsidorenko ibsidorenko commented Aug 31, 2022

This commit adds high-performance implementation of fixed_point_multiply
operation based on Hexagon intrinsics for vmpye/vmpyo instructions.

Benchmarking of fixed_point_multiply op with (1,8,56,56,32) input
tensor on Qualcomm SM8350:

  • default implementation: 10.06 ms
  • optimized implementation: 1.42 ms
  • speedup: 7x times (!!!)

Please note that this is introducing a small round-up error for some
corner cases with negative shift argument (The same as for ARM CPU, see
PR#5980). This is because we are rounding twice instead than only once:

  • original q_multiply_shift: round(xy2^-s)
  • hexagon q_multiply_shift: round(round(x*y)*2^-s)

cc @mehrdadh

This commit adds high-performance implementation of fixed_point_multiply
operation based on Hexagon intrinsics for vmpye/vmpyo instructions.

Benchmarking of 'fixed_point_multiply' op with (1,8,56,56,32) input
tensor on Qualcomm SM8350:
  * default implementation: 10.06 ms
  * optimized implementation: 1.42 ms
  * speedup: 7x times (!!!)

Please note that this is introducing a small round-up error for some
corner cases with negative shift argument (The same as for ARM CPU, see
PR#5980). This is because we are rounding twice instead than only once:
  * original q_multiply_shift: round(x*y*2^-s)
  * hexagon q_multiply_shift: round(round(x*y)*2^-s)
@masahi
Copy link
Member

masahi commented Aug 31, 2022

cc @kparzysz-quic

@tmoreau89
Copy link
Contributor

Excellent, thank you for the contribution @ibsidorenko !

)

# Select depending on the shift
return tvm.tir.Select(shift < 0, out_negative_shift, mul_o_2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mul_o_2 is just round(x*y). There is no shift in it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... Maybe I misunderstood the question, but I put shift separately before mul_o_2:
x = x * (1 << (shift))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misread it as a part of the comment...

Copy link
Contributor

@kparzysz-quic kparzysz-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks.

@kparzysz-quic kparzysz-quic merged commit 038f15b into apache:main Sep 1, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…he#12659)

This commit adds high-performance implementation of fixed_point_multiply
operation based on Hexagon intrinsics for vmpye/vmpyo instructions.

Benchmarking of 'fixed_point_multiply' op with (1,8,56,56,32) input
tensor on Qualcomm SM8350:
  * default implementation: 10.06 ms
  * optimized implementation: 1.42 ms
  * speedup: 7x times (!!!)

Please note that this is introducing a small round-up error for some
corner cases with negative shift argument (The same as for ARM CPU, see
PR#5980). This is because we are rounding twice instead than only once:
  * original q_multiply_shift: round(x*y*2^-s)
  * hexagon q_multiply_shift: round(round(x*y)*2^-s)
@ibsidorenko ibsidorenko deleted the hexagon-fixed-point-multiply branch March 29, 2023 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants