
Difficulty using Hexagon gather instructions #8445

Open
MihaiBabiac opened this issue Oct 27, 2024 · 0 comments
Hi there,

I've been trying to write some Halide code that performs image warping (somewhat similar to torch.grid_sample), but I'm repeatedly having trouble convincing the compiler to generate vgather instructions for the Hexagon DSP.
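For context, a minimal sketch of the access pattern involved (nearest-neighbor warping in NumPy, with hypothetical names; the real code is Halide for Hexagon). The point is the data-dependent indexed load, which is exactly what vgather accelerates:

```python
import numpy as np

def warp_nearest(image, xs, ys):
    """Gather-style warp: each output pixel fetches image[ys, xs].

    image: 2-D array; xs, ys: integer coordinate maps with the output's
    shape. The fancy-indexed load at data-dependent offsets is the
    memory-access pattern a gather instruction implements in hardware.
    """
    h, w = image.shape
    xs = np.clip(xs, 0, w - 1)  # clamp out-of-range coordinates
    ys = np.clip(ys, 0, h - 1)
    return image[ys, xs]

# Tiny usage example: shift the image left by one pixel.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
xs = np.tile(np.arange(4), (4, 1)) + 1
ys = np.tile(np.arange(4)[:, None], (1, 4))
out = warp_nearest(img, xs, ys)
```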

  1. There seems to be no way to specify that an input buffer is in the VTCM, meaning that Halide always has to do the allocation and copying itself.
  2. If the image is copied to the VTCM at root (compute_root()), parallelizing the output seems to break the gather instructions. If I'm reading the Hexagon HVX manual right, the gather ops have quite high latency, so not being able to hide the latency with parallelism can have a significant performance penalty.
  3. Any "no-op" transformation applied to the input image, such as reinterpreting the values or reshaping, also seems to break the gathers.

If I avoid these three issues, I do manage to generate gather instructions, but I think points 1 and 2 hurt performance quite a bit. If it weren't for point 1, I could implement the parallelism myself, splitting the output into horizontal slices before passing them to Halide; but I'm fairly sure the code is memory-bound, so having to copy the data into VTCM for every call defeats the purpose.

Am I doing something wrong, or are these current limitations of the compiler? If the latter, any ideas how to work around them?
