Make the fast inverse test throughput-limited rather than latency-limited #7958

abadams · 2023-11-21T21:28:44Z

This test is currently failing on a Cortex-A76 buildbot because it is a recursive update definition, so it ends up limited by instruction latencies rather than throughputs. On an A76 (which is a reasonable CPU to assume for a generic ARM target), multiplying by a fast inverse has a total latency of frecpe + frecps + fmul = 11 cycles, whereas the Cortex-A76 optimization guide gives the latency of an fdiv instruction as 7-10 cycles. The cycle costs (sum of inverse throughputs), however, are 3 and 8 respectively, so fast_inverse is still a good idea for most imaging workloads that aren't the goofy recursive thing in the test. So hopefully changing the test to be throughput-limited will fix it.

Still disabled on M1, because fdiv there has a throughput of 1?!

abadams · 2023-11-21T23:32:43Z

This does indeed fix that test on the new ARM bot (though another test is still failing).

Make the fast inverse test throughput-limited rather than latency-limited

2fb0ccd

vksnk approved these changes Nov 21, 2023

Merge branch 'main' into abadams/make_fast_inverse_test_throughput_limited

6dfdc2f

abadams merged commit 5175d16 into main Nov 28, 2023
14 of 17 checks passed

BrewTestBot mentioned this pull request Feb 2, 2024

halide 17.0.0 Homebrew/homebrew-core#161602

Closed

ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024

Make the fast inverse test throughput-limited rather than latency-limited (halide#7958) Co-authored-by: Steven Johnson <srj@google.com>

3a932ea
