Make the fast inverse test throughput-limited rather than latency-limited #7958

Merged: 2 commits, Nov 28, 2023

abadams (Member) commented Nov 21, 2023

This test is currently failing on a Cortex-A76 buildbot because it uses a recursive update definition, so it ends up limited by instruction latencies rather than throughputs. On an A76 (which is a reasonable CPU to assume for a generic ARM target), multiplying by a fast inverse has a total latency of frecpe + frecps + fmul = 11 cycles, whereas the Cortex-A76 optimization guide gives the latency of an fdiv instruction as 7-10 cycles. The cycle costs (sum of inverse throughputs), however, are 3 and 8 respectively, so fast_inverse is still a good idea for most imaging workloads that aren't the goofy recursive thing in the test. So hopefully if I just change the test to be throughput-limited, it'll fix it.
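To make the latency-versus-throughput argument concrete, here is a rough sketch in plain C++ (not Halide, and not the actual test). It plugs the cycle counts quoted above into a deliberately idealized model: a dependent chain pays the full latency per operation, while independent operations pay only the summed inverse throughput. The names `OpCost`, `latency_bound_cycles`, `throughput_bound_cycles`, `recursive_divide`, and `elementwise_divide` are hypothetical, and the model ignores out-of-order overlap, vector width, and every other detail of a real core.

```cpp
#include <vector>

// Per-operation costs in cycles, using the numbers quoted in the PR
// description for a Cortex-A76. The struct and its field names are
// illustrative assumptions, not anything from Halide.
struct OpCost {
    int latency;     // cycles until the result can feed the next operation
    int cycle_cost;  // sum of inverse throughputs for the instruction sequence
};

constexpr OpCost fast_inverse_mul{11, 3};  // frecpe + frecps + fmul
constexpr OpCost fdiv_best{7, 8};          // fdiv, low end of the 7-10 latency range
constexpr OpCost fdiv_worst{10, 8};        // fdiv, high end of the 7-10 latency range

// Idealized cost of n operations that each consume the previous result:
// nothing overlaps, so every operation pays its full latency.
constexpr long latency_bound_cycles(OpCost c, long n) {
    return static_cast<long>(c.latency) * n;
}

// Idealized cost of n independent operations: the pipeline stays full,
// so only the issue cost per operation matters.
constexpr long throughput_bound_cycles(OpCost c, long n) {
    return static_cast<long>(c.cycle_cost) * n;
}

// Latency-limited shape: each iteration needs the previous iteration's
// result, like a recursive update definition.
float recursive_divide(float x, float d, int n) {
    for (int i = 0; i < n; i++) {
        x = x / d;
    }
    return x;
}

// Throughput-limited shape: every element is independent, so the
// divisions can overlap in the pipeline.
void elementwise_divide(std::vector<float> &v, float d) {
    for (float &x : v) {
        x = x / d;
    }
}
```

Under this model, 1000 dependent operations cost 11000 cycles with the fast inverse against 7000 to 10000 with fdiv, so fdiv wins. For 1000 independent operations the estimate is 3000 against 8000, so the fast inverse wins, which is the case the test is meant to measure.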

The test is still disabled on M1, because fdiv there has a throughput of 1?!

abadams (Member, Author) commented Nov 21, 2023

This does indeed fix that test on the new ARM bot (though another test is still failing).

@abadams abadams merged commit 5175d16 into main Nov 28, 2023
14 of 17 checks passed
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
…ited (halide#7958)

Co-authored-by: Steven Johnson <srj@google.com>