Make the fast inverse test throughput-limited rather than latency-limited #7958

Merged
abadams merged 2 commits into main from
abadams/make_fast_inverse_test_throughput_limited
Nov 28, 2023
@abadams
Member

@abadams abadams commented Nov 21, 2023

This test is currently failing on a Cortex-A76 buildbot. Because it uses a recursive update definition, it ends up limited by instruction latencies rather than throughputs. On an A76 (which is a reasonable CPU to assume for a generic ARM target), multiplying by a fast inverse has a total latency of frecpe + frecps + fmul = 11 cycles, whereas the Cortex-A76 optimization guide gives the latency of an fdiv instruction as 7-10 cycles. The cycle costs (sum of inverse throughputs), however, are 3 and 8 respectively, so fast_inverse is still a good idea for most imaging workloads that aren't the goofy recursive thing in the test. So hopefully if I just change the test to be throughput-limited, it'll fix it.

Still disabled on M1, because fdiv there has a throughput of 1?!
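
To make the latency-versus-throughput arithmetic concrete, here is a rough back-of-the-envelope model in plain C++. It is only a sketch: the names `OpCost`, `kDivide`, `kFastInverse`, `latency_bound_cycles`, and `throughput_bound_cycles` are made up for illustration and are not part of Halide or the test, and the model assumes that a fully dependent chain pays the full latency per operation while fully independent work pays only the summed inverse throughput. The numbers are the A76 figures quoted above, using the optimistic end (7) of the fdiv latency range.

```cpp
#include <cstdint>

// Per-operation cost in cycles. Hypothetical type, for illustration only.
struct OpCost {
    std::int64_t latency;         // cycles until the result can be consumed
    std::int64_t inv_throughput;  // cycles of issue capacity the op occupies
};

// fdiv on a Cortex-A76: latency quoted as 7-10 (7 used here), cycle cost 8.
constexpr OpCost kDivide{7, 8};

// frecpe + frecps + fmul: total latency 11, summed inverse throughput 3.
constexpr OpCost kFastInverse{11, 3};

// Recursive update: every operation waits on the previous result,
// so the loop costs roughly n * latency.
constexpr std::int64_t latency_bound_cycles(OpCost cost, std::int64_t n) {
    return cost.latency * n;
}

// Independent operations: the pipeline stays full, so the loop costs
// roughly n * inverse throughput.
constexpr std::int64_t throughput_bound_cycles(OpCost cost, std::int64_t n) {
    return cost.inv_throughput * n;
}
```

Under these assumptions, 1000 dependent operations cost about 11000 cycles with the fast inverse versus 7000 with fdiv, so the recursive form of the test sees the fast inverse lose. With 1000 independent operations the estimate flips to about 3000 versus 8000 cycles, which is the case the reworked test is meant to measure.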

@abadams
Member Author

abadams commented Nov 21, 2023

This does indeed fix that test on the new ARM bot (though another test is still failing).

@abadams abadams merged commit 5175d16 into main Nov 28, 2023
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
…ited (halide#7958)

Co-authored-by: Steven Johnson <srj@google.com>
3 participants