Implement execution::unseq. #1111
<execution>
* Add unsequenced_policy and unseq.
* Mark unsequenced_policy as being an execution policy.
* Add detection for this new policy to std::for_each and std::for_each_n, and use #pragma loop(ivdep) when supplied. We are not marking other algorithms because all other algorithms have something that makes the operative loop body not actually independent, and the docs for #pragma loop(ivdep) suggest that is not allowed.
* Remove #pragma loop(ivdep) from std::transform because transform is callable such that _Dest == _First1 or _Dest == _First2.
<yvals_core.h>
* Mark proposal as implemented and change __cpp_lib_execution when C++20 is turned on.
instantiate_algorithms.hpp:
* Add unseq to execution policy matrices.
P0024R2_parallel_algorithms_for_each:
* Add testing for unseq.
VSO_0157762_feature_test_macros:
* Update test for new value of __cpp_lib_execution.
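For readers who want to try the policy described above, here is a minimal sketch of calling std::for_each with std::execution::unseq. It is illustrative only, not code from this PR. The helper name scale_in_place and the HAS_UNSEQ macro are hypothetical. The guard assumes the C++20 value of __cpp_lib_execution is 201902L, and falls back to the plain overload when the library does not advertise unseq.

```cpp
#include <algorithm>
#include <vector>

// Pull in the feature-test macros where <version> exists (C++20 header).
#ifdef __has_include
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

// Assumption: 201902L is the value that signals std::execution::unseq.
#if defined(__cpp_lib_execution) && __cpp_lib_execution >= 201902L
#  include <execution>
#  define HAS_UNSEQ 1 // hypothetical macro, for this sketch only
#else
#  define HAS_UNSEQ 0
#endif

// Hypothetical helper: multiply every element by k.
// Each iteration touches only its own element, so the iterations are
// independent, which is the property unseq asks the caller to guarantee.
inline void scale_in_place(std::vector<float>& v, float k) {
    auto body = [k](float& x) { x *= k; };
#if HAS_UNSEQ
    std::for_each(std::execution::unseq, v.begin(), v.end(), body);
#else
    std::for_each(v.begin(), v.end(), body); // sequential fallback
#endif
}
```

Calling scale_in_place on {1, 2, 3, 4} with k = 2 leaves {2, 4, 6, 8} in either branch. Only the permission to vectorize differs.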
I've always found this "only perfect overlap" requirement weird. Do you think the perf benefit of |
It could go either way. If it is something the compiler could autovectorize, the wins are huge, and the dependency analysis with 3 ranges is likely to be considered too much for the autovectorizer, so without the pragma we probably won't get it. But if it isn't something that could be autovectorized, doubling the code size of the function is likely to result in the whole function not being inlined, and thus overall worse ecosystem perf. As a result, I would prefer to make tuning changes like that if and only if we are requested to do so by the optimizer team, and they have not made such a request. I think I just marked those in error back when I implemented the '17 algorithms. Note that we still use ivdep for reduce, but there we only engage it for arithmetic pointer inputs where we can try them all and observe benefit. |
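To make the "perfect overlap" case concrete, here is a small sketch of the calls that std::transform permits, where the destination is exactly one of the input ranges. The helper names add_into_first and negate_in_place are hypothetical and not part of the library. Because calls like these are valid, the loop inside transform cannot assume its ranges are distinct.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: a[i] = a[i] + b[i].
// The destination is the first input range (_Dest == _First1).
// Precondition: b has at least as many elements as a.
inline void add_into_first(std::vector<int>& a, const std::vector<int>& b) {
    std::transform(a.begin(), a.end(), b.begin(), a.begin(), std::plus<>{});
}

// Hypothetical helper: v[i] = -v[i].
// Unary form, with the destination equal to the input range.
inline void negate_in_place(std::vector<int>& v) {
    std::transform(v.begin(), v.end(), v.begin(), std::negate<>{});
}
```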
Does |
That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue? |
I don't see what you're saying the docs suggest about needing hint_parallel there. They're explicitly listed as 3 separate independent options and I see no indication that they must be used together. Here's a demo showing that it works: https://gcc.godbolt.org/z/P8Yjn3 |
The description of ivdep on that doc page:
implies that ivdep can't be used without hint_parallel. |
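The usage being debated, ivdep on its own with no hint_parallel, can be sketched roughly as follows. This is an assumption-laden illustration rather than the STL's code. The function add_one is hypothetical, the pragma is MSVC-specific and compiled out elsewhere, and the caller is assumed to guarantee that dst and src do not overlap.

```cpp
#include <cstddef>

// Hypothetical helper: dst[i] = src[i] + 1 for i in [0, n).
// Assumption: dst and src refer to non-overlapping storage, so no
// iteration depends on another. That is the promise ivdep makes.
inline void add_one(int* dst, const int* src, std::size_t n) {
#ifdef _MSC_VER
#pragma loop(ivdep) // MSVC only: tell the vectorizer to ignore assumed dependencies
#endif
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + 1;
    }
}
```

The pragma changes only what the optimizer may assume. The result is the same with or without it, as long as the no-overlap promise holds.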
Thanks for implement:smile_cat:ing this policy! (this message was written in a slightly unsequenced order) |
I went ahead with a fix PR: MicrosoftDocs/cpp-docs#2340 |
Implement execution::unseq. Resolves GH-44.