Implement execution::unseq. #1111

BillyONeal · 2020-07-30T02:09:09Z

Implement execution::unseq. Resolves GH-44.

<execution>

Add unsequenced_policy and unseq.
Mark unsequenced_policy as being an execution policy.
Add detection for this new policy to std::for_each and std::for_each_n, and use #pragma loop(ivdep) when it is supplied. Other algorithms are not marked because each of them has something that makes the operative loop body not actually independent, and the docs for #pragma loop(ivdep) suggest that is not allowed.
Remove #pragma loop(ivdep) from std::transform because transform may be called with _Dest == _First1 or _Dest == _First2.
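
As a rough illustration of the detection described above, here is a minimal sketch of how a library could pick out an unsequenced policy at compile time and apply the ivdep hint only on that path. Every name in it (sketch_seq_policy, sketch_unseq_policy, sketch_is_unsequenced, sketch_for_each, SKETCH_IVDEP) is hypothetical and is not one of the STL's internal names. The sketch also assumes raw pointers rather than general iterators, and the hint expands to nothing on compilers other than MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for the standard policy types (not the STL's definitions).
struct sketch_seq_policy {};
struct sketch_unseq_policy {};

// Trait answering "does this policy allow loop iterations to be interleaved?"
template <class Policy>
struct sketch_is_unsequenced : std::false_type {};
template <>
struct sketch_is_unsequenced<sketch_unseq_policy> : std::true_type {};

// MSVC spells the hint as a pragma; other compilers get an empty macro.
#if defined(_MSC_VER) && !defined(__clang__)
#define SKETCH_IVDEP __pragma(loop(ivdep))
#else
#define SKETCH_IVDEP
#endif

// Unsequenced path: the caller has promised that iterations are independent.
template <class T, class Fn>
void sketch_for_each_impl(T* first, std::size_t count, Fn fn, std::true_type) {
    SKETCH_IVDEP
    for (std::size_t i = 0; i < count; ++i) {
        fn(first[i]);
    }
}

// Every other policy: plain loop, no dependency hint.
template <class T, class Fn>
void sketch_for_each_impl(T* first, std::size_t count, Fn fn, std::false_type) {
    for (std::size_t i = 0; i < count; ++i) {
        fn(first[i]);
    }
}

// Entry point: selects one of the two loops from the policy type alone.
template <class Policy, class T, class Fn>
void sketch_for_each(Policy, T* first, T* last, Fn fn) {
    assert(first <= last);
    sketch_for_each_impl(first, static_cast<std::size_t>(last - first), fn,
                         sketch_is_unsequenced<Policy>{});
}
```

Calling sketch_for_each(sketch_unseq_policy{}, data, data + n, fn) takes the hinted loop, while any other policy type falls through to the plain one. The choice is made at compile time, so there is no runtime branch.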

<yvals_core.h>

Mark proposal as implemented and change __cpp_lib_execution when C++20 is turned on.

instantiate_algorithms.hpp:

Add unseq to execution policy matrices.

P0024R2_parallel_algorithms_for_each:

Add testing for unseq.

VSO_0157762_feature_test_macros:

Update test for new value of __cpp_lib_execution.
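
For code that wants to react to the new macro value, a minimal sketch follows. It assumes the values the C++ standard assigns to __cpp_lib_execution: 201603L for the C++17 policies and 201902L once unseq is available. sketch_has_unseq is a hypothetical helper name, not something this PR adds.

```cpp
// <version> carries the library feature-test macros where it exists.
#if defined(__has_include)
#if __has_include(<version>)
#include <version>
#endif
#endif

// Hypothetical helper: true when the library reports unseq support.
// 201902L is the standard's value for __cpp_lib_execution with unseq;
// 201603L is the C++17 value without it.
constexpr bool sketch_has_unseq() {
#if defined(__cpp_lib_execution) && __cpp_lib_execution >= 201902L
    return true;
#else
    return false;
#endif
}
```

If the macro is not defined at all, for example because the header is missing, the helper simply reports false.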


CaseyCarter · 2020-07-30T02:35:02Z

> Remove #pragma loop(ivdep) from std::transform because transform is callable such that _Dest == _First1 or _Dest == _First2.

I've always found this "only perfect overlap" requirement weird. Do you think the perf benefit of loop(ivdep) is worth having codegen with and without, and switching on the value of _Dest == _Firstx at runtime?
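
To make the question concrete, here is a minimal sketch of what "codegen with and without" might look like, assuming raw pointers and a hypothetical sketch_transform. It is not what the STL does (this PR simply removes the pragma from transform). It also assumes the reading given in the PR description, that the hint should not be used when the output coincides with an input.

```cpp
#include <cassert>
#include <cstddef>

// MSVC spells the hint as a pragma; other compilers get an empty macro.
#if defined(_MSC_VER) && !defined(__clang__)
#define SKETCH_IVDEP __pragma(loop(ivdep))
#else
#define SKETCH_IVDEP
#endif

// Hypothetical binary transform over raw pointers with a runtime aliasing check.
template <class T, class Fn>
T* sketch_transform(const T* first1, const T* last1, const T* first2, T* dest, Fn fn) {
    assert(first1 <= last1);
    const std::size_t count = static_cast<std::size_t>(last1 - first1);
    if (dest == first1 || dest == first2) {
        // Output coincides with an input: plain loop, no hint.
        for (std::size_t i = 0; i < count; ++i) {
            dest[i] = fn(first1[i], first2[i]);
        }
    } else {
        // Output does not coincide with an input: same loop, with the hint.
        SKETCH_IVDEP
        for (std::size_t i = 0; i < count; ++i) {
            dest[i] = fn(first1[i], first2[i]);
        }
    }
    return dest + count;
}
```

Both branches compute the same result, and the loop body is emitted twice, which is the code-size cost the question asks about.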

BillyONeal · 2020-07-30T02:39:35Z

> I've always found this "only perfect overlap" requirement weird. Do you think the perf benefit of loop(ivdep) is worth having codegen with and without, and switching on the value of _Dest == _Firstx at runtime?

It could go either way. If it is a thing the compiler could autovectorize, the wins are huge, and the dependency analysis with 3 ranges is likely to be considered too much for the autovectorizer, so without the pragma we probably won't get it. But if it isn't a thing that could be autovectorized, doubling the code size of the function is likely to result in the whole function not being inlined, and thus overall worse ecosystem perf.

As a result I would prefer to make tuning changes like that if and only if we are requested to do so by the optimizer team, and they have not made such a request. I think I just marked those in error back when I implemented the '17 algorithms.

Note that we still use ivdep in reduce, but there we only engage it for arithmetic pointer inputs, where we can try them all and observe benefit.
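
A minimal sketch of that kind of gating, assuming a sum over raw pointers: the hint is applied only when the element type is arithmetic, and every other type gets the plain loop. The names sketch_reduce, sketch_reduce_impl and SKETCH_IVDEP are hypothetical, and this is not the STL's actual reduce.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// MSVC spells the hint as a pragma; other compilers get an empty macro.
#if defined(_MSC_VER) && !defined(__clang__)
#define SKETCH_IVDEP __pragma(loop(ivdep))
#else
#define SKETCH_IVDEP
#endif

// Arithmetic element types: the only path where this sketch applies the hint.
template <class T>
T sketch_reduce_impl(const T* first, std::size_t count, T init, std::true_type) {
    SKETCH_IVDEP
    for (std::size_t i = 0; i < count; ++i) {
        init += first[i];
    }
    return init;
}

// Any other element type: plain loop, since operator+ may do arbitrary work.
template <class T>
T sketch_reduce_impl(const T* first, std::size_t count, T init, std::false_type) {
    for (std::size_t i = 0; i < count; ++i) {
        init = init + first[i];
    }
    return init;
}

// Entry point: dispatches on whether T is an arithmetic type.
template <class T>
T sketch_reduce(const T* first, const T* last, T init) {
    assert(first <= last);
    return sketch_reduce_impl(first, static_cast<std::size_t>(last - first), init,
                              std::is_arithmetic<T>{});
}
```

Restricting the hinted path to a closed set of types keeps it to cases that can actually be tried and measured, which is the rationale given above.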

AlexGuteniev · 2020-07-30T04:21:34Z

Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

CaseyCarter · 2020-07-30T04:31:26Z

> Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
> Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?

BillyONeal · 2020-07-31T20:51:25Z

> > Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
> > Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?
>
> That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?

I don't see what you're saying the docs suggest about needing hint_parallel there. They're explicitly listed as 3 separate independent options and I see no indication that they must be used together.

Here's a demo showing that it works: https://gcc.godbolt.org/z/P8Yjn3

CaseyCarter · 2020-08-01T04:04:25Z

> > > Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
> > > Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?
> >
> > That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?
>
> I don't see what you're saying the docs suggest about needing hint_parallel there. They're explicitly listed as 3 separate independent options and I see no indication that they must be used together.

The description of ivdep on that doc page:

> ivdep
> A hint to the compiler to ignore vector dependencies for this loop. Use this option together with hint_parallel.

implies that ivdep can't be used without hint_parallel.

StephanTLavavej · 2020-08-02T00:07:24Z

Thanks for implement:smile_cat:ing this policy!

(this message was written in a slightly unsequenced order)

AlexGuteniev · 2020-08-02T03:18:42Z

> implies that ivdep can't be used without hint_parallel.

I went ahead with a fix PR: MicrosoftDocs/cpp-docs#2340

BillyONeal added the cxx20 C++20 feature label Jul 30, 2020

BillyONeal requested a review from a team as a code owner July 30, 2020 02:09

Fixed flipped feature test macro.

d60c2db

StephanTLavavej requested changes Jul 30, 2020

CaseyCarter approved these changes Jul 30, 2020

STL and Casey CR comments.

460b621

BillyONeal requested review from StephanTLavavej and CaseyCarter July 31, 2020 23:19

StephanTLavavej approved these changes Aug 1, 2020

StephanTLavavej assigned CaseyCarter Aug 1, 2020

CaseyCarter approved these changes Aug 1, 2020

CaseyCarter removed their assignment Aug 1, 2020

StephanTLavavej self-assigned this Aug 1, 2020

StephanTLavavej merged commit 8ec6b33 into microsoft:master Aug 2, 2020

BillyONeal deleted the vec branch August 24, 2020 21:59
