Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement execution::unseq. #1111

Merged
merged 3 commits into from
Aug 2, 2020
Merged

Conversation

BillyONeal
Copy link
Member

Implement execution::unseq. Resolves GH-44.

<execution>

  • Add sequenced_policy and unseq.
  • Mark sequenced_policy as being an execution policy.
  • Add detection for this new policy to std::for_each and std::for_each_n, and use #pragma loop(ivdep) when supplied. We are not marking other algorithms because all other algorithms have something that makes the operative loop body not actually independent and the docs for #pragma loop(ivdep) suggest that is not allowed.
  • Remove #pragma loop(ivdep) from std::transform because transform is callable such that _Dest == _First1 or _Dest == _First2.

<yvals_core.h>

  • Mark proposal as implemented and change __cpp_lib_execution when C++20 is turned on.

instantiate_algorithms.hpp:

  • Add unseq to execution policy matricies.

P0024R2_parallel_algorithms_for_each:

  • Add testing for unseq.

VSO_0157762_feature_test_macros:

  • Update test for new value of __cpp_lib_execution.

<execution>
* Add sequenced_policy and unseq.
* Mark sequenced_policy as being an execution policy.
* Add detection for this new policy to std::for_each and std::for_each_n, and use #pragma loop(ivdep) when supplied. We are not marking other algorithms because all other algorithms have something that makes the operative loop body not actually independent and the docs for #pragma loop(ivdep) suggest that is not allowed.
* Remove #pragma loop(ivdep) from std::transform because transform is callable such that _Dest == _First1 or _Dest == _First2.

<yvals_core.h>
* Mark proposal as implemented and change __cpp_lib_execution when C++20 is turned on.

instantiate_algorithms.hpp:
* Add unseq to execution policy matricies.

P0024R2_parallel_algorithms_for_each:
* Add testing for unseq.

VSO_0157762_feature_test_macros:
* Update test for new value of __cpp_lib_execution.
@BillyONeal BillyONeal added the cxx20 C++20 feature label Jul 30, 2020
@BillyONeal BillyONeal requested a review from a team as a code owner July 30, 2020 02:09
@CaseyCarter
Copy link
Member

  • Remove #pragma loop(ivdep) from std::transform because transform is callable such that _Dest == _First1 or _Dest == _First2.

I've always found this "only perfect overlap" requirement weird. Do you think the perf benefit of loop(ivdep) is worth having codegen with and without, and switching on the value of _Dest == _Firstx at runtime?

@BillyONeal
Copy link
Member Author

I've always found this "only perfect overlap" requirement weird. Do you think the perf benefit of loop(ivdep) is worth having codegen with and without, and switching on the value of _Dest == _Firstx at runtime?

It could go either way. If it is a thing the compiler could autovectorize, the wins are huge, and the dependency analysis with 3 ranges is likely to be considered to much for the autovectorizer so without the pragma we probably won't get it. But if it isn't a thing that could be autovectorized doubling the code size of the function is likely to result in the whole function not being inlined and thus overall worse ecosystem perf.

As a result I would prefer to make tuning changes like that if and only if we are requested to do so by the optimizer team, and they have not made such a request. I think I just marked those in error back when I implemented the '17 algorithms.

Note that we still use ivdep in for reduce but there we only engage it for arithmetic pointer inputs where we can try them all and observe benefit.

stl/inc/yvals_core.h Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
stl/inc/execution Show resolved Hide resolved
stl/inc/execution Show resolved Hide resolved
stl/inc/execution Outdated Show resolved Hide resolved
@AlexGuteniev
Copy link
Contributor

Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

@CaseyCarter
Copy link
Member

Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?

@BillyONeal
Copy link
Member Author

Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?

I don't see what you're saying the docs suggest about needing hint_parallel there. They're explicitly listed as 3 separate independent options and I see no indication that they must be used together.

Here's a demo showing that it works: https://gcc.godbolt.org/z/P8Yjn3

@CaseyCarter
Copy link
Member

Does #pragma loop( ivdep ) work without #pragma loop( hint_parallel( n ) ) ?
Does #pragma loop( ivdep ) work in default release configuration (doesn't take extra switches) ?

That was certainly our understanding after communicating with the optimizer folks, but the docs seem to suggest otherwise. Billy-san, should we file a doc issue?

I don't see what you're saying the docs suggest about needing hint_parallel there. They're explicitly listed as 3 separate independent options and I see no indication that they must be used together.

The description of ivdep on that doc page:

ivdep
A hint to the compiler to ignore vector dependencies for this loop. Use this option together with hint_parallel.

implies that ivdep can't be used without hint_parallel.

@CaseyCarter CaseyCarter removed their assignment Aug 1, 2020
@StephanTLavavej StephanTLavavej self-assigned this Aug 1, 2020
@StephanTLavavej StephanTLavavej merged commit 8ec6b33 into microsoft:master Aug 2, 2020
@StephanTLavavej
Copy link
Member

Thanks for implement:smile_cat:ing this policy!

(this message was written in a slightly unsequenced order)

@AlexGuteniev
Copy link
Contributor

implies that ivdep can't be used without hint_parallel.

I went ahead with fix PR MicrosoftDocs/cpp-docs#2340

@BillyONeal BillyONeal deleted the vec branch August 24, 2020 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cxx20 C++20 feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P1001R2 execution::unseq
4 participants