
<execution>: cache thread::hardware_concurrency #1134

Closed
AlexGuteniev opened this issue Aug 3, 2020 · 10 comments · Fixed by #1143
Labels
fixed Something works now, yay! performance Must go faster

Comments

@AlexGuteniev
Contributor

Currently, <execution> does not cache the hardware_concurrency value, as it used to in certain configurations.

GetNativeSystemInfo may be a system call, so calling it on every invocation may defeat the speedup obtained by parallel algorithms.

Consider caching it again.

Note that since the number of CPUs may change at runtime, it may be a good idea to refresh the value after some GetTickCount64 interval.
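A minimal sketch of what such a cache with a timed refresh could look like; the function name, the relaxed atomics, and the 10-second interval are all illustrative assumptions, not the STL's actual implementation:

#include <Windows.h>

#include <atomic>

// Hypothetical cached wrapper around GetNativeSystemInfo with a timed refresh.
unsigned int __cached_hw_concurrency() noexcept {
    static std::atomic<unsigned int> _Cached_count{0};
    static std::atomic<unsigned long long> _Last_refresh{0};
    constexpr unsigned long long _Refresh_interval_ms = 10'000;

    const unsigned long long _Now = GetTickCount64();
    unsigned int _Count = _Cached_count.load(std::memory_order_relaxed);
    if (_Count == 0 || _Now - _Last_refresh.load(std::memory_order_relaxed) > _Refresh_interval_ms) {
        SYSTEM_INFO _Info{};
        GetNativeSystemInfo(&_Info); // the potentially expensive call we want to avoid on every query
        _Count = _Info.dwNumberOfProcessors;
        _Cached_count.store(_Count, std::memory_order_relaxed);
        _Last_refresh.store(_Now, std::memory_order_relaxed);
    }
    return _Count;
}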

@AlexGuteniev
Contributor Author

I don't know whether it should be fixed in _Thrd_hardware_concurrency or in __std_parallel_algorithms_hw_threads, or whether GetTickCount should actually be used (or maybe a thread pool refresh timer should be started instead).

StephanTLavavej added the performance Must go faster label Aug 3, 2020
@BillyONeal
Member

I think we would need a perf benchmark showing this as a substantial problem before we would want to change this; if I understand correctly, GetNativeSystemInfo is implemented by a simple memory read from a memory page mapped into both userspace and the kernel (as read-only in userspace and read/write in the kernel). But that's anecdotal; I've not actually checked.

Back when I did perf optimizations for the parallel algorithms, the cost of GetNativeSystemInfo was never high enough on the chart to care about. But if someone can produce a benchmark suggesting otherwise, I'd be all for it.

@AlexGuteniev
Contributor Author

Profiling has shown that GetNativeSystemInfo always calls ZwQuerySystemInformation and ZwQueryInformationProcess, both of which are kernel calls. The benchmark is just to run this:

#include <Windows.h>

int main() {
    for (;;) { // spin calling GetNativeSystemInfo as fast as possible
        SYSTEM_INFO si{};
        GetNativeSystemInfo(&si);
    }
}

under a profiler and see the OS calls show up as hotspots.
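Without a profiler, a rough way to quantify the same cost is to time the call directly; this timing harness is only a companion sketch, not code from this thread:

#include <Windows.h>

#include <chrono>
#include <cstdio>

// Estimates the per-call cost of GetNativeSystemInfo by timing many iterations.
int main() {
    constexpr int iterations = 1'000'000;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        SYSTEM_INFO si{};
        GetNativeSystemInfo(&si);
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("%.1f ns per GetNativeSystemInfo call\n", static_cast<double>(ns) / iterations);
}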

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Aug 4, 2020
resolves microsoft#1134 (conservatively)
revert to what was there before atomic wait
@CaseyCarter
Member

Note that since the number of CPUs may change at runtime, it may be a good idea to refresh the value after some GetTickCount64 interval.

Should we have a separate issue tracking updating the value so it doesn't get lost in the shuffle once this issue is closed?

@AlexGuteniev
Contributor Author

Should we have a separate issue tracking updating the value so it doesn't get lost in the shuffle once this issue is closed?

I think the decision whether to do this can be made in the current PR; my default take is not to do it.

@AlexGuteniev
Contributor Author

Reasons not to do a timed refresh:

@AlexGuteniev
Contributor Author

What I would do if we want to deal with dynamic CPU count change (see the sketch after this list):

  • Add __set_parallel_algorithms_hw_concurrency, taking an integer
  • A small positive value would set the number of threads
  • An invalid value (say 0 or -1) would reset to GetNativeSystemInfo, thus also performing a refresh
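A rough sketch of how such a setter could interact with the cached value; this was only a proposal in this thread and was never implemented, so the names and behavior below are hypothetical:

#include <Windows.h>

#include <atomic>

namespace {
    // 0 means "no override": fall back to asking the OS.
    std::atomic<unsigned int> __parallel_hw_concurrency_override{0};
}

// Hypothetical setter: a small positive value pins the thread count,
// anything else (e.g. 0 or -1) clears the override and forces a re-query.
void __set_parallel_algorithms_hw_concurrency(const int _Count) noexcept {
    const unsigned int _Value = _Count > 0 ? static_cast<unsigned int>(_Count) : 0;
    __parallel_hw_concurrency_override.store(_Value, std::memory_order_relaxed);
}

// Hypothetical getter used by the parallel algorithms.
unsigned int __parallel_algorithms_hw_threads() noexcept {
    const unsigned int _Override = __parallel_hw_concurrency_override.load(std::memory_order_relaxed);
    if (_Override != 0) {
        return _Override;
    }
    SYSTEM_INFO _Info{};
    GetNativeSystemInfo(&_Info); // re-querying the OS also acts as a refresh
    return _Info.dwNumberOfProcessors;
}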

@BillyONeal
Member

I think anyone who cares about hot-plugging CPUs cares enough to call the platform topology enumeration APIs and similar, since they probably want to support CPUs outside of the default group. As a result, I don't think we should worry about that.

@AlexGuteniev
Contributor Author

I think anyone who cares about hot-plugging CPUs cares enough to call the platform topology enumeration APIs and similar, since they probably want to support CPUs outside of the default group. As a result, I don't think we should worry about that.

And if the group APIs are called before thread::hardware_concurrency, a parallel algorithm may not parallelize anything, since the value may be zero. I guess if we really care about this case, we should at least provide a setter.

@AlexGuteniev
Contributor Author

And if the group APIs are called before thread::hardware_concurrency, a parallel algorithm may not parallelize anything, since the value may be zero. I guess if we really care about this case, we should at least provide a setter.

Decided against it. The thread pool is not actually controlled anyway.

StephanTLavavej pushed a commit that referenced this issue Aug 9, 2020
Fixes #1134.

Co-authored-by: Casey Carter <cartec69@gmail.com>
Co-authored-by: Billy O'Neal <bion@microsoft.com>
StephanTLavavej added the fixed Something works now, yay! label Aug 9, 2020