Use OpenMP-like synchronization patterns in Eigen thread pool #4236

tlh20 · 2020-06-15T13:26:56Z

Description:

This PR updates the thread pool implementation to make work distribution over the Eigen thread pool more closely resemble techniques used in OpenMP. In particular:

(1) A thread that enters a parallel loop from outside the thread pool works on the iterations itself, rather than requiring a thread switch to and from a thread in the pool.

(2) To support (1), work items pushed to the thread pool run a loop that claims iterations from a shared counter via atomic fetch-and-add, as opposed to having each work item represent an individual batch of iterations. This means that any thread working on the loop can execute any batch of iterations, and the main thread can run through all of the batches itself if the loop turns out to be short-running.

(3) As with OpenMP active scheduling, the worker loop spins waiting for work prior to blocking. This avoids OS blocking / wake-up paths in workloads with series of short-running parallel sections. The default spinning duration prior to blocking is measured at around 1ms.
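The three points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ClaimAndRun`, `SketchParallelFor`, and `SpinThenBlockSignal` are hypothetical names, `std::thread` stands in for pool workers, and the spin budget is an iteration count rather than the roughly 1ms duration mentioned above.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

using LoopBody = std::function<void(std::ptrdiff_t, std::ptrdiff_t)>;

// Hypothetical helper (not the ORT or Eigen API): claim batches of iterations
// from a shared counter with fetch_add until the range is exhausted.
// Assumes block_size > 0.
void ClaimAndRun(std::atomic<std::ptrdiff_t>& next, std::ptrdiff_t total,
                 std::ptrdiff_t block_size, const LoopBody& fn) {
  for (;;) {
    const std::ptrdiff_t begin =
        next.fetch_add(block_size, std::memory_order_relaxed);
    if (begin >= total) return;
    // The final batch may be short; total need not be a multiple of block_size.
    fn(begin, std::min(begin + block_size, total));
  }
}

// std::thread stands in for pool workers here. The calling thread runs the same
// claim loop, so a short loop can finish before any helper claims a batch.
void SketchParallelFor(std::ptrdiff_t total, std::ptrdiff_t block_size,
                       int num_helpers, const LoopBody& fn) {
  std::atomic<std::ptrdiff_t> next{0};
  std::vector<std::thread> helpers;
  for (int i = 0; i < num_helpers; ++i) {
    helpers.emplace_back([&] { ClaimAndRun(next, total, block_size, fn); });
  }
  ClaimAndRun(next, total, block_size, fn);  // the caller participates
  for (auto& t : helpers) t.join();          // join orders the results
}

// Hypothetical one-shot signal: the waiter polls a flag for a bounded number of
// iterations and only then blocks on a condition variable.
class SpinThenBlockSignal {
 public:
  void Post() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ready_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  void Wait(int spin_iterations) {
    for (int i = 0; i < spin_iterations; ++i) {
      if (ready_.load(std::memory_order_acquire)) return;  // no OS wait needed
      std::this_thread::yield();
    }
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::atomic<bool> ready_{false};
};
```

With `num_helpers` set to 0 the caller simply executes every batch itself, which is the degenerate case described in (2). A real pool would reuse long-lived workers instead of creating threads per loop, and would tune the spin budget per platform.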

Performance tests on a 32-vCPU VM for CPU inference workloads show performance broadly similar to OpenMP builds: across 143 models, the p50 result is a 12% improvement and the p80 result is a 28% improvement.

Motivation and Context

The PR aims to simplify the configuration of threading with ORT by providing consistent performance with OpenMP-based parallelism.


ghost · 2020-06-15T13:27:08Z

All CLA requirements met.

onnxruntime/core/common/threadpool.cc

include/onnxruntime/core/platform/threadpool.h

yuslepukhin · 2020-06-15T22:36:40Z

cv_.notify_all();

Would it improve things somewhat if we notified after releasing the mutex?

Refers to: include/onnxruntime/core/platform/Barrier.h:41 in fe2637f.

yuslepukhin

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h

include/onnxruntime/core/platform/threadpool.h

onnxruntime/core/common/threadpool.cc

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h

onnxruntime/core/common/threadpool.cc

tlh20 · 2020-06-16T13:43:07Z

> cv_.notify_all();
>
> Would it improve things somewhat if we notified after releasing the mutex?
>
> Refers to: include/onnxruntime/core/platform/Barrier.h:41 in fe2637f.

I'll leave this as-is for the moment, matching the current main branch, but it would be interesting to check how this performs on different platforms. I think some mutex + condvar implementations identify that the waiting threads need the lock that is still held and defer the wake-up until the lock is released, while others benefit from notifying after the release, which avoids waking a thread that is unblocked briefly before blocking again on the lock acquire.

pranavsharma

Can you please change the subject of this PR as it'll show up as-is in the commit logs? Thanks!

snnn · 2020-06-16T19:36:36Z

> Can you please change the subject of this PR as it'll show up as-is in the commit logs? Thanks!

The log message can be changed when merging the PR.

onnxruntime/core/common/threadpool.cc

snnn and others added 7 commits June 11, 2020 08:45

threading

be89f9d

Bias work distribution toward one item per thread

997cf66

Unbreak debug builds

6ddf34f

Unbreak debug builds

4bcf795

Sharded loop counter does not require iteration to be a multiple of b…

850c239

…lock_size

Exit spin loop on done_ flag

4604628

Whitespace clean-up

07cbf2f

tlh20 requested a review from a team as a code owner June 15, 2020 13:26

onnxruntime::make_unique instead of std::make_unique

fe2637f

snnn reviewed Jun 15, 2020

onnxruntime/core/common/threadpool.cc (outdated, resolved)

snnn previously approved these changes Jun 15, 2020


pranavsharma reviewed Jun 15, 2020

include/onnxruntime/core/platform/threadpool.h (outdated, resolved)

yuslepukhin previously approved these changes Jun 15, 2020


skottmckay reviewed Jun 16, 2020


pranavsharma reviewed Jun 16, 2020

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

onnxruntime/core/common/threadpool.cc (outdated, resolved)

Updates for PR comments

c19cf71

tlh20 dismissed stale reviews from yuslepukhin and snnn via c19cf71 June 16, 2020 11:22

Fix Linux builds

d08b3db

tlh20 closed this Jun 16, 2020

tlh20 reopened this Jun 16, 2020

pranavsharma previously approved these changes Jun 16, 2020


snnn previously approved these changes Jun 16, 2020


skottmckay reviewed Jun 16, 2020

onnxruntime/core/common/threadpool.cc (resolved)

skottmckay previously approved these changes Jun 16, 2020


mrry mentioned this pull request Jun 16, 2020

Locking seems to be taking the share of the time #4251

Closed

Use _MSC_VER in ifdef, rather than absence of __GNUC__

0eafefc

tlh20 dismissed stale reviews from skottmckay, snnn, and pranavsharma via 0eafefc June 17, 2020 08:29

tlh20 changed the title ~~Tim/threading~~ Use OpenMP-like synchronization patterns in Eigen thread pool Jun 17, 2020

skottmckay approved these changes Jun 17, 2020


Merge from master for CI updates

d043bfe

tlh20 merged commit 9e3b5c6 into master Jun 22, 2020

tlh20 deleted the tim/threading branch June 22, 2020 09:04

tlh20 mentioned this pull request Jul 13, 2020

Create N-1 threads in intra-op pool, given main thread now active #4493

Merged

tlh20 mentioned this pull request Apr 20, 2021

Remove non-trivially-destructible thread-local from thread pool state, blocking ARM64 builds #4336

Merged
