Executor: need multiple thread support #6319

Closed
helinwang opened this issue Dec 6, 2017 · 6 comments

@helinwang
Contributor

helinwang commented Dec 6, 2017

Currently our Executor implementation is single-threaded: it runs the ProgramDesc (actually ProgramDescBind, an unrelated detail) sequentially.

This causes a severe performance problem for a ProgramDesc that contains OPs that load data from disk, or send/recv OPs that read data from the network: the I/O blocks computation that could otherwise run in parallel.

The Executor needs to be able to analyze the dependencies between OPs according to the ProgramDesc (or the ExecutionPlan in the future, if this PR passes), and schedule each OP whose dependencies have finished onto a thread pool for parallel execution.
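
A minimal sketch of the dependency analysis and scheduling described above, in Python for illustration only. It assumes each OP can be reduced to lists of input and output variable names plus a callable; the `Op` class, `build_deps`, and `run_parallel` are hypothetical names, not Paddle APIs, and `concurrent.futures.ThreadPoolExecutor` stands in for the thread pool. An OP is submitted as soon as every earlier OP it conflicts with on a shared variable (read-after-write, write-after-write, write-after-read) has finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Op:
    # Hypothetical stand-in for an OpDesc: variables read/written and the work to do.
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[[], object] = lambda: None


def build_deps(ops: List[Op]) -> Dict[int, Set[int]]:
    """For each op index, the indices of earlier ops that must finish first."""
    last_writer: Dict[str, int] = {}   # var -> most recent op that wrote it
    readers: Dict[str, Set[int]] = {}  # var -> ops that read it since that write
    deps: Dict[int, Set[int]] = {}
    for i, op in enumerate(ops):
        d: Set[int] = set()
        for v in op.inputs:            # read-after-write
            if v in last_writer:
                d.add(last_writer[v])
        for v in op.outputs:           # write-after-write and write-after-read
            if v in last_writer:
                d.add(last_writer[v])
            d.update(readers.get(v, set()))
        d.discard(i)
        deps[i] = d
        for v in op.inputs:
            readers.setdefault(v, set()).add(i)
        for v in op.outputs:
            last_writer[v] = i
            readers[v] = set()
    return deps


def run_parallel(ops: List[Op], max_workers: int = 4) -> List[str]:
    """Run ops on a thread pool as their dependencies finish; return completion order."""
    deps = build_deps(ops)
    remaining = {i: set(d) for i, d in deps.items()}
    dependents: Dict[int, List[int]] = {i: [] for i in deps}
    for i, d in deps.items():
        for j in d:
            dependents[j].append(i)
    order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start every op that has no dependencies.
        running = {pool.submit(ops[i].run): i for i, d in remaining.items() if not d}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                i = running.pop(fut)
                fut.result()  # re-raise any exception raised by the op
                order.append(ops[i].name)
                # Release dependents whose last outstanding dependency was op i.
                for k in dependents[i]:
                    remaining[k].discard(i)
                    if not remaining[k]:
                        running[pool.submit(ops[k].run)] = k
    return order
```

With two independent loads followed by an `add` that reads both, the loads are submitted together and `add` waits for both, so a blocking read no longer stalls unrelated work.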

This issue is blocked by #6317, since the Executor needs a reliable data structure (ProgramDesc, not ProgramDescBind) to do dependency analysis.

@QiJune
Member

QiJune commented Dec 6, 2017

I think we will have a pool of DeviceContexts. The Executor takes a DeviceContext from the pool and runs the operator on it, according to the device specified by that operator. The Executor has to handle the dependencies between operators, and all operators are launched asynchronously.
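
One way to read the DeviceContext-pool idea, sketched in Python under stated assumptions: `DeviceContext`, `DeviceContextPool`, and `launch` are hypothetical names, a context is only a placeholder object (in practice it might wrap a CUDA stream or a CPU worker), and "launched asynchronously" is modeled as submitting to a thread pool and returning a `Future`. When every context for a device is in use, `acquire` blocks, which bounds how many OPs run on that device at once.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class DeviceContext:
    # Hypothetical placeholder for a per-device execution handle.
    def __init__(self, device: str, idx: int):
        self.device = device
        self.idx = idx


class DeviceContextPool:
    """Hands out DeviceContexts per device; acquire() blocks while all are busy."""

    def __init__(self, devices: Dict[str, int]):
        # devices maps a device name to how many contexts to create for it.
        self._free = {}
        for dev, n in devices.items():
            q = queue.Queue()
            for i in range(n):
                q.put(DeviceContext(dev, i))
            self._free[dev] = q

    def acquire(self, device: str) -> DeviceContext:
        return self._free[device].get()   # blocks until a context is returned

    def release(self, ctx: DeviceContext) -> None:
        self._free[ctx.device].put(ctx)

    def available(self, device: str) -> int:
        return self._free[device].qsize()


def launch(pool: DeviceContextPool, workers: ThreadPoolExecutor,
           device: str, kernel: Callable[[DeviceContext], object]) -> Future:
    """Asynchronously run kernel(ctx) on a context for `device`; returns a Future."""
    def task():
        ctx = pool.acquire(device)
        try:
            return kernel(ctx)
        finally:
            pool.release(ctx)  # return the context even if the kernel raises
    return workers.submit(task)
```

Note that this sketch leaves dependency handling to the caller (for example, by waiting on a producer's `Future`), which is the part the Executor would still have to own.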

@Yancey1989
Contributor

Yancey1989 commented Dec 6, 2017

As discussed in #6223, we will convert the user-defined graph into a multi-threaded graph as follows (variables omitted):
[Image: multi-threads]

Note that not all operators will run with multiple threads, so the problem is how the executor knows which OPs should be executed multi-threaded. We can solve this problem as follows:

  1. Modify BlockDesc into a recursive schema:

    message BlockDesc {
      ...
      repeated OpDesc ops = 4;
      repeated BlockDesc sub_blocks = 5;
    }
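  2. The Executor performs a level-order traversal of the block tree and pushes the OPs into the ThreadPool to run. The pseudo-code:

    for block in program.blocks:
        # Run the block's own ops sequentially.
        for op in block.ops:
            op.Run()
        if block.sub_blocks:
            # Level i is the i-th op of every sub-block; ops at the same level run in parallel.
            depth = max(len(sub_block.ops) for sub_block in block.sub_blocks)
            for level in range(depth):
                for sub_block in block.sub_blocks:
                    if level < len(sub_block.ops):
                        ThreadPool.Push(sub_block.ops[level])
                ThreadPool.SyncRun()  # barrier: wait until every op at this level finishes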
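
The level-by-level scheme above can be turned into a small runnable Python sketch. Assumptions: `OpDesc` and `BlockDesc` are hypothetical dataclasses mirroring the messages above, `ThreadPool.Push`/`SyncRun` are modeled with `ThreadPoolExecutor.submit` plus `concurrent.futures.wait` as the barrier, each level takes the `level`-th OP of every sub-block that is long enough, and, like the pseudo-code, only one level of `sub_blocks` nesting is handled. `execute` returns the groups of OPs that ran together so the schedule is visible.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OpDesc:
    name: str
    run: Callable[[], object] = lambda: None


@dataclass
class BlockDesc:
    # Mirrors the recursive message: a block's own ops plus nested sub-blocks.
    ops: List[OpDesc] = field(default_factory=list)
    sub_blocks: List["BlockDesc"] = field(default_factory=list)


def execute(blocks: List[BlockDesc], pool: ThreadPoolExecutor) -> List[List[str]]:
    """Run blocks level by level; return the groups of ops that ran together."""
    steps: List[List[str]] = []
    for block in blocks:
        for op in block.ops:                 # a block's own ops run sequentially
            op.run()
            steps.append([op.name])
        if not block.sub_blocks:
            continue
        depth = max(len(sb.ops) for sb in block.sub_blocks)
        for level in range(depth):
            # Push the level-th op of every sub-block that is long enough ...
            batch = [sb.ops[level] for sb in block.sub_blocks if level < len(sb.ops)]
            futures = [pool.submit(op.run) for op in batch]
            wait(futures)                    # ... then barrier, like ThreadPool.SyncRun()
            for f in futures:
                f.result()                   # surface exceptions from worker threads
            steps.append([op.name for op in batch])
    return steps
```

One consequence of the per-level barrier is that a slow OP at level `i` in one sub-block delays level `i + 1` of every other sub-block, even when they share no data.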

@helinwang
Contributor Author

@Yancey1989 I think the executor executes all OPs in the thread pool, so all of them are executed multi-threaded.

@helinwang
Contributor Author

@QiJune

all operators are launched asynchronously.

I think at least some OPs cannot run asynchronously. For example, an OP that loads data from disk cannot run asynchronously with respect to the next OP that uses the data.

Maybe asynchronous launch is just a special case for GPU OPs, because the GPU stream handles the synchronization?
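
The GPU-stream intuition can be illustrated with a CPU analogy. This is an assumption-laden sketch, not how CUDA or Paddle implement it: a hypothetical `Stream` class wraps a single-worker `ThreadPoolExecutor`, so launches return immediately but execute in FIFO order. A consumer OP enqueued on the same stream after a loading OP therefore sees the loaded data without the host blocking between the two launches; OPs on different streams, or on a multi-worker pool, would still need an explicit dependency.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class Stream:
    """CPU stand-in for a GPU stream: asynchronous launch, in-order execution."""

    def __init__(self) -> None:
        # One worker thread, so queued work runs strictly in submission order.
        self._worker = ThreadPoolExecutor(max_workers=1)

    def launch(self, fn, *args) -> Future:
        return self._worker.submit(fn, *args)   # returns without waiting

    def synchronize(self) -> None:
        # Analogous to cudaStreamSynchronize: wait for all previously queued work.
        self._worker.submit(lambda: None).result()


buf = {}


def load() -> None:          # hypothetical "read from disk" op
    time.sleep(0.05)
    buf["x"] = [1, 2, 3]


def consume() -> None:       # hypothetical op that uses the loaded data
    buf["y"] = sum(buf["x"])


s = Stream()
s.launch(load)               # host does not block here ...
s.launch(consume)            # ... yet consume still runs after load finishes
s.synchronize()
print(buf["y"])              # prints 6
```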

@helinwang
Copy link
Contributor Author

Thread pool concept: separate I/O threads and computation threads.
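
A minimal sketch of the I/O-thread / computation-thread split, in Python. The two pools, `read_batch`, `compute`, and `pipeline` are hypothetical illustrations: I/O-bound OPs go to a dedicated I/O pool so a slow read never occupies a computation thread, and each compute OP is submitted once its read has finished, so later reads overlap with earlier computation. In CPython, pure-Python compute threads are limited by the GIL; the point here is only the scheduling structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_batch(i: int) -> List[int]:
    # Hypothetical I/O-bound op (disk read or network recv).
    return list(range(i, i + 4))


def compute(batch: List[int]) -> int:
    # Hypothetical compute op.
    return sum(x * x for x in batch)


def pipeline(num_batches: int, io_workers: int = 2, compute_workers: int = 2) -> List[int]:
    """Overlap I/O and computation by giving each kind of op its own pool."""
    with ThreadPoolExecutor(io_workers, thread_name_prefix="io") as io_pool, \
         ThreadPoolExecutor(compute_workers, thread_name_prefix="compute") as cpu_pool:
        # All reads are queued up front on the I/O threads.
        reads = [io_pool.submit(read_batch, i) for i in range(num_batches)]
        # Each compute op starts as soon as its read is done, while later reads continue.
        results = [cpu_pool.submit(compute, r.result()) for r in reads]
        return [f.result() for f in results]
```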

@helinwang
Contributor Author

We are already discussing the concurrency design. This issue is out of date; closing.


5 participants