Dist transpiler support prefetch #9714
Conversation
… dist-transpiler-support-prefetch
paddle/fluid/operators/concat_op.cc
Outdated
@@ -33,7 +35,7 @@ class ConcatOp : public framework::OperatorWithKernel {
     size_t axis = static_cast<size_t>(ctx->Attrs().Get<int>("axis"));
     const size_t n = ins.size();

-    PADDLE_ENFORCE_GT(n, 1, "Input tensors count should > 1.");
+    // PADDLE_ENFORCE_GT(n, 1, "Input tensors count should > 1.");
Maybe delete the comment entirely instead of leaving it commented out.
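The commented-out check relaxes ConcatOp's requirement that it receive more than one input: on the prefetch path, a concat over a single tensor must simply pass it through. A minimal sketch of why `n >= 1` suffices, using numpy arrays as stand-ins for framework tensors (an assumption, not Paddle's actual kernel):

```python
import numpy as np

def concat(inputs, axis=0):
    """Concatenate a list of arrays; a single input is a valid no-op.

    Mirrors relaxing PADDLE_ENFORCE_GT(n, 1, ...) so that n == 1 is
    allowed (numpy stands in for framework tensors -- an assumption).
    """
    if len(inputs) < 1:
        raise ValueError("Input tensors count should be >= 1.")
    return np.concatenate(inputs, axis=axis)

single = concat([np.arange(6).reshape(2, 3)])          # one input: identity
pair = concat([np.ones((2, 3)), np.zeros((2, 3))])     # usual case
print(single.shape, pair.shape)  # → (2, 3) (4, 3)
```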
     }
-    auto prepared = executor.Prepare(*program, block_list);
+    auto optimize_prepared = executor.Prepare(*program, block_list);
We need to prepare all the blocks of the program, so maybe the name `prepared` is more suitable?
`optimize_prepared` is named that way to distinguish it from `prefetch_prepared`.
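Since two sets of prepared contexts coexist on the server side, naming them by purpose keeps them apart. A hedged sketch with a toy stand-in for fluid's `Executor` (the class and return shapes here are hypothetical, only the naming choice mirrors the PR):

```python
class Executor:
    """Toy stand-in for fluid's Executor (hypothetical API shape)."""
    def prepare(self, program, block_ids):
        # Return one prepared context per block id.
        if isinstance(block_ids, int):
            return {"block_id": block_ids}
        return [{"block_id": b} for b in block_ids]

executor = Executor()
program = object()
optimize_block_ids = [1, 2, 3]
prefetch_block_id = 4

# Two separately named context sets, mirroring optimize_prepared /
# prefetch_prepared in the PR.
optimize_prepared = executor.prepare(program, optimize_block_ids)
prefetch_prepared = executor.prepare(program, prefetch_block_id)
print(len(optimize_prepared), prefetch_prepared["block_id"])  # → 3 4
```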
     rpc_service_->SetScope(&recv_scope);
     rpc_service_->SetDevCtx(&dev_ctx);
     // TODO(qiao) set proper fields for table lookup and update
     rpc_service_->SetExecutor(&executor);
     rpc_service_->SetPrefetchBlkdId(0);
     VLOG(3) << "prefetch block id is " << prefetch_block->ID();
     auto prefetch_prepared = executor.Prepare(*program, prefetch_block->ID());
The code at L106 has already prepared all the blocks, so we don't need to prepare the prefetch_block again.
I skipped the prefetch_block here https://github.com/PaddlePaddle/Paddle/pull/9714/files#diff-64ee97d744659db61dc8ae72bfc103b5R102
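To avoid preparing the prefetch block twice, the block list passed to the first `Prepare` call excludes it. A minimal sketch of that filtering, assuming a program with five blocks where block 0 is the root and the prefetch block id is known:

```python
num_blocks = 5        # assumed program layout
prefetch_block_id = 4

# Prepare every non-root block except the prefetch block in one batch,
# then prepare the prefetch block separately (mirrors the linked diff).
block_list = [b for b in range(1, num_blocks) if b != prefetch_block_id]
print(block_list)  # → [1, 2, 3]
```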
@@ -71,7 +71,7 @@ class PrefetchOpMaker : public framework::OpProtoAndCheckerMaker {
              "(RPCClient) The RPC client object which will be"
              "initialized at most once.");
     AddOutput("Out",
-              "(SelectedRows) result "
+              "(LoDTensor) result "
I think the type of the output variable should be SelectedRows, just because its shape is not a fixed value.
Here it should be LoDTensor: the ops that follow are not known in advance, and most of them can only process LoDTensor. SelectedRows is only constructed during the backward pass.
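The distinction under discussion: SelectedRows stores only a subset of rows plus their indices into a logical `[height, width]` tensor, while a LoDTensor is dense, which is what downstream ops consume. A sketch of both representations with numpy stand-ins (an assumption; this is not Paddle's actual storage code):

```python
import numpy as np

# SelectedRows: row indices into a logical [height, width] tensor,
# plus values for just those rows (numpy stand-ins -- an assumption).
height, width = 100, 4
rows = np.array([3, 7, 42])
value = np.arange(12, dtype=float).reshape(3, 4)

# Most forward ops only consume dense tensors, so the prefetch result
# is materialized as a dense LoDTensor-like array:
dense = np.zeros((height, width))
dense[rows] = value
print(value.shape, dense.shape)  # → (3, 4) (100, 4)
```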
paddle/fluid/operators/sum_op.cc
Outdated
@@ -36,8 +38,8 @@ class SumOp : public framework::OperatorWithKernel {
     }

     auto x_dims = ctx->GetInputsDim("X");
-    size_t N = x_dims.size();
-    PADDLE_ENFORCE_GT(N, 1, "Input tensors count should > 1.");
+    // size_t N = x_dims.size();
Please delete these comments.
Added a TODO here instead; this check may need to be added back in the future.
@@ -252,12 +315,114 @@ def transpile(self,
                 outputs={"Out": [orig_param]},
                 attrs={"axis": 0})

         if self.has_distributed_lookup_table:
Can we move the following code into an independent function?
done
Awesome! Thanks for the PR and for making it work!
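The refactor the reviewer asked for could look like the sketch below: the `has_distributed_lookup_table` branch of `transpile` delegates to a dedicated method. All names and the method body here are hypothetical; only the shape of the extraction is the point:

```python
class DistributeTranspiler:
    """Sketch of the extraction; names and body are hypothetical."""

    def __init__(self, has_distributed_lookup_table):
        self.has_distributed_lookup_table = has_distributed_lookup_table
        self.steps = []

    def transpile(self, program=None):
        # ... split params, insert concat/send ops ...
        if self.has_distributed_lookup_table:
            self._replace_lookup_table_op_with_prefetch(program)

    def _replace_lookup_table_op_with_prefetch(self, program):
        # Lookup-table handling lives in one place instead of inline
        # in transpile() (illustrative body only).
        self.steps.append("prefetch")

t = DistributeTranspiler(has_distributed_lookup_table=True)
t.transpile()
print(t.steps)  # → ['prefetch']
```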
paddle/fluid/framework/operator.cc
Outdated
@@ -55,7 +55,7 @@ static DDim GetDims(const Scope& scope, const std::string& name) {
   if (var->IsType<LoDTensor>()) {
     return var->Get<LoDTensor>().dims();
   } else if (var->IsType<SelectedRows>()) {
-    return var->Get<SelectedRows>().GetCompleteDims();
+    return var->Get<SelectedRows>().value().dims();
Will this affect other places like optimization ops?
ok, will optimize this code.
done
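The diff above changes what `GetDims` reports for a SelectedRows variable: the dims of the rows actually stored (`value().dims()`) rather than the logical full shape (`GetCompleteDims()`). A sketch of the two values, using numpy as a stand-in for the framework's tensor (an assumption):

```python
import numpy as np

height, width = 100, 8          # logical shape of the full parameter
rows = [5, 17, 23]              # rows actually present in SelectedRows
value = np.zeros((len(rows), width))

complete_dims = (height, width)  # what GetCompleteDims() would report
value_dims = value.shape         # what value().dims() reports after the diff
print(complete_dims, value_dims)  # → (100, 8) (3, 8)
```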
         # 2. add split_ids_op and send_vars_op to send gradient to pservers
         # there should only be one table_name
         all_ops = program.global_block().ops
         table_grad_name = framework.grad_var_name(self.table_name)
`grad_var_name` sometimes may not return the "real" grad var name, because backward may create a differently named variable.
Yes, but here the name of the table parameter's gradient will always be `table_name@GRAD`; any `table_name@GRAD@RENAME` variables will be merged back into `table_name@GRAD`.
I see, thanks
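The naming convention discussed above, sketched in Python: a gradient variable is the parameter name plus an `@GRAD` suffix, and when backward creates renamed duplicates (`@GRAD@RENAME@...`) they are summed back into the canonical name. The merge below is illustrative string handling, not the framework's actual code:

```python
GRAD_SUFFIX = "@GRAD"

def grad_var_name(var_name):
    # Convention used by fluid's backward pass: append "@GRAD".
    return var_name + GRAD_SUFFIX

# Backward may create renamed copies when a parameter receives several
# gradients; they are merged back under the canonical @GRAD name.
renamed = ["table@GRAD@RENAME@0", "table@GRAD@RENAME@1"]
merged = renamed[0].split("@RENAME")[0]
print(grad_var_name("table"), merged)  # → table@GRAD table@GRAD
```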
LGTM!
project: #9597
task list: #9211
test code: https://github.com/jacquesqiao/models/tree/dist-lookup-table/dist_lookup_table
Remaining problem: the prefetch block has to be the last block in the program, or RunPreparedContext will fail.