
Implementation documentation of the dynamic RNN #7135

Closed
wants to merge 8 commits

Conversation

reyoung (Collaborator) commented Jan 2, 2018

No description provided.


## A glance of Dynamic RNN

A common neural network structure called recurrent neural network(`RNN` for short), which there is a directed circle in the neural network model. RNN can use a internal memory to process arbitrary sequences of inputs.
Contributor:

a internal memory --> an internal memory
process arbitrary sequences of inputs is not clear enough. Maybe you mean:
RNN can use an internal memory to process sequences with variable lengths.


PaddlePaddle Fluid directly represents the `directed circle` in the `ProgramDesc`, since we do not use directed acyclic graph to represent our model. The `ProgramDesc` just like the AST of a programming language, which describes the computation instructions for training a neural network. We use arrays and a while loop to describe the training process of an RNN. The C++ code below demonstrates the forward logic of RNN which PaddlePaddle Fluid generates in `ProgramDesc`.
Contributor:

training process --> training/inference process
The C++ code below demonstrates the forward logic of RNN which PaddlePaddle Fluid generates in ProgramDesc --> The C++ code below demonstrates the forward logic of RNN generated in ProgramDesc of PaddlePaddle Fluid.

Collaborator Author:

Done
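
The C++ snippet referenced above is not shown in this excerpt. A minimal sketch of the loop it describes, using hypothetical helper names (`SplitByTimestep`, `StepNet`, `InitialState`, `MergeTimesteps`), might look like this:

```cpp
// Sketch only: the real ProgramDesc is a list of operator descriptions, but its
// control flow corresponds to a C++ loop of roughly this shape.
std::vector<Tensor> xs = SplitByTimestep(input);  // one tensor per timestep
std::vector<Tensor> outputs;                      // one output per step, appended
Tensor memory = InitialState();                   // the internal memory of the RNN
size_t i = 0;                                     // loop counter (a control-flow variable)
while (i < xs.size()) {                           // the condition of the WhileOp
  Tensor out = StepNet(xs[i], memory);            // operators inside the sub-block
  memory = out;                                   // update the recurrent memory
  outputs.push_back(out);                         // keep every step for the backward pass
  ++i;                                            // increment operator, discussed later
}
Tensor result = MergeTimesteps(outputs);          // merge per-step outputs into one tensor
```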


1. Control flow operators
1. Data manipulation operators of RNN.
2. Backward of RNN.
Contributor:

1. Data manipulation operators of RNN.
2. Backward of RNN.

-->

2. Data manipulation operators of RNN
3. Backward of RNN


### WhileOp

The primary control flow operator to implement dynamic RNN is `WhileOp`. The `WhileOp` takes a sub-block. The operators in the sub-block will be executed again and again while the condition is true.
Contributor:

The WhileOp takes a sub-block. --> The WhileOp holds a sub-block.

Collaborator Author:

Done

The while operator has two kinds of inputs. They are

* Condition: A bool scalar. When it's False, the While Op will be terminated. Note that this scalar should always be in CPU memory.
* The condition variable is in the external block. However, it should be updated inside the sub-block of while op unless it is an endless loop. The condition variable will be an output variable of the while operator, too.
Contributor:

in the external block you mean in the parent block ?
unless it is an endless loop. --> otherwise it would result to an endless loop.

Collaborator Author:

Done

* X: The external inputs variables, which are required by operators inside the block of While Op.
* For example, if there is a hidden fully-connected layer in while operator. The input of the fully-connected layer is calculated by another operator inside the while operator. The input of this fully-connected layer is not the `external` inputs of the while operator. However, weight tensors of this fully-connected layer are external outputs of the while operator.
Contributor:

The input of the fully-connected layer is calculated by another operator inside the while operator. --> The input of the fully-connected layer is output of another operator inside the while operator.

Collaborator Author:

Done
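
A rough sketch of the two kinds of inputs described above, again with hypothetical helper names (`FillConstant`, `LessThan`, `Parameter`, `FC`, `Increment`, `ReadCurrentStep`, `sequence_length` are placeholders):

```cpp
// Sketch only: the condition is a bool scalar created in the parent block and
// kept in CPU memory; it must be refreshed inside the sub-block, otherwise the
// loop never terminates. The FC weight W is an `external` input of the While Op.
Tensor i = FillConstant({1}, 0);             // loop counter (CPU)
Tensor cond = LessThan(i, sequence_length);  // condition, computed in the parent block
Tensor W = Parameter("fc.w");                // hypothetical FC weight from the parent block

while (GetBool(cond)) {
  Tensor x_t = ReadCurrentStep(i);           // produced by another operator inside the loop
  Tensor h = FC(x_t, W);                     // the FC input is not external, but W is
  i = Increment(i);                          // update the counter inside the sub-block
  cond = LessThan(i, sequence_length);       // update the condition (also an output of While Op)
}
```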

@guoshengCS (Contributor):

Something bothers me when I am reading DynamicRNN. I guess one advantage of using a with block is being able to use outer variables. However, the inserted operators like lod_tensor_to_array in DynamicRNN reorder the samples, while the variables outside the with block mostly keep the raw sample order. Although we can reorder the outside variables explicitly with reorder_lod_tensor_by_rank in the DynamicRNN block, from this point of view we can hardly use any outer variable directly in the DynamicRNN block, which seems to deviate from the purpose of the with block. I am not sure whether there is a way to reorder implicitly in DynamicRNN so we can use outer variables directly, or whether the variables to be used should be indicated when initializing DynamicRNN.

@reyoung (Collaborator, Author) commented Jan 2, 2018:

@guoshengCS
reorder_lod_tensor_by_rank should be wrapped in the DynamicRNN implicitly. DynamicRNN is just syntactic sugar for end users.

```cpp
auto input = LoDTensor(...); // LoDTensor is the data structure for time series

std::vector<LoDTensor> inputs_for_each_timestep = LoDTensorToTimesteps(LoDTensor());

// (the loop header is not shown in this excerpt; a plausible reconstruction is
//  a per-timestep loop that combines each input with the recurrent memory)
std::vector<LoDTensor> outputs_for_each_timestep(inputs_for_each_timestep.size());
LoDTensor memory; // initial recurrent state, initialization elided
for (size_t i = 0; i < inputs_for_each_timestep.size(); ++i) {
  auto sum = Add(inputs_for_each_timestep[i], memory);
  memory = sum;
  outputs_for_each_timestep[i] = sum;
}

LoDTensor outputs = TimestepsToLoDTensor(outputs_for_each_timestep);
```

Contributor:

LoDTensor() is input? If so, please use input instead.

Contributor:

I think memories is also an output, however the type is vector, need convert it to LoDTensor ?


* Output: The output variables. They are `assigned` or `push_back` by the operators inside the block of While Op.
* It is reasonable for `while operator` to `push_back` its output to an array because 1) the while operator is a loop. 2) the output in every timestep should not be overwritten since they will be used in backward.
* The condition and other control flow related operator, like `++i` or `i=0`, could be overwritten since they do not need when backwards. The backward control flow operator of `++i` is `--i`.
Contributor:

they do not need when backwards --> they are not required in backward stage.
The backward control flow operator of ++i is --i. --> The corresponding control flow operator of ++i in backward stage is --i.

* The step-scopes. A vector of local scope, which size equals the step number of While Op. The i'th scope storages temporary variables generated in the i'th step.
Contributor:

which --> whose
equals --> equals to

* A potential optimization of `while operator` when inference is just maintaining one step of scope in while operator since there is no backward stage when inference.
Contributor:

of --> for
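
A rough sketch of how the outputs and step-scopes described above could be managed, assuming a hypothetical `Scope`/sub-block executor API:

```cpp
// Sketch only: every iteration of the While Op gets its own step scope, and
// outputs are appended rather than overwritten so the backward pass can use them.
std::vector<Scope*> step_scopes;   // one local scope per executed step
std::vector<Tensor> step_outputs;  // push_back, never overwrite

while (GetBool(cond)) {
  Scope& step_scope = parent_scope.NewScope();      // temporaries of this step live here
  step_scopes.push_back(&step_scope);               // kept for the backward stage
  step_outputs.push_back(RunSubBlock(step_scope));  // hypothetical sub-block executor
  cond = UpdateCondition(step_scope);               // control-flow variables may be overwritten
}
// During inference there is no backward stage, so one reusable step scope would be enough.
```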



There are several corner cases of gradient implementation:
Contributor:

The followings seem not corner cases. I think it's better to call tips.

The `++i` is the increment operator as a control flow operator. There are several differences between the computational `a = a + 1` and the control flow operator `++i`.

1. `IncrementOp` can only be run on CPU. And it should only be run on CPU.
2. The corresponding operator in the backward stage of `++i` is `--i`, because for the for loop, the data access should be reverse. The gradient of `++i` is not needed.
Contributor:

reverse --> reversed
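
As a small, self-contained illustration of the reversed data access (not PaddlePaddle code): the forward pass walks the timesteps with `++i`, and the backward pass revisits the same indices with `--i`.

```cpp
#include <cstdio>

int main() {
  const int n = 4;                    // number of timesteps (arbitrary for the example)
  for (int i = 0; i < n; ++i) {       // forward: counter incremented each step
    std::printf("forward  step %d\n", i);
  }
  for (int i = n - 1; i >= 0; --i) {  // backward: same steps, visited in reverse order
    std::printf("backward step %d\n", i);
  }
  return 0;
}
```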

paddle-bot-old bot closed this May 22, 2020 and commented:
Since you haven't replied for a long time, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
