Implementation documentation of the dynamic RNN #7135
Conversation
## A glance of Dynamic RNN
A common neural network structure called recurrent neural network(`RNN` for short), which there is a directed circle in the neural network model. RNN can use a internal memory to process arbitrary sequences of inputs.
`a internal memory` --> `an internal memory`

`process arbitrary sequences of inputs` is not clear enough. Maybe you mean: RNN can use an internal memory to process sequences with variable lengths.
This sentence is quoted from https://en.wikipedia.org/wiki/Recurrent_neural_network
PaddlePaddle Fluid directly represents the `directed circle` in the `ProgramDesc`, since we do not use a directed acyclic graph to represent our model. The `ProgramDesc` is just like the AST of a programming language, which describes the computation instructions for training a neural network. We use arrays and a while loop to describe the training process of an RNN. The C++ code below demonstrates the forward logic of RNN which PaddlePaddle Fluid generates in `ProgramDesc`.
training process --> training/inference process

The C++ code below demonstrates the forward logic of RNN which PaddlePaddle Fluid generates in `ProgramDesc` --> The C++ code below demonstrates the forward logic of RNN generated in `ProgramDesc` of PaddlePaddle Fluid.
Done
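For reference, here is a minimal sketch of the forward logic this paragraph refers to, written in the document's pseudocode style; `LoadInput`, `InitialMemory`, and `fc` are illustrative names:

```cpp
// Sketch only: split the input into per-timestep tensors, run the step
// computation in a while loop, then merge the per-step outputs back.
auto input = LoadInput();                    // a LoDTensor of batched sequences
std::vector<LoDTensor> xs = LoDTensorToTimesteps(input);
std::vector<LoDTensor> ys(xs.size());

LoDTensor memory = InitialMemory();          // hypothetical initial hidden state
size_t i = 0;
while (i < xs.size()) {                      // the while loop in ProgramDesc
  auto h = fc(xs[i]) + fc(memory);           // per-timestep computation
  memory = h;                                // update the internal memory
  ys[i] = h;                                 // keep every step for backward
  ++i;                                       // control-flow increment, on CPU
}

LoDTensor outputs = TimestepsToLoDTensor(ys);
```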
1. Control flow operators
1. Data manipulation operators of RNN.
2. Backward of RNN.
1. Data manipulation operators of RNN.
2. Backward of RNN.
-->
2. Data manipulation operators of RNN
3. Backward of RNN
### WhileOp
The primary control flow operator to implement dynamic RNN is `WhileOp`. The `WhileOp` takes a sub-block. The operators in the sub-block will be executed again and again while the condition is true.
The `WhileOp` takes a sub-block. --> The `WhileOp` holds a sub-block.
Done
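Roughly, the semantics described here can be sketched as below; `GetCondition` and `ExecuteBlock` are hypothetical helpers, not the actual API:

```cpp
// Sketch only: WhileOp re-runs its held sub-block for as long as the
// condition variable (a bool scalar kept in CPU memory) stays true.
void RunWhileOp(Scope* parent_scope, const BlockDesc& sub_block) {
  while (GetCondition(*parent_scope)) {            // condition lives in the parent block
    Scope& step_scope = parent_scope->NewScope();  // fresh local scope per step
    ExecuteBlock(sub_block, &step_scope);          // ops here may update the condition
  }
}
```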
The while operator has two kinds of inputs. They are
* Condition: A bool scalar. When it's False, the While Op will be terminated. Note that this scalar should always be in CPU memory.
  * The condition variable is in the external block. However, it should be updated inside the sub-block of while op unless it is an endless loop. The condition variable will be an output variable of the while operator, too.
`in the external block`: do you mean `in the parent block`?

`unless it is an endless loop.` --> otherwise it would result in an endless loop.
Done
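In other words, the sub-block itself must refresh the condition every iteration; a sketch with an illustrative step counter:

```cpp
// Sketch only: the condition is recomputed inside the sub-block each step,
// otherwise the loop would never terminate.
bool cond = (i < sequence_length);   // bool scalar, kept in CPU memory
while (cond) {
  RunStep(i);                        // RNN ops for timestep i (illustrative)
  ++i;                               // IncrementOp, runs on CPU
  cond = (i < sequence_length);      // updated in the sub-block; also exposed
}                                    // as an output of the while operator
```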
* X: The external input variables, which are required by operators inside the block of While Op.
  * For example, if there is a hidden fully-connected layer in the while operator, the input of the fully-connected layer is calculated by another operator inside the while operator. The input of this fully-connected layer is not one of the `external` inputs of the while operator. However, the weight tensors of this fully-connected layer are external inputs of the while operator.
The input of the fully-connected layer is calculated by another operator inside the while operator. --> The input of the fully-connected layer is the output of another operator inside the while operator.
Done
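A sketch of that distinction, with all names illustrative:

```cpp
// Sketch only: h is produced by another op in the same sub-block, so it is
// not an external input; the weights W and b live in the parent block, so
// they belong to the external inputs X of the While Op.
while (cond) {
  auto h = embedding(x_t);   // internal: computed inside the sub-block
  auto y = fc(h, W, b);      // W, b: external inputs from the parent block
}
```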
Something bothers me when I am reading.

@guoshengCS
```cpp
auto input = LoDTensor(...); // LoDTensor is the data structure for time series

std::vector<LoDTensor> inputs_for_each_timestep = LoDTensorToTimesteps(LoDTensor());
```
`LoDTensor()` is `input`? If so, please use `input` instead.
```cpp
  outputs_for_each_timestep[i] = sum;
}

LoDTensor outputs = TimestepsToLoDTensor(outputs_for_each_timestep);
```
I think `memories` is also an output; however, its type is vector. Does it need to be converted to LoDTensor?
* Output: The output variables. They are `assigned` or `push_back` by the operators inside the block of While Op.
  * It is reasonable for `while operator` to `push_back` its output to an array because 1) the while operator is a loop. 2) the output in every timestep should not be overwritten since they will be used in backward.
  * The condition and other control flow related operator, like `++i` or `i=0`, could be overwritten since they do not need when backwards. The backward control flow operator of `++i` is `--i`.
they do not need when backwards --> they are not required in backward stage.

The backward control flow operator of `++i` is `--i`. --> The corresponding control flow operator of `++i` in backward stage is `--i`.
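A sketch of the append-only outputs versus the overwritable control-flow state discussed above, with illustrative names:

```cpp
// Sketch only: per-timestep outputs are appended, never overwritten, because
// backward needs the value from every step; control-flow state such as the
// counter may be overwritten in place, since backward simply runs --i.
std::vector<LoDTensor> output_array;   // a tensor-array output of While Op
while (cond) {
  LoDTensor out = RunStep(i);          // illustrative step computation
  output_array.push_back(out);         // kept for the backward pass
  ++i;                                 // overwritten each step
  cond = (i < sequence_length);
}
```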
* The step-scopes. A vector of local scope, which size equals the step number of While Op. The i'th scope stores temporary variables generated in the i'th step.
which --> whose
equals --> equals to
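The step-scope bookkeeping might be sketched as follows, assuming a Paddle-style `Scope` with child scopes; `GetCondition` and `ExecuteBlock` remain hypothetical helpers:

```cpp
// Sketch only: one local scope per executed step, so that backward can read
// the temporaries of step i from step_scopes[i].
std::vector<Scope*> step_scopes;                // size == number of steps
while (GetCondition(parent_scope)) {            // updated by the sub-block
  Scope& step_scope = parent_scope.NewScope();  // temporaries of this step
  step_scopes.push_back(&step_scope);           // kept alive for backward
  ExecuteBlock(sub_block, &step_scope);
}
```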
* A potential optimization of `while operator` when inference is just maintaining one step of scope in while operator since there is no backward stage when inference.
of --> for
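That optimization could look roughly like this; a sketch, not the actual implementation:

```cpp
// Sketch only: at inference time there is no backward stage, so a single
// step scope can be reused and its temporaries overwritten in place.
Scope& step_scope = parent_scope.NewScope();  // one scope for all steps
while (GetCondition(parent_scope)) {
  ExecuteBlock(sub_block, &step_scope);       // reuse instead of NewScope()
}
```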
There are several corner cases of gradient implementation:
The following do not seem to be corner cases. I think it's better to call them tips.
The `++i` is the increment operator as a control flow operator. There are several differences between the computational `a = a + 1` and the control flow operator `++i`.
1. `IncrementOp` can only be run on CPU. And it should only be run on CPU.
2. The corresponding operator in the backward stage of `++i` is `--i`, because for the for loop, the data access should be reverse. The gradient of `++i` is not needed.
reverse --> reversed
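A sketch of the reversed access in the backward stage, with illustrative names:

```cpp
// Sketch only: backward walks the recorded steps in reverse order, so the
// control-flow counterpart of ++i is --i; no gradient flows through i itself.
size_t i = step_scopes.size();
while (i > 0) {
  --i;                                // reversed data access
  RunBackwardStep(*step_scopes[i]);   // reads forward temporaries of step i
}
```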
Since you haven't replied for a long time, we have closed this issue/pr.