Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stride pooling for seqlastin and seqfirstin #1701

Merged
merged 13 commits into from
Apr 13, 2017
Merged

stride pooling for seqlastin and seqfirstin #1701

merged 13 commits into from
Apr 13, 2017

Conversation

luotao1
Copy link
Contributor

@luotao1 luotao1 commented Mar 24, 2017

part one of #1608

  • 完成Average、Max和SequenceLastInstance的基类SequencePoolLayer,对stride参数的支持。
  • 添加Argument::poolSequenceWithStride函数,并对应添加该函数的单测test_argument.cpp。
  • 完成SequenceLastInstanceLayer对stride参数的支持,对应添加test_layerGrad中的单测。同时在trainer_config_helpers中修改first_seq和last_seq的接口,并完善python单测。

Average和Max的计算过程比SequenceLastInstance略复杂,会在下一个PR中进行完善。

@luotao1 luotao1 requested a review from lcy-seso March 24, 2017 06:46
@luotao1
Copy link
Contributor Author

luotao1 commented Mar 24, 2017

考虑到average和max的计算单元接口如下,都使用IVectorPtr来传递第二个参数:

void CpuMatrix::sequenceAvgForward(Matrix& a, const IVector& startsPos, int mode) ;
void maxSequenceForward(Matrix& input, const IVector& sequence, IVector& index);

因此,将std::vector<int> stridePositions_变量改成IVectorPtr stridePositions_,符合原先的两个接口形式。

@@ -67,6 +66,14 @@ void SequencePoolLayer::forward(PassType passType) {
<< "when trans_type = seq, input must hasSubseq";
output_.degradeSequence(input);
}
if (stride_ > 0) {
CHECK_EQ(input.hasSubseq(), 0UL)
<< "sequence stride pooling is not suitable for hasSubseq now";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not suitable --> is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2495,10 +2496,11 @@ def __init__(self,
**xargs)
config_assert(
len(inputs) == 1, 'SequenceLastInstanceLayer must have 1 input')
if trans_type == 'seq':
config_assert(stride == -1, 'subseq do not support stride window')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not --> does not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// empty sequence
tgtBuf[seqId + 1] = tgtBuf[seqId];
} else if (seqLength < stride) {
tgtBuf[seqId + 1] = tgtBuf[seqId] + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以不需要 else if (seqLength < stride) 这个条件分枝,计算逻辑只在 seqLength % stride 时,有差别。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已去掉该分支。

tgtBuf[seqId + 1] = tgtBuf[seqId] + 1;
} else {
tgtBuf[seqId + 1] = tgtBuf[seqId] + ceil((float)seqLength / stride);
int size =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 590 ~ 592 行代码可以简化,先计算size,tgtBuf[seqId + 1] = tgtBuf[seqId] + size
size_t size = (seqLength % stride) ? seqLength / stride : seqLength / stride + 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已简化

* If input.sequenceStartPositions = [0, 9, 14, 17, 30] and stride = 5,
* then sequenceStartPositions = [0, 2, 3, 4, 7],
* and stridePostions = [0, 5, 9, 14, 17, 22, 27, 30]
*/
Copy link
Contributor

@lcy-seso lcy-seso Mar 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • stridePostions 的计算少了一个 “输入序列是逆序”这个分枝。
  • stridePostions 在序列正序和逆序时的计算结果不一样

上面的例子中:

stridePostions =  [0, 5, 9, 14, 17, 22, 27, 30]  # 正序
stridePostions =  [0, 4, 9, 14, 17, 20, 25, 30]  # 逆序

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 31, 2017

@lcy-seso 已根据要求进行修改

Copy link
Contributor

@lcy-seso lcy-seso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost LGTM. only need some small changes.

*/
CHECK(input.sequenceStartPositions);
CHECK_EQ(input.hasSubseq(), 0UL);
CHECK_GT(stride, 0) << "stride must larger than 0";
Copy link
Contributor

@lcy-seso lcy-seso Apr 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 调用 poolSequenceWithStride 这个函数时,外围的逻辑分枝已经是 stride > 0,没有subSeq,所以不再需要 573 ~ 574 这两行的检查。
  • sequenceStartPositions 在调用这个函数之前已经使用(访问)过,572 行已经不再需要检查。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

虽然这次外围的逻辑分支都已经检查过了,但以后调用时可能会出现外围没有做check的情况,因此想保留这些check。

} else {
int size = ceil((float)seqLength / stride);
tgtBuf[seqId + 1] = tgtBuf[seqId] + size;
for (int i = 0; i < size - 1; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 循环 中的循环变量i 用 ++i

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

After pooling with stride n (n is smaller than sequence length),
a long sequence will be shorten.
This function is not suitable for sequence with sub-sequence now.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not suitable for sequence with sub-sequence now. --> This function is invalid for sequence having sub-sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get Last Timestamp Activation of a sequence.

If stride > 0, get last timestamp upon a stride window of sequence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的注释可以修改一下,第一句话变成下面这样。

If stride > 0, this layer slides a window whose size is determined by stride, and return the first value of the window as the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get Last Timestamp Activation of a sequence.

If stride > 0, get last timestamp upon a stride window of sequence.
And a long sequence will be shorten. Note that for sequence with
sub-sequence, stride is default -1 now.
Copy link
Contributor

@lcy-seso lcy-seso Apr 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stride is default -1 now. --> the default value of stride is -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get First Timestamp Activation of a sequence.

If stride > 0, get first timestamp upon a stride window of sequence,
and a long sequence will be shorten. Note that for sequence with
sub-sequence, stride is default -1 now.
Copy link
Contributor

@lcy-seso lcy-seso Apr 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stride is default -1 now. --> the default value of stride is -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int stride_;
// store the start position of each stride window
IVectorPtr stridePositions_;
// Whether it is reversed sequence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether the input sequence is reversed or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -42,6 +46,11 @@ class SequencePoolLayer : public Layer {
enum SequenceLevel { kNonSeq = 0, kSeq = 1 };
size_t newBatchSize_;
ICpuGpuVectorPtr startPositions_;
int stride_;
// store the start position of each stride window
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the start position of each window

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -26,6 +26,10 @@ namespace paddle {
* Output: output size is the number of input sequences (NOT input instances)
* output[i] = seqlastin/average/max_{for each instance in this
* sequence}{input[i]}
* If stride_ > 0:
* Check input sequence must don't have sub-sequence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check input sequence must not have sub-sequence

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -25,6 +25,9 @@ namespace paddle {
* Input: a sequence
* If SequenceLevel = kNonseq:
* Output: a sequence containing only the last instance of the input sequence
* If stride_ > 0:
* Output: a shorten sequence containing several last instances of the
* input sequence with stride window.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output: a shorten sequence. The operation of getting last instance of a sequence is independently performed on every slice of the input sequence, that is obtained by sliding a window with the window size set to stride_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已按要求修改

@@ -25,6 +25,9 @@ namespace paddle {
* Input: a sequence
* If SequenceLevel = kNonseq:
* Output: a sequence containing only the last instance of the input sequence
* If stride_ > 0:
* Output: a shorten sequence containing several last instances of the
* input sequence with stride window.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -26,6 +26,10 @@ namespace paddle {
* Output: output size is the number of input sequences (NOT input instances)
* output[i] = seqlastin/average/max_{for each instance in this
* sequence}{input[i]}
* If stride_ > 0:
* Check input sequence must don't have sub-sequence
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -42,6 +46,11 @@ class SequencePoolLayer : public Layer {
enum SequenceLevel { kNonSeq = 0, kSeq = 1 };
size_t newBatchSize_;
ICpuGpuVectorPtr startPositions_;
int stride_;
// store the start position of each stride window
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int stride_;
// store the start position of each stride window
IVectorPtr stridePositions_;
// Whether it is reversed sequence
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
CHECK(input.sequenceStartPositions);
CHECK_EQ(input.hasSubseq(), 0UL);
CHECK_GT(stride, 0) << "stride must larger than 0";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

虽然这次外围的逻辑分支都已经检查过了,但以后调用时可能会出现外围没有做check的情况,因此想保留这些check。

layer_attr=None):
"""
Get Last Timestamp Activation of a sequence.

If stride > 0, get last timestamp upon a stride window of sequence.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get Last Timestamp Activation of a sequence.

If stride > 0, get last timestamp upon a stride window of sequence.
And a long sequence will be shorten. Note that for sequence with
sub-sequence, stride is default -1 now.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1357,6 +1362,8 @@ def last_seq(input,
:type name: basestring
:param input: Input layer name.
:type input: LayerOutput
:param stride: parameter of stride window.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get First Timestamp Activation of a sequence.

If stride > 0, get first timestamp upon a stride window of sequence,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

layer_attr=None):
"""
Get First Timestamp Activation of a sequence.

If stride > 0, get first timestamp upon a stride window of sequence,
and a long sequence will be shorten. Note that for sequence with
sub-sequence, stride is default -1 now.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -37,6 +42,7 @@ class SequenceLastInstanceLayer : public SequencePoolLayer {
protected:
MatrixPtr tmpSrc_;
MatrixPtr tmpDest_;
std::vector<int> insId_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insId_ ==> instanceIds_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -42,6 +46,11 @@ class SequencePoolLayer : public Layer {
enum SequenceLevel { kNonSeq = 0, kSeq = 1 };
size_t newBatchSize_;
ICpuGpuVectorPtr startPositions_;
int stride_;
// store the start position of each window
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store ... window ==> Store ... window. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int stride_;
// store the start position of each window
IVectorPtr stridePositions_;
// Whether the input sequence is reversed or not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not ==> not .?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void testDegradeLayer(bool hasSubseq,
string layer_type,
string trans_type,
int stride = -1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default argument value 是可以用的,但是一般不用,原因在这里: https://google.github.io/styleguide/cppguide.html#Default_Arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

size_t stride,
IVectorPtr* stridePostions,
bool reversed) {
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在.cpp文件里,用C++ style comment // 更能统一风格。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lcy-seso lcy-seso merged commit df5a95d into PaddlePaddle:develop Apr 13, 2017
@luotao1 luotao1 deleted the stride branch April 13, 2017 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants