
Should output of sequence_pool be LoDTensor with LoD? #7212

Closed
pkuyym opened this issue Jan 4, 2018 · 13 comments

@pkuyym
Contributor

pkuyym commented Jan 4, 2018

Currently, the output of sequence_pool is a LoDTensor without LoD. However, the input is a LoDTensor with LoD. Should we add LoD to the output of sequence_pool?
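The behavior being discussed can be sketched in plain Python (a hypothetical model for illustration, not the actual Paddle implementation), with a LoDTensor represented as a data list plus level-0 offsets:

```python
# Hypothetical sketch: a LoDTensor is modeled as (data, lod), where lod[0]
# holds offsets delimiting each sequence inside data.
def sequence_pool_sum(data, lod):
    """Sum-pool each sequence; the output currently carries no LoD."""
    offsets = lod[0]
    pooled = [sum(data[s:e]) for s, e in zip(offsets[:-1], offsets[1:])]
    return pooled, []  # output LoD is empty, which is the point at issue

data = [1, 2, 3, 4, 5, 6]
lod = [[0, 2, 6]]  # two sequences: [1, 2] and [3, 4, 5, 6]
out, out_lod = sequence_pool_sum(data, lod)
# out == [3, 18], out_lod == []
```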

@luotao1
Contributor

luotao1 commented Jan 4, 2018

I think the output of sequence_pool (where the input's LoD level is 1) is a Tensor, since the output is not a sequence.

@pkuyym
Contributor Author

pkuyym commented Jan 4, 2018

Since the output is a LoDTensor object, I think adding LoD info has no side effects. In some situations, such as decoder_boot in seq2seq, adding LoD info makes things easier.

@lcy-seso
Contributor

lcy-seso commented Jan 5, 2018

I agree conceptually: a Tensor (here meaning a non-sequence) is equal to a LoDTensor (here meaning a sequence) in which the length of each sequence in the batch is fixed to 1. There is no obvious side effect. But my concern is that this is not the default setting for Fluid operators right now. Will it potentially affect the logic of other operators when they work together? I am also not sure why we should not enhance ReorderLoDTensorByRankTableOp instead. Is there any problem with that?

Using sequence_pool to degrade a sequence into a non-sequence was frequently done in NLP tasks in the previous PaddlePaddle. This is because many NLP tasks require encoding variable-length sequences into fixed-length vectors. Conceptually, in this situation, after pooling (with the stride being the length of the sequence), the sequence becomes a non-sequence.
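That NLP use case can be sketched minimally, again modeling a LoDTensor as data plus offsets (the function name is illustrative, not a Paddle API):

```python
# Max-pooling with stride equal to each sequence's length turns a batch of
# variable-length sequences into one fixed-size value per sequence.
def whole_sequence_max_pool(data, lod):
    offsets = lod[0]
    return [max(data[s:e]) for s, e in zip(offsets[:-1], offsets[1:])]

data = [1, 5, 2, 9, 3]
lod = [[0, 2, 5]]  # sequences [1, 5] and [2, 9, 3]
assert whole_sequence_max_pool(data, lod) == [5, 9]
```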

@pkuyym
Contributor Author

pkuyym commented Jan 5, 2018

In my opinion, ReorderLoDTensorByRankTableOp requires sequence input, so the input LoDTensor should contain LoD. If we make the operator accept a plain Tensor without LoD, it may confuse users. Alternatively, we could add another operator with similar functionality for plain Tensors.

size_t size = x.lod()[level].size();
for (size_t i = 0; i < size - 1; ++i) {
  auto lod_offset =
      framework::GetSubLoDAndAbsoluteOffset(x.lod(), i, i + 1, level);
  auto &offset = lod_offset.second;

Of course, we could adapt the current ReorderLoDTensorByRankTableOp to be compatible with a plain Tensor. Do you think that's a good idea? @reyoung

@pkuyym
Contributor Author

pkuyym commented Jan 5, 2018

@luotao1 @lcy-seso For sequence_pool, I think we could expose an attribute letting users decide whether the output inherits the LoD info.

@guoshengCS
Contributor

guoshengCS commented Jan 5, 2018

I edited ReorderLoDTensorByRankTableOp in this way:

  std::vector<AbsoluteRankTableItem> GetAbsoluteOffsetAndLengthByLoDRankTable(
      const framework::LoDTensor &x) const {
    std::vector<AbsoluteRankTableItem> absolute_table;

    if (x.lod().empty()) {
      // For Tensor without lod, such as the output of sequence_pool_op
      size_t size = x.dims()[0];
      absolute_table.reserve(size);
      for (size_t i = 0; i < size; ++i) {
        absolute_table.emplace_back();
        absolute_table.back().length = 1;
        absolute_table.back().offset = i;
      }
    } else {
      size_t level = 0;
      size_t size = x.lod()[level].size();

      for (size_t i = 0; i < size - 1; ++i) {
        auto lod_offset =
            framework::GetSubLoDAndAbsoluteOffset(x.lod(), i, i + 1, level);

        auto &offset = lod_offset.second;

        absolute_table.emplace_back();
        absolute_table.back().length = offset.second - offset.first;
        absolute_table.back().offset = offset.first;
        absolute_table.back().lod = lod_offset.first;
      }
    }

    return absolute_table;
  }

All the changes are nearly complete, and I will push them for review.
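The fallback above can be mirrored in a small Python sketch (illustrative only; the real logic lives in the C++ op): for a Tensor without LoD, each row becomes one unit of length 1, otherwise lengths come from the level-0 offsets.

```python
# Hypothetical model of GetAbsoluteOffsetAndLengthByLoDRankTable's two
# branches: returns (offset, length) pairs for each reorderable unit.
def absolute_offsets_and_lengths(num_rows, lod):
    if not lod:
        # No LoD (e.g. output of sequence_pool): each row is its own unit.
        return [(i, 1) for i in range(num_rows)]
    offsets = lod[0]
    return [(s, e - s) for s, e in zip(offsets[:-1], offsets[1:])]

assert absolute_offsets_and_lengths(3, []) == [(0, 1), (1, 1), (2, 1)]
assert absolute_offsets_and_lengths(6, [[0, 2, 6]]) == [(0, 2), (2, 4)]
```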

@luotao1
Contributor

luotao1 commented Jan 5, 2018

> I think we could expose an attribute letting users decide whether the output inherits the LoD info

@pkuyym This will make users more confused. I agree with @guoshengCS that we should enhance ReorderLoDTensorByRankTableOp rather than modify sequence_pool.

@reyoung
Collaborator

reyoung commented Jan 5, 2018

I also agree with @luotao1 and @guoshengCS. ReorderLoDTensorByRankTable should be enhanced.

@reyoung
Collaborator

reyoung commented Jan 5, 2018

By the way, there is a related question: is a LoDTensor without LoD information the same as a LoDTensor whose LoD information says the length of every sequence is one?

For example, is the LoDTensor

[[0, 0, 0],
[1, 1, 1],
[2, 2, 2]]

the same as

[[0, 0, 0],
[1, 1, 1],
[2, 2, 2]]

LoD=[[0, 1, 2, 3]]

?
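One way to make the proposed equivalence concrete (a hypothetical sketch, not Paddle code) is to expand an empty LoD into the implicit [[0, 1, ..., n]] before comparing:

```python
# If an empty LoD is defined to mean "one length-1 sequence per row", the
# two tensors in the question above become interchangeable.
def effective_lod(data, lod):
    if not lod:
        return [list(range(len(data) + 1))]
    return lod

rows = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
assert effective_lod(rows, []) == effective_lod(rows, [[0, 1, 2, 3]])
```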

@pkuyym
Contributor Author

pkuyym commented Jan 5, 2018

@guoshengCS Please add corresponding test cases.

@pkuyym
Contributor Author

pkuyym commented Jan 5, 2018

@reyoung If we keep only LoDTensor, then I think it makes sense for an empty LoD to be equivalent to all sequence lengths being one.

@lcy-seso
Contributor

lcy-seso commented Jan 5, 2018

I would prefer not to modify the sequence_pool operator to let the user decide whether to inherit the LoD information, because:

  1. Letting the user determine whether to carry the LoD information exposes too trivial a choice; we can decide the rule for users.
  2. It would mean the user has to understand the "reordering" process in while_op, and under which conditions the LoD information must be inherited and under which it must not. This is an internal implementation detail; it is better if we decide a rule for users.

Enhancing ReorderLoDTensorByRankTableOp is reasonable. Personally, I do not find it confusing. Compared to the old implementation, this operator can then also sort a level-2 LoD tensor.

A tensor without LoD information is equal to a batch of sequences in which the length of each sequence is fixed to 1.

@pkuyym
Contributor Author

pkuyym commented Jan 15, 2018

Already fixed by #7251

@pkuyym pkuyym closed this as completed Jan 15, 2018