Re-define the type system #8352

Closed · wants to merge 3 commits
94 changes: 48 additions & 46 deletions paddle/fluid/framework/framework.proto
@@ -1,16 +1,16 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

syntax = "proto2";
option optimize_for = LITE_RUNTIME;
@@ -91,52 +91,54 @@ message OpProto {
required string comment = 5;
}

enum DataType {
BOOL = 0;
INT16 = 1;
INT32 = 2;
INT64 = 3;
FP16 = 4;
FP32 = 5;
FP64 = 6;
}
message VarType {
enum Type {
// POD types
BOOL = 0;
INT16 = 1;
INT32 = 2;
INT64 = 3;
FP16 = 4;
FP32 = 5;
FP64 = 6;

// other types that might need additional descriptions
LOD_TENSOR = 7; // LoDTensor
SELECTED_ROWS = 8; // Tensor
FEED_MINIBATCH = 9;
FETCH_LIST = 10;
STEP_SCOPES = 11;
LOD_RANK_TABLE = 12;
// LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType
@reyoung (Collaborator), Feb 12, 2018:

All tensors in the LOD_TENSOR_ARRAY will hold the same type and the same shape at compile time.

Author (Collaborator):

Good point. So LoDTensorArray is not Array. We got that. Thanks for the reminder!

// PLACE_LIST = 8; // replaced by array-of-place. c.f. ArrayType
READER = 13; // ReaderType
}

message TensorDesc {
required DataType data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
required Type type = 1;

message LoDTensorDesc {
required TensorDesc tensor = 1;
optional int32 lod_level = 2 [ default = 0 ];
}
message Tensor {
required Type data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
optional Tensor selected_rows = 2;

message LoDTensorArrayDesc {
required TensorDesc tensor = 1;
optional int32 lod_level = 2 [ default = 0 ];
}
message LoDTensor {
required Tensor tensor = 1;
optional int32 lod_level = 2 [ default = 0 ];
}
optional LoDTensor lod_tensor = 3;

message ReaderDesc { repeated LoDTensorDesc lod_tensor = 1; }
message Array { VarType elem_type = 1; }
Contributor:

Should the VarType be repeated? And message Array could be used both as an array of VarType, in which every element has the same VarType, and as an n-tuple, in which the VarTypes of the elements can differ, right?

Contributor:

I don't think the VarType in the array should be repeated, because the array will have all elements of the same VarType. In an n-tuple, the VarType will be repeated.

I agree with Abhinav's comment.

Contributor:

@abhinavarora Thanks, you are right; I finally see where I was wrong.
We should add `message Tuple { repeated VarType tuple_type = 1; }`.

Yes, this has also been discussed in #8336. Instead of VarDesc we will have VarType, as has been done in this PR.

Author (Collaborator):

Yes, a channel of channels is a commonly used pattern in CSP.

Collaborator:

If a channel can hold so many types, we cannot easily do shape inference at compile time.

Collaborator:

I think Tuple and Array should be two different types.

  • An Array can hold only elements with the same type and shape at compile time. The shape of array_read can be inferred at compile time only if the array holds a single element type.

    • The length of an Array can only be decided at runtime, since there are array_write operators for RNN. If we let an array hold different types, we cannot infer types at compile time.
  • A Tuple is an immutable type. The length of a Tuple and the types of its elements will not change at compile time.

Contributor:

@reyoung You are right, Tuple and Array are two different types. The type of the array elements will be known at compile time. A Tuple should be immutable, and its length and elements will not change. I am just a little curious how we could implement this in our C++ code. Today, we map LoDTensorArray directly to a Vector<LoDTensor>. For a generic Array, would we have to create mappings for all element types?

@chengduoZH (Contributor), Feb 22, 2018:

"would we have to create mappings for all types?"

I am curious about this too.

optional Array array = 4;

message Reader { repeated LoDTensor lod_tensor = 1; }
optional Reader reader = 5;
}
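With the nested scheme above, a variable's type information becomes a single VarType message. The following text-format instance is an illustrative sketch of what a float LoD tensor variable might look like (the field names come from the diff above; the concrete values are made up):

```proto
type: LOD_TENSOR
lod_tensor {
  tensor {
    data_type: FP32
    dims: -1   # [UNK, 640, 480] is saved as [-1, 640, 480]
    dims: 640
    dims: 480
  }
  lod_level: 1
}
```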

message VarDesc {
enum VarType {
LOD_TENSOR = 1;
SELECTED_ROWS = 2;
FEED_MINIBATCH = 3;
FETCH_LIST = 4;
STEP_SCOPES = 5;
LOD_RANK_TABLE = 6;
LOD_TENSOR_ARRAY = 7;
PLACE_LIST = 8;
READER = 9;
}
required string name = 1;
required VarType type = 2;
Collaborator:

Can VarDesc::type be a POD type like BOOL or INT16? Or can it only be LOD_TENSOR, SELECTED_ROWS, ...?

Member:

VarType::Type contains the POD types.

optional bool persistable = 3 [ default = false ];
optional LoDTensorDesc lod_tensor = 4;
optional TensorDesc selected_rows = 5;
optional LoDTensorArrayDesc tensor_array = 6;
optional ReaderDesc reader = 7;
}

message BlockDesc {