Send recv op #5520
Conversation
Why choose gRPC but not baidu-rpc?
for (size_t len : lod_length[i]) {
  level.push_back(level.back() + len);
}
}
}

void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
@helinwang I think the serialize function can be put into tensor_util.h, which is a part of PR #5455. Did you change your viewpoint after reading the Boost serialization reference manual?
@dzhwinter Thanks for asking! It's OK if the only tensor type we support, now and in the future, is LoDTensor (so we don't need polymorphism). Otherwise I think having serialization as an interface that every tensor type implements would be better.
Putting non-member functions in separate source files seems good. Also, we may need to serialize SelectedRows in the future, so putting serialize functions in a util file is fine. At this point I agree with @helinwang to have "serialization as an interface that every tensor type implements". Would you mind updating PR #5455 to add serialization functions?
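For reference, a minimal sketch (with purely hypothetical stand-in names, not code from this PR) of what "serialization as an interface that every tensor type implements" might look like:

```cpp
// Hypothetical sketch: a virtual serialization interface that each tensor
// type would implement. FakeLoDTensor is a stand-in, not the real class.
#include <iostream>
#include <sstream>

class SerializableTensor {
 public:
  virtual ~SerializableTensor() = default;
  virtual void SerializeToStream(std::ostream& os) const = 0;
  virtual void DeserializeFromStream(std::istream& is) = 0;
};

class FakeLoDTensor : public SerializableTensor {
 public:
  void SerializeToStream(std::ostream& os) const override {
    os << value_;  // a real version would write LoD, dims, dtype, raw data
  }
  void DeserializeFromStream(std::istream& is) override { is >> value_; }

 private:
  int value_ = 0;
};

int main() {
  FakeLoDTensor t;
  std::stringstream ss;
  t.SerializeToStream(ss);   // callers can also go through the base interface
  t.DeserializeFromStream(ss);
}
```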
@helinwang
We need to support multiple types: LoDTensor, SelectedRows, LoDTensorArray. We can put it as a template free function (global function). And I think:
- Polymorphism is not a real demand in this application. When you do a serialization, you have an object with a proper type, so you can use a reference instead of a pointer; the class always provides sufficient information to serialize itself. http://www.boost.org/doc/libs/1_45_0/libs/serialization/doc/serialization.html
- Other popular libraries, such as protobuf and msgpack, also serialize objects through free functions, for example:
MsgPack::Serializer serializer(socket);
std::vector<std::unique_ptr<MsgPack::Element>> arrayWithoutElements, arrayWith3Elements;
arrayWith3Elements.push_back(MsgPack::Factory(true));
arrayWith3Elements.push_back(MsgPack__Factory(Array(std::move(arrayWithoutElements))));
arrayWith3Elements.push_back(MsgPack::Factory("Hello World!"));
serializer << MsgPack__Factory(Array(std::move(arrayWith3Elements)));
MsgPack::Deserializer deserializer(socket);
deserializer.deserialize([](std::unique_ptr<MsgPack::Element> parsed) {
std::cout << "Parsed: " << *parsed << "\n";
return false;
}, true);
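By contrast, a minimal sketch of the free-function style argued for above (stand-in types again; nothing here is the real Paddle signature): one overload per concrete type, with no virtual dispatch, because the caller always knows the static type.

```cpp
// Hypothetical sketch: free-function serialization, one overload per type,
// in the spirit of the boost/protobuf/msgpack examples above.
#include <iostream>
#include <sstream>

struct FakeLoDTensor { int data; };       // stand-ins for the real classes
struct FakeSelectedRows { int rows; };

void SerializeToStream(std::ostream& os, const FakeLoDTensor& t) {
  os << t.data;   // a real version would write LoD, dims, dtype, raw buffer
}
void SerializeToStream(std::ostream& os, const FakeSelectedRows& s) {
  os << s.rows;   // a real version would write rows, height, embedded tensor
}

int main() {
  std::ostringstream oss;
  SerializeToStream(oss, FakeLoDTensor{1});     // overload resolution picks
  SerializeToStream(oss, FakeSelectedRows{2});  // the right function statically
}
```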
If we treat Tensor and the operators as a computing library, then we should use free functions; just treat Tensor as a third-party library.
@typhoonzero @helinwang
- If we treat Tensor and the operators as a computing library, then we'll have to describe the data inside each type to be serialized. We are using xxxDesc for now, e.g.:
message TensorDesc {
  required DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
- Using the desc and the data pointers is enough for a "free function" to serialize it (see the sketch below).
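A rough sketch of the "desc plus data pointer is enough for a free function" point; the length-prefixed layout below is only an illustrative assumption, not the on-wire format used in this PR:

```cpp
// Hypothetical layout: [desc size][desc bytes][raw tensor bytes].
#include <cstdint>
#include <ostream>
#include <string>

void WriteDescAndData(std::ostream& os, const std::string& desc_bytes,
                      const void* data, uint64_t num_bytes) {
  const uint64_t desc_size = desc_bytes.size();
  os.write(reinterpret_cast<const char*>(&desc_size), sizeof(desc_size));
  os.write(desc_bytes.data(), static_cast<std::streamsize>(desc_size));
  os.write(reinterpret_cast<const char*>(data),
           static_cast<std::streamsize>(num_bytes));
}
```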
paddle/operators/recv_op.cc
Outdated
ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
builder.RegisterService(service.get());
// rpc_server.reset(new Server(builder.BuildAndStart());
Please remove commented-out code lines.
Done
@jacquesqiao I put the RPC implementation in a separate place, so it'll be easy to switch. I'll add a benchmark for training-job metrics like throughput (model size), since brpc's benchmark is mainly about small queries.
Thank you! Did you get a chance to profile the serialization speed of protobuf string (used in tensor serialization)?
 * You can pass ofstream or ostringstream to serialize to a file
 * or to an in-memory string. GPU tensor will be copied to CPU.
 */
void SerializeToStream(std::ostream& os, const LoDTensor& tensor,
For discussion: consider making SerializeToStream and DeserializeFromStream member functions of LoDTensor. Reason stated here: https://github.com/PaddlePaddle/Paddle/pull/5520/files#r151577859
CC: @dzhwinter
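As a small usage sketch of the doc comment above (the serialize function here is a stand-in, not the real Paddle API): taking std::ostream& lets callers serialize either to a file (ofstream) or to an in-memory string (ostringstream).

```cpp
// Hypothetical stand-in illustrating the ofstream/ostringstream point.
#include <fstream>
#include <iostream>
#include <sstream>

void FakeSerializeToStream(std::ostream& os, int fake_tensor) {
  os << fake_tensor;  // the real function would write the tensor's bytes
}

int main() {
  std::ostringstream mem;              // in-memory target
  FakeSerializeToStream(mem, 42);
  std::ofstream file("tensor.bin");    // file target
  FakeSerializeToStream(file, 42);
  std::cout << "in-memory bytes: " << mem.str() << "\n";
}
```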
paddle/operators/detail/recv_impl.cc
Outdated
namespace operators {
namespace detail {

Status SendRecvServerImpl::InitVariables(
Is this for initializing variables? I think send/recv should not have anything to do with initializing the variable.
Yep. Will remove.
paddle/operators/detail/send_impl.cc
Outdated
namespace operators {
namespace detail {

bool RPCClient::InitVariables(const framework::Scope& scope,
Is this for initializing variables? I think send/recv should not have anything to do with initializing the variable.
paddle/operators/detail/send_impl.cc
Outdated
namespace operators {
namespace detail {

bool RPCClient::InitVariables(const framework::Scope& scope,
The naming of RPCClient and SendRecvServerImpl does not match as a pair, and maybe the names should state that they are related to send/recv, so perhaps something like sendImpl / recvImpl?
// TODO(typhoonzero): deserialize in_tensor and run pserver network.
std::istringstream iss(in_var->serialized());
framework::DeserializeFromStream(iss, &t);
lodtensor_queue_.Push(std::move(t));
From my understanding, each send/recv connects one edge in the graph between nodes, so why is a queue (for multiple values) necessary?
namespace paddle {
namespace operators {

void RunServer(Server **rpc_server,
Do we currently start one server per recv OP? There could easily be tens or even hundreds of send/recv OP pairs, so maybe we should start only one server. E.g., the send/recv queue could be an argument for the constructor of the send/recv OP, and there would be only one single instance of the send/recv server.
This will be done in the next PR. From my current thinking, we'll need a RemoteOptimizer interface to do this work: it will create the server-side program and put it in the recv op as a member block to run.
I am afraid that this PR is too big and could be slow to merge. It looks to me that many pieces, e.g., the introduction of gRPC, could be separate PRs.
.clang-format
Outdated
@@ -24,5 +24,8 @@ Standard: Cpp11
AllowAllParametersOfDeclarationOnNextLine: true
BinPackParameters: false
BinPackArguments: false
---
Language: Proto
# Don't format .proto files.
Why not format .proto files?
Removed these lines. I'm not sure why my previous style check didn't pass.
@@ -133,6 +133,8 @@ include(external/any) # download libn::any
include(external/eigen)    # download eigen3
include(external/pybind11) # download pybind11
include(external/nccl)
include(external/cares)
include(external/grpc)
I vaguely remember that you said you wanted to try brpc? @typhoonzero
Yep. Will add a benchmark in the next PR to decide which to use. That won't affect the current code structure.
cmake/generic.cmake
Outdated
@@ -467,3 +467,43 @@ function(py_test TARGET_NAME)
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
  endif()
endfunction()

function(grpc_library TARGET_NAME)
We need a comment for this function.
Done.
# See the License for the specific language governing permissions and
# limitations under the License.
#
Are cares and grpc needed in all kinds of builds of Paddle? I suggest adding an if statement here to return early if they are not needed, or at least adding the following statements to avoid breaking the build for mobile:
IF(MOBILE_INFERENCE)
  return()
ENDIF()
Sure, will add. Thanks for the reminder.
Done.
This prototype has a few differences from our design doc. @helinwang https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactor/parameter_server.md
- This PR splits the roles of trainer and pserver, which should be isomorphic.
- The SimpleBlockingQueue needs to support an asynchronous Send operation, just like the rendezvous in TensorFlow.
- The RPC service only has one SendVariable interface; can it satisfy a non-blocking function call such as Send? EmptyVariable in gRPC seems ugly; do we have a better way to improve it?
@helinwang
Since this prototype is separate from our main develop branch, I think we can merge this PR and enhance the implementation later. BTW, please resolve the conflicts. @typhoonzero
No, this is just a basic implementation of the send/recv op. We must have higher-level wrappers for how to use it, like in the unit test's code. I'm afraid the high-level API must split the roles of trainer and pserver for a simpler API.
OK, I see.
LGTM.
Implemented following this design.
Still a work in progress; this enables distributed training using gRPC.
TODO in later PRs: