-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ParameterServerController for parameter server python api #1051
Add ParameterServerController for parameter server python api #1051
Conversation
@@ -0,0 +1,70 @@ | |||
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我现在对util这样的命名特别紧张——我发现一般都是大家懒得想明白应该叫什么的时候就叫util了。
在这里,看上去是想叫 PServerController 什么的。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool,同意。
PServerController是个好名字。
感觉程序员起名字是一个非常痛苦的事情。一痛苦就会用一些比较常用的名字,比如Utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/** | ||
* @brief start all pserver thread in this PServerUtil. | ||
*/ | ||
void start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我们的naming convention指定的怎么样了? @reyoung
在这里constructor是camel形式,但是methods都是小写。显然不一致呀。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里因为构造函数和类名必须一致。类名必须是UpperCamelCase。比如 "SomeClass".而函数名是"lowerCamelCase",比如"someMethod"。
这样做的好处是,我们可以通过判断出一个东西是不是类型了。
比如
class SomeClass {
public:
class Helper {
};
static Helper helper();
};
SomeClass::Helper
是类型,而SomeClass::helper
是函数。
@@ -0,0 +1,43 @@ | |||
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
上面有文件叫 PServerUtils.*,这里叫ParameterServer,显然不一致呀。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个配置文件确实是用来配置parameter server的,目前的pserverutil封装了几个parameter server线程,根据config来创建这些线程。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我的意思是到底应该叫 pserver 还是 parameter server 呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已经按照命名规范修改为ParameterServerController
@@ -0,0 +1,70 @@ | |||
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool,同意。
PServerController是个好名字。
感觉程序员起名字是一个非常痛苦的事情。一痛苦就会用一些比较常用的名字,比如Utils
/** | ||
* @brief start all pserver thread in this PServerUtil. | ||
*/ | ||
void start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里因为构造函数和类名必须一致。类名必须是UpperCamelCase。比如 "SomeClass".而函数名是"lowerCamelCase",比如"someMethod"。
这样做的好处是,我们可以通过判断出一个东西是不是类型了。
比如
class SomeClass {
public:
class Helper {
};
static Helper helper();
};
SomeClass::Helper
是类型,而SomeClass::helper
是函数。
void join(); | ||
|
||
private: | ||
std::vector<std::shared_ptr<ParameterServer2>> pservers_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<std::shared_ptr<ParameterServer2>>
=>std::vector<std::unique_ptr<ParameterServer2>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
namespace paddle { | ||
|
||
class PServerUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class PServerUtil
=> class PServerUtil final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
PServerController* PServerController::createByGflags() { | ||
auto& pServerConfig = *paddle::PServerController::initConfigByGflags(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里有内存泄露
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里直接用栈变量不可以么?也就是
ParameterServerConfig config;
config.set_nics(FLAGS_nics);
...
这样。同时,initConfigByGflags 只被 createByGFlags 调用,没必要extract成一个private的static member function了吧。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
PServerController::~PServerController() { this->join(); } | ||
|
||
ParameterServerConfig* PServerController::initConfigByGflags() { | ||
ParameterServerConfig* config = new ParameterServerConfig(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
尽量不要用C++的new关键词。
最省事的方法是:
auto config = std::make_shared<ParameterServerConfig>();
或者是
auto config = std::make_unique<ParameterServerConfig>(); // since c++ 14
虽然目前没有make_unique,不过回头我加上吧。。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对了,勉强在Cpp里面和java new
语意一致的东西是 std::make_shared<类型名>(参数)
。只是std::make_unique
会快一点。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
赞,多谢,已经修改了,不过make_unique打算如何引入?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
直接加上这个函数也可以。。http://stackoverflow.com/questions/17902405/how-to-implement-make-unique-function-in-c11
判断一下C++版本, if __cplusplus != 14,那么就加上make_unique。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Cool.
… add-pserver-util
… add-pserver-util
|
||
namespace paddle { | ||
|
||
class ParameterServerController final { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
新增的class应该有class comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
required int32 trainer_id = 1; | ||
} | ||
|
||
message ParameterServerConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里应该有个注释说明这个proto message的用意。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
package paddle; | ||
|
||
message ParameterClientConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里应该有个注释说明这个proto message的用意。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done,写的比较简单
} | ||
|
||
void ParameterServerController::start() { | ||
LOG(INFO) << "pserver sizes : " << pservers_.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"pserver sizes" ==> "number of pserver instances"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
LOG(INFO) << "pserver sizes : " << pservers_.size(); | ||
int i = 0; | ||
for (const auto& pserver : pservers_) { | ||
LOG(INFO) << "pserver started : " << i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG(INFO) << "Staring pserver " << i;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
ParameterServerController::~ParameterServerController() { this->join(); } | ||
|
||
ParameterServerController* ParameterServerController::createByGflags() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createFromGflags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done,这个地方之前确实纠结过
} | ||
} | ||
|
||
void ParameterServerController::join() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done,好主意
trainer和pserver除了日常的通信,还有一个同步控制(核心目的是为了实现async-sgd、sync-sgd、pass起始、batch 起始等核心逻辑)的逻辑,从这个ParameterServerController实现来看,主要是起到对启动pserver进程的控制,没有同步的控制逻辑。 问题:
|
现在版本中有一个耦合在trainer中的类似『ParameterServerController』的功能, 参见: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/trainer/RemoteParameterUpdater.cpp#L623 它主要强调对SGD算法(sync-sgd, async-sgd)层面的逻辑控制。 |
对的,目前ParameterServerController主要是对parameter server那些thread的封装,干了一些初始化和简单的控制功能,而且只能控制一个ps进程。
后面可以考虑,不过trainer的这个controller貌似是训练这边的一个逻辑,直接和ParameterServer2通信,后面的具体实现,可以再考虑。 |
* test=develop,adding a description of the CMAKE command
…#1051) * fix cmake by use abs path * add boost dependency Co-authored-by: Zeyu Chen <chenzeyu01@baidu.com>
1,code clean, ParameterServer2Main.cpp and TrainerMain.cpp have duplicated code.
2, the duplicated code use many gflags to control the init of parameter server, so add a wrapper for pserver, change gflags to proto config.
refs #1039
More about this PR (copied from IM chat with @jacquesqiao):