merge with latest develop branch. Optimizer lib2 #2386
Changes from 9 commits
CMakeLists.txt (new file, +28 lines):

    include_directories(${CMAKE_CURRENT_BINARY_DIR})

    set(OPITMIZER_SRCS
        adadelta_optimizer.cc
        adagrad_optimizer.cc
        adam_optimizer.cc
        optimizer.cc
        parameter_optimizer.cc
        sgd_optmizer.cc
    )

    set(OPITMIZER_Headers
        adadelta_optimizer.h
        adagrad_optimizer.h
        adam_optimizer.h
        lr_policy.h
        optimizer.h
        parameter_optimizer.h
        sgd_optimizer.h
        Tensor.h
    )

    add_library(optimizer STATIC ${OPITMIZER_SRCS})
    add_dependencies(optimizer gen_proto_cpp)

    add_simple_unittest(Tensor_test)
    add_simple_unittest(parameter_optimizer_test)
    add_dependencies(parameter_optimizer_test optimizer)
Tensor.h (new file, +49 lines):

    #ifndef PADDLE_OPTIMIZER_TENSOR_H_
    #define PADDLE_OPTIMIZER_TENSOR_H_
    /**
     * @brief tensor used by optimizer
     */

    #include <string.h>
    #include "paddle/utils/Common.h"
    #include "paddle/utils/Logging.h"

    namespace paddle {
    namespace optimizer {

    template <class T>
    class TensorT {
    public:
      TensorT(size_t h, size_t w, T* data) : height_(h), width_(w), data_(data_) {}
      TensorT(T* data, int size) : height_(1), width_(size), data_(data) {}
      TensorT(const TensorT& t)
          : TensorT(1, t.size(), 0, t.get_buffer(), false, false) {}
      TensorT& operator=(const TensorT& t) {
        this->width_ = t.size();
        this->data_ = t.get_buffer();
      }
      T* get_buffer() { return this->data_; }
      T& operator[](const size_t idx) {
        CHECK(idx >= 0 && idx < this->width_) << "out of index range";
        return data_[idx];
      }
      T& operator[](const size_t idx) const {
        CHECK(idx >= 0 && idx < this->width_) << "out of index range";
        return data_[idx];
      }
      // TODO: replace with tensorshape
      size_t size() const { return this->width_; }

    protected:
      size_t height_;
      size_t width_;
      T* data_;
    };

    // TODO(zhihong): design problem of dynamic datatype, need to fix it
    typedef TensorT<real> Tensor;

    }  // namespace optimizer
    }  // namespace paddle

    #endif

Review comments on this file:

On the include guard:
- Reviewer: PaddlePaddle use …
- Author: We will change that and follow the Google C++ style.
- Reviewer: In a formal discussion, we decided to use …

On the ":" in the copy constructor:
- Reviewer: Can you explain to me what ":" does here? Sorry, I am not too familiar with it and don't know what keyword to search for.
- Author: This is an initializer list in C++, the idiomatic way to initialize members. Please see http://en.cppreference.com/w/cpp/language/direct_initialization for details.
- Reviewer: The ":" after the parameter list is C++'s initialization mechanism, which is not the same concept as the constructor body. The relationship between the initializer list and the constructor body is analogous to Python's __new__ and __init__: the initializer list completes before the constructor body (the part in braces) runs. Initializing all non-static members this way is generally recommended.

On the copy constructor body:
- Reviewer: I guess this is the copy constructor; it creates a new tensor by copying from the old tensor, and they share the same buffer.
- Author: 👍 fix done.

On size():
- Reviewer: Since height_ is already a member variable, I think you implemented a 2-D tensor, so here it should be …
- Author: It should be. Fix done.

On the dynamic-datatype TODO:
- Reviewer: It seems that when porting "majel" to PaddlePaddle, we already included boost/variant.hpp for the "single value multiple type" container.
- Author: Agreed, 👍. Either we can wait for their majel port job to finish, or implement another one with …
- Reviewer: I see.

On typedef TensorT<real> Tensor:
- Reviewer: I do not know whether …
- Author: The real type is a macro type used across the whole project: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/Common.h#L30
- Author: Let me explain this part more clearly.
- Reviewer: Ok, that's fine. Can you use …? After this PR we need to work on another PR to add tensors of the required data types:

      MATCH_ENUM_TYPE(int32_t, PADDLE_ELEMENT_TYPE_INT32);
      MATCH_ENUM_TYPE(uint32_t, PADDLE_ELEMENT_TYPE_UINT32);
      MATCH_ENUM_TYPE(int64_t, PADDLE_ELEMENT_TYPE_INT64);
      MATCH_ENUM_TYPE(uint64_t, PADDLE_ELEMENT_TYPE_UINT64);
      // only below is implemented, we need to implement other types in a follow up PR.
      MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
      MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);

- Author: Ok, fix done.
Tensor_test.cpp (new file, +21 lines):

    #include "Tensor.h"
    #include <iostream>

    #include "gtest/gtest.h"

    using namespace paddle;
    using namespace paddle::optimizer;

    TEST(Tensor, indexer) {
      real* ptr = new real[3];
      Tensor t(ptr, 3);
      for (auto i = 0; i < t.size(); ++i) {
        t[i] = i;
      }
      ASSERT_EQ(t[2], 2);
      ASSERT_EQ(t[1], 1);
    }

    int main(int argc, char** argv) {
      testing::InitGoogleTest(&argc, argv);
      return RUN_ALL_TESTS();
    }

Review comments on this file:
- On the includes: Reviewer: Maybe we should add …
- On the allocation: Reviewer: Need to release the … Author: Fixed, done.
adadelta_optimizer.cc (new file, +38 lines):

    #include "adadelta_optimizer.h"
    #include <algorithm>
    #include <cmath>

    namespace paddle {
    namespace optimizer {

    void AdadeltaOptimizer::set_weight(Tensor* p) {
      size_t size = p->size();
      real* gptr = new real[size];
      accum_gradient = new Tensor(gptr, size);
      real* dptr = new real[size];
      accum_delta = new Tensor(dptr, size);
      real* dptr_current = new real[size];
      update_delta = new Tensor(dptr_current, size);
    }

    void AdadeltaOptimizer::update(const Tensor* gradient) {
      num_sample_passed += 1;
      double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
      Tensor& param = *parameter_;
      const Tensor& grad = *gradient;
      Tensor& accum_g = *accum_gradient;
      Tensor& accum_d = *accum_delta;
      Tensor& update_d = *update_delta;
      for (size_t i = 0; i < param.size(); ++i) {
        accum_g[i] = rho * accum_g[i] + (1.0 - rho) * grad[i] * grad[i];

        update_d[i] = std::sqrt(accum_d[i] + epsilon) /
                      std::sqrt(accum_g[i] + epsilon) * grad[i];

        accum_d[i] = rho * accum_d[i] + (1.0 - rho) * update_d[i] * update_d[i];

        param[i] -= learning_rate * update_d[i] + learning_rate * decay * param[i];
      }
    }
    }  // namespace optimizer
    }  // namespace paddle

Review comments on this file:
- On set_weight: Reviewer: Why the content of … Author: Fix done.
- Reviewer: Need Google-style function names. Please replace all C++ function names with CamelCase: https://google.github.io/styleguide/cppguide.html#Function_Names
- Author: Thanks, sorry!
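For reference, the loop in `update()` above matches the standard Adadelta recurrences, written here with the code's names mapped to symbols (ρ = `rho`, ε = `epsilon`, η = `learning_rate`, λ = `decay`; `accum_g`, `update_d`, and `accum_d` hold the running averages):

```latex
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta\theta_t &= \frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}
                      {\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1-\rho)\, \Delta\theta_t^2 \\
\theta_{t+1} &= \theta_t - \eta\, \Delta\theta_t - \eta\,\lambda\, \theta_t
\end{aligned}
```

The final λ term is the weight-decay addition the code applies on top of the textbook rule.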
adadelta_optimizer.h (new file, +39 lines):

    #ifndef PADDLE_ADADELTA_OPTIMIZER_H_
    #define PADDLE_ADADELTA_OPTIMIZER_H_

    #include "parameter_optimizer.h"

    namespace paddle {
    namespace optimizer {

    class AdadeltaOptimizer : public ParameterOptimizer {
    public:
      using ParameterOptimizer::parameter_;
      using ParameterOptimizer::num_sample_passed;
      using ParameterOptimizer::lr_policy;

      AdadeltaOptimizer(double rho, double epsilon, double decay, BaseLr *lr)
          : ParameterOptimizer(lr), rho(rho), epsilon(epsilon), decay(decay) {}
      ~AdadeltaOptimizer() {
        if (accum_gradient) delete accum_gradient;
        if (accum_delta) delete accum_delta;
        if (update_delta) delete update_delta;
      }
      void update(const Tensor *gradient);
      void set_weight(Tensor *p);
      real *get_weight() const;

    private:
      Tensor *accum_gradient;
      Tensor *accum_delta;
      Tensor *update_delta;

      double rho;
      double epsilon;
      double decay;
    };

    }  // namespace optimizer
    }  // namespace paddle

    #endif

Review comments on `using ParameterOptimizer::parameter_;`:
- Reviewer: Probably …
- Author: These members will be accessed by the derived class; the private keyword would forbid that.
- Reviewer: We should not use a public variable if it's not intended to be public; it breaks encapsulation. Protected is fine for me. I understand being consistent is important, but we should not be consistent with poor design, otherwise how can we improve code quality?
adagrad_optimizer.cc (new file, +28 lines):

    #include <cmath>

    #include "adagrad_optimizer.h"

    namespace paddle {
    namespace optimizer {

    void AdagradOptimizer::set_weight(Tensor* p) {
      size_t size = p->size();
      real* gptr = new real[size];
      accum_gradient = new Tensor(gptr, size);
    }

    void AdagradOptimizer::update(const Tensor* gradient) {
      num_sample_passed += 1;
      double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
      Tensor& param = *parameter_;
      const Tensor& grad = *gradient;
      Tensor& accum_g = *accum_gradient;
      for (size_t i = 0; i < param.size(); ++i) {
        accum_g[i] += grad[i] * grad[i];
        param[i] += learning_rate * grad[i] / std::sqrt(accum_g[i] + epsilon) +
                    learning_rate * decay * param[i];
      }
    }

    }  // namespace optimizer
    }  // namespace paddle

Review comments on this file:
- On set_weight: Reviewer: Seems the content of … Author: That's a horrible mistake...
- On update: Reviewer: We need to follow the Google C++ code style for C++ function names: https://google.github.io/styleguide/cppguide.html#Function_Names Author: Fix done.
adagrad_optimizer.h (new file, +29 lines):

    #ifndef PADDLE_ADAGRAD_OPTIMIZER_H_
    #define PADDLE_ADAGRAD_OPTIMIZER_H_

    #include "parameter_optimizer.h"

    namespace paddle {
    namespace optimizer {

    class AdagradOptimizer : public ParameterOptimizer {
    public:
      AdagradOptimizer(double epsilon, double decay, BaseLr *lr)
          : ParameterOptimizer(lr), epsilon(epsilon), decay(decay) {}
      ~AdagradOptimizer() {
        if (accum_gradient) delete accum_gradient;
      }
      void update(const Tensor *gradient);
      void set_weight(Tensor *p);
      real *get_weight() const;

    private:
      Tensor *accum_gradient;
      double epsilon;
      double decay;
    };

    }  // namespace optimizer
    }  // namespace paddle

    #endif
adam_optimizer.cc (new file, +33 lines):

    #include "adam_optimizer.h"
    #include <cmath>

    namespace paddle {
    namespace optimizer {

    void AdamOptimizer::set_weight(Tensor *p) {
      size_t size = p->size();
      real *mptr = new real[size];
      momentums_ = new Tensor(mptr, size);
      real *vptr = new real[size];
      velocitys_ = new Tensor(vptr, size);
    }

    void AdamOptimizer::update(const Tensor *gradient) {
      num_sample_passed += 1;
      double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
      double coef1 = 1.0 - std::pow(beta_1, num_sample_passed);
      double coef2 = 1.0 - std::pow(beta_2, num_sample_passed);
      learning_rate *= std::sqrt(coef2) / coef1;
      Tensor &param = *parameter_;
      const Tensor &grad = *gradient;
      Tensor &m = *momentums_;
      Tensor &v = *velocitys_;
      for (size_t i = 0; i < param.size(); ++i) {
        m[i] = beta_1 * m[i] + (1.0 - beta_1) * grad[i];
        v[i] = beta_2 * v[i] + (1.0 - beta_2) * grad[i] * grad[i];
        param[i] -=
            learning_rate * (m[i] / std::sqrt(v[i] + epsilon) + decay * param[i]);
      }
    }
    }  // namespace optimizer
    }  // namespace paddle
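The `coef1` and `coef2` factors above fold Adam's bias correction into the step size. In equation form, with the code's names mapped to symbols (β₁ = `beta_1`, β₂ = `beta_2`, t = `num_sample_passed`, η = the base `learning_rate`, λ = `decay`):

```latex
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{\eta}_t &= \eta \cdot \frac{\sqrt{1-\beta_2^{\,t}}}{1-\beta_1^{\,t}} \\
\theta_t &= \theta_{t-1} - \hat{\eta}_t
            \left( \frac{m_t}{\sqrt{v_t + \epsilon}} + \lambda\, \theta_{t-1} \right)
\end{aligned}
```

Scaling η by √(1−β₂ᵗ)/(1−β₁ᵗ) is algebraically equivalent to dividing m and v by their bias-correction terms separately, which is why the loop can use the raw moments directly.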
adam_optimizer.h (new file, +37 lines):

    #ifndef PADDLE_ADAM_OPTIMIZER_H_
    #define PADDLE_ADAM_OPTIMIZER_H_

    #include "parameter_optimizer.h"

    namespace paddle {
    namespace optimizer {

    class AdamOptimizer : public ParameterOptimizer {
    public:
      AdamOptimizer(
          double beta_1, double beta_2, double epsilon, double decay, BaseLr *lr)
          : ParameterOptimizer(lr),
            beta_1(beta_1),
            beta_2(beta_2),
            epsilon(epsilon),
            decay(decay) {}
      ~AdamOptimizer() {
        if (momentums_) delete momentums_;
        if (velocitys_) delete velocitys_;
      }
      void update(const Tensor *gradient);
      void set_weight(Tensor *p);
      real *get_weight() const;

    private:
      Tensor *momentums_;
      Tensor *velocitys_;
      double beta_1;
      double beta_2;
      double epsilon;
      double decay;
    };

    }  // namespace optimizer
    }  // namespace paddle
    #endif
lr_policy.h (new file, +45 lines):

    #ifndef PADDLE_OPTIMIZER_LR_POLICY_H_
    #define PADDLE_OPTIMIZER_LR_POLICY_H_

    #include <algorithm>
    #include "OptimizerConfig.pb.h"

    namespace paddle {
    namespace optimizer {

    class BaseLr {
    public:
      BaseLr(double lr) : learning_rate(lr) {}
      virtual ~BaseLr() {}
      virtual double get_learning_rate(const uint64_t num_sample_passed) = 0;

    protected:
      double learning_rate;
    };

    // constant learning rate policy
    class ConstLr final : public BaseLr {
    public:
      ConstLr(double lr) : BaseLr(lr){};
      double get_learning_rate(const uint64_t num_sample_passed) {
        return learning_rate;
      }
    };

    class LinearLr final : public BaseLr {
    public:
      LinearLr(double lr, double lr_decay_a, double lr_decay_b)
          : BaseLr(lr), lr_decay_a(lr_decay_a), lr_decay_b(lr_decay_b) {}
      double get_learning_rate(const uint64_t num_sample_passed) {
        return std::max(learning_rate - lr_decay_a * num_sample_passed, lr_decay_b);
      }

    private:
      double lr_decay_a;
      double lr_decay_b;
    };

    }  // namespace optimizer
    }  // namespace paddle

    #endif

Review comments on get_learning_rate:
- Reviewer: … Also, since this file is already called lr_policy, wouldn't it be better to name the class LrPolicy directly? Finally, the Google C++ coding style requires CamelCase for function names: https://google.github.io/styleguide/cppguide.html#Function_Names. Wouldn't this be clearer:

      class LrPolicy {
      public:
        virtual ~LrPolicy() {}
        virtual double LearningRate(const uint64_t num_sample_passed) = 0;
      };

- Author: I thought of it as a base class, with derived classes named using the XXXLr convention. Fix done.
Review comment on the test filenames:
- Reviewer: Let's use all lower case for filenames (tensor_test.cpp instead of Tensor_test.cpp). Some filesystems are not case sensitive.
- Author: Fix done.