Merge with latest develop branch. Optimizer lib2 #2386

Merged: 35 commits, merged Jun 20, 2017
Commits (35 total; diff shown below is from 14 commits)
62cd5c7
"failed to resolve conflict. apply to HEAD"
dzhwinter Jun 4, 2017
3158efe
"move cmake scripts too"
dzhwinter Jun 4, 2017
5b8a0c5
"optimizer remove init create with proto"
dzhwinter Jun 5, 2017
8610ba1
"remove get config proto"
dzhwinter Jun 5, 2017
b4aa0ec
"modify update interface"
dzhwinter Jun 5, 2017
26e9c4e
"add vector alias to make name clear"
dzhwinter Jun 5, 2017
5f9cd8c
"rename test file name"
dzhwinter Jun 5, 2017
b9d024e
"remove useless test file"
dzhwinter Jun 5, 2017
5ab958b
"change size_t type to avoid warning"
dzhwinter Jun 5, 2017
fd8c510
"format name with google style"
dzhwinter Jun 6, 2017
3b1294a
"add checkpoint interface: set state, get state"
dzhwinter Jun 6, 2017
81cad37
"remove comments"
dzhwinter Jun 6, 2017
beb2697
"change header guard to pragma"
dzhwinter Jun 6, 2017
5a1e678
"update macro and fix some part"
dzhwinter Jun 6, 2017
bc26df7
"polish code style and update based review comment"
dzhwinter Jun 7, 2017
b9cb0f2
"update marco"
dzhwinter Jun 7, 2017
6cbbc2e
"add comments"
dzhwinter Jun 7, 2017
f5ff283
"fix comment"
dzhwinter Jun 7, 2017
e456796
"update with comment"
dzhwinter Jun 9, 2017
33b4dee
"update serialization part"
dzhwinter Jun 9, 2017
0fc4201
"update interface"
dzhwinter Jun 9, 2017
b7e68e0
"serialization modify"
dzhwinter Jun 11, 2017
b72e8aa
"seperate serialization proto state"
dzhwinter Jun 13, 2017
1814fc2
"fix lr_policy serialization"
dzhwinter Jun 14, 2017
e148bc1
"remove unused tensor line"
dzhwinter Jun 14, 2017
a46f3fc
"fix double release tensor buffer error."
dzhwinter Jun 14, 2017
df5bc78
"fix tensor shared_ptr"
dzhwinter Jun 15, 2017
65d9e33
"modify config name"
dzhwinter Jun 19, 2017
ec65fa8
"protobuf required to optional"
dzhwinter Jun 19, 2017
baef96e
Merge branch 'develop' into optimizer_lib2
dzhwinter Jun 19, 2017
99849cf
rename Tensor.h
dzhwinter Jun 19, 2017
72b6b26
"ci formatter"
dzhwinter Jun 19, 2017
03884f0
formatter
dzhwinter Jun 19, 2017
a166e52
"formatter in docker"
dzhwinter Jun 19, 2017
33ddc89
formatter in docker
dzhwinter Jun 19, 2017
1 change: 1 addition & 0 deletions paddle/CMakeLists.txt
@@ -8,6 +8,7 @@ add_subdirectory(gserver)
add_subdirectory(pserver)
add_subdirectory(trainer)
add_subdirectory(scripts)
add_subdirectory(optimizer)

# Do not build go directory until go cmake is working smoothly.
# if(CMAKE_Go_COMPILER)
17 changes: 17 additions & 0 deletions paddle/optimizer/CMakeLists.txt
@@ -0,0 +1,17 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})

set(OPITMIZER_SRCS
adadelta_optimizer.cc
adagrad_optimizer.cc
adam_optimizer.cc
optimizer.cc
parameter_optimizer.cc
sgd_optmizer.cc
)

add_library(optimizer STATIC ${OPITMIZER_SRCS})
add_dependencies(optimizer gen_proto_cpp)

add_simple_unittest(tensor_test)
add_simple_unittest(parameter_optimizer_test)
add_dependencies(parameter_optimizer_test optimizer)
52 changes: 52 additions & 0 deletions paddle/optimizer/Tensor.h
@@ -0,0 +1,52 @@
#pragma once
/**
* @brief tensor used by optimizer
*/

#include <string.h>
#include <memory>
#include "paddle/utils/Common.h"
#include "paddle/utils/Logging.h"

namespace paddle {
namespace optimizer {

template <class T>
class TensorT {
public:
TensorT(size_t size) : height_(1), width_(size) { data_ = new T[size]; }
TensorT(T* data, size_t size) : height_(1), width_(size), data_(data) {}
TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data_) {}
TensorT(const TensorT& t)
Contributor:

Can you explain to me what ":" does here? Sorry, I am not too familiar with it and don't know what keyword to search for.

Contributor Author:

This is a constructor initializer list in C++, which is the idiomatic way to initialize members in C++.

Please see here for details: http://en.cppreference.com/w/cpp/language/direct_initialization

Contributor Author:

The ":" after the parentheses is a C++ initialization mechanism, and it is not the same concept as the constructor body. The relationship between the initializer list and the constructor body is analogous to Python's __new__ and __init__: the initializer list runs to completion before the constructor body (the part inside the curly braces).
The Chinese term is 初始化列表; in English it is called the constructor initializer list.
1. The initializer list completes before any function body executes.
2. Members are initialized in the order they are declared, not in the order they appear in the initializer list.
There are also restrictions for non-POD members:
https://stackoverflow.com/questions/5816218/difference-between-initializer-and-default-initializer-list-in-c
https://stackoverflow.com/questions/9903248/initializing-fields-in-constructor-initializer-list-vs-constructor-body

It is generally recommended to initialize all non-static members this way.
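
A tiny illustration of the difference, using a hypothetical class (not code from this PR):

#include <cstddef>

class Example {
public:
  // Member initializer list: width_ and data_ are initialized directly,
  // before the constructor body runs.
  explicit Example(std::size_t size) : width_(size), data_(new float[size]) {}

  // The assignment style below looks similar but is different: members are
  // first default-initialized, then assigned inside the body.
  // Example(std::size_t size) { width_ = size; data_ = new float[size]; }

  ~Example() { delete[] data_; }

private:
  std::size_t width_;  // members are initialized in declaration order,
  float* data_;        // not in the order written in the initializer list
};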

: TensorT(1, t.size(), 0, t.get_buffer(), false, false) {}
Contributor:

I guess this is the copy constructor, and what it does is create a new tensor by copying from the old one, so the two tensors share the same buffer.
It seems that when the two tensors get destroyed, they will try to destroy the same buffer twice.

Contributor Author:

👍 fix done.
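
A minimal sketch of one way to avoid the double free, assuming buffer ownership is shared via std::shared_ptr (illustrative only; not necessarily the exact fix that landed in the later "fix tensor shared_ptr" commit):

#include <cstddef>
#include <memory>

template <class T>
class SharedTensorT {
public:
  explicit SharedTensorT(std::size_t size)
      : height_(1), width_(size), data_(new T[size], std::default_delete<T[]>()) {}

  // The implicitly generated copy constructor copies the shared_ptr, so copies
  // share one buffer and it is freed exactly once, when the last copy dies.
  T* get_buffer() { return data_.get(); }
  std::size_t size() const { return width_ * height_; }

private:
  std::size_t height_;
  std::size_t width_;
  std::shared_ptr<T> data_;
};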

~TensorT() {
if (data_) delete data_;
}

TensorT& operator=(const TensorT& t) {
this->width_ = t.size();
this->data_ = t.get_buffer();
}
T* get_buffer() { return this->data_; }
T& operator[](const size_t idx) {
CHECK(idx >= 0 && idx < this->width_) << "out of index range";
return data_[idx];
}
T& operator[](const size_t idx) const {
CHECK(idx >= 0 && idx < this->width_) << "out of index range";
return data_[idx];
}
// TODO: replace with tensorshape
size_t size() const { return this->width_ * this->height_; }

protected:
size_t height_;
size_t width_;
T* data_;
};

// TODO(zhihong): design problem of dynamic datatype, need to fix it
Contributor:

It seems that when porting "majel" to PaddlePaddle, we already included boost/variant.hpp for the "single value multiple type" container.

Contributor Author:

Agreed, 👍. We can either wait for the majel port to finish or implement another one with typeid reflection. It is a follow-up question.
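
A quick sketch of the "single value, multiple type" idea with boost::variant (illustrative only, assuming just float and double buffers for now):

#include <boost/variant.hpp>

// One value that can hold a buffer of either element type.
typedef boost::variant<float*, double*> BufferPtr;

void Demo() {
  static float data[4] = {0.f, 1.f, 2.f, 3.f};
  BufferPtr buf = data;                // stores the float* alternative
  float* p = boost::get<float*>(buf);  // boost::get throws bad_get on a type mismatch
  (void)p;
}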

Contributor:

I see.

typedef TensorT<real> Tensor;
Contributor:

I do not know whether real means float or double, and a Google search does not answer my question. Maybe, for clarity, we should change all real in this PR to float (since we are currently working with 32-bit floating-point parameters)?

Contributor Author:

The real type is a macro-defined type used across the whole project (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/Common.h#L30), and it is determined when the project is compiled. That is not a good choice here, since our element type comes from the caller's data and is only known at runtime. This needs to be fixed.

Contributor Author (@dzhwinter, Jun 6, 2017):

Let me explain this part more clearly.
In the last PR I implemented Tensor as a template class, but that is the wrong way, because a template class pins the element types at compile time (it would be better to explain the reason in a meeting).
In fact, a tensor should represent a contiguous block of memory and support dynamic types at runtime,
e.g. https://github.com/caffe2/caffe2/blob/master/caffe2/core/tensor.h#L643
much like an enhanced any type in the boost library. We can merge this PR and enrich the tensor library in the future; at the moment we only support compile-time data types here.
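
A rough sketch of that direction, with hypothetical names (a type-erased buffer plus a runtime type tag, in the spirit of the caffe2 link above):

#include <cstddef>
#include <memory>
#include <typeinfo>

class DynamicTensor {
public:
  template <class T>
  static DynamicTensor Make(std::size_t n) {
    DynamicTensor t;
    t.size_ = n;
    t.type_ = &typeid(T);
    // shared_ptr<void> remembers the real deleter, so the T[] is freed correctly.
    t.data_ = std::shared_ptr<void>(new T[n], std::default_delete<T[]>());
    return t;
  }

  // Returns the buffer only if the caller asks for the element type it was
  // created with; the check happens at runtime, not at compile time.
  template <class T>
  T* data() const {
    return type_ && *type_ == typeid(T) ? static_cast<T*>(data_.get()) : nullptr;
  }

private:
  std::size_t size_ = 0;
  const std::type_info* type_ = nullptr;
  std::shared_ptr<void> data_;
};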

Contributor (@helinwang, Jun 6, 2017):

OK, that's fine. Can you use float instead of real? The parameter is supposed to be a general data store and should not depend on how real is defined in Paddle.

After this PR we need to work on another PR to add tensors of the required data types:

MATCH_ENUM_TYPE(int32_t, PADDLE_ELEMENT_TYPE_INT32);
MATCH_ENUM_TYPE(uint32_t, PADDLE_ELEMENT_TYPE_UINT32);
MATCH_ENUM_TYPE(int64_t, PADDLE_ELEMENT_TYPE_INT64);
MATCH_ENUM_TYPE(uint64_t, PADDLE_ELEMENT_TYPE_UINT64);
// only below is implemented, we need to implement other types in a follow up PR.
MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);
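
One possible way such a mapping could be backed, assuming MATCH_ENUM_TYPE expands to a trait specialization (a hypothetical definition, not necessarily the macro used in Paddle; the enum is abbreviated):

// Abbreviated element-type enum, assumed from the list above.
enum paddle_element_type {
  PADDLE_ELEMENT_TYPE_FLOAT32,
  PADDLE_ELEMENT_TYPE_FLOAT64
};

// Maps a C++ element type to its enum tag at compile time.
template <class T>
struct ElementType;  // intentionally undefined for unsupported types

#define MATCH_ENUM_TYPE(TYPE, ENUM)                  \
  template <>                                        \
  struct ElementType<TYPE> {                         \
    static const paddle_element_type value = ENUM;   \
  }

MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);

// Usage: ElementType<float>::value == PADDLE_ELEMENT_TYPE_FLOAT32.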

Contributor Author:

ok, fix done.


} // namespace optimizer
} // namespace paddle
20 changes: 20 additions & 0 deletions paddle/optimizer/Tensor_test.cpp
@@ -0,0 +1,20 @@
#include <iostream>
Member (@jacquesqiao, Jun 7, 2017):

Maybe we should add a copyright header to these new files?

Contributor Author:

The copyright declaration can be added when we merge these modules together. I noticed that the go module files all need that header section added as well.

#include "gtest/gtest.h"
#include "tensor.h"

using namespace paddle;
using namespace paddle::optimizer;

TEST(Tensor, indexer) {
Tensor t(3);
for (auto i = 0; i < t.size(); ++i) {
t[i] = i;
}
ASSERT_EQ(t[2], 2);
ASSERT_EQ(t[1], 1);
}

int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
36 changes: 36 additions & 0 deletions paddle/optimizer/adadelta_optimizer.cc
@@ -0,0 +1,36 @@
#include "adadelta_optimizer.h"
#include <algorithm>
#include <cmath>

namespace paddle {
namespace optimizer {

void AdadeltaOptimizer::set_weight(Tensor* p) {
Contributor:

Why is the content of p not used in this function?

Contributor Author:

fix done

Contributor:

Need Google style function names. Please replace all C++ function names with CamelCase.

Contributor Author:

https://google.github.io/styleguide/cppguide.html#Function_Names

Accessors and mutators (get and set functions) may be named like variables. These often correspond to actual member variables, but this is not required. For example, int count() and void set_count(int count).
Since this function modifies the parameter member of the optimizer, in my mind it falls under this accessor/mutator rule?

Contributor:

Thanks, sorry!

parameter_ = p;
size_t size = p->size();
accum_gradient_ = new Tensor(size);
Contributor:

Should we free the previous accum_gradient_, accum_delta_, and update_delta_ if they are not NULL? The same applies to the other set_weight functions.

Contributor Author:

In my mind, an optimizer instance maps to a single parameter, and we never reuse the instance, even when restarting from a checkpoint. In that situation we should not need a pointer guard.

Contributor (@helinwang, Jun 7, 2017):

In my opinion, the contract should be the API, not an oral or documentation contract like "we never reuse the instance". An oral contract does not prevent a client from reusing the instance, and people can easily forget it; a documentation contract can easily get out of date, or many people will simply not read the document.
A good way to prevent users from reusing the instance is through the API: remove set_weight and put weight initialization into the constructor, as sketched below.
If we provide the API, we need to make sure it is correct in all use cases, not just how we "intend" to use it; our intention can easily change over time. I would suggest we just remove the set_weight API.
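
A minimal sketch of the suggested constructor-based initialization (hypothetical shape, members shown locally for brevity; not the code in this PR):

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

class AdadeltaOptimizerV2 : public ParameterOptimizer {
public:
  // The weight is bound once at construction, so the instance cannot be
  // re-pointed at another parameter and the buffers are allocated exactly once.
  AdadeltaOptimizerV2(
      Tensor *parameter, double rho, double epsilon, double decay, LrPolicy *lr)
      : ParameterOptimizer(lr),
        parameter_(parameter),
        accum_gradient_(new Tensor(parameter->size())),
        accum_delta_(new Tensor(parameter->size())),
        update_delta_(new Tensor(parameter->size())),
        rho_(rho),
        epsilon_(epsilon),
        decay_(decay) {}
  // Update(const Tensor *gradient) as before; no set_weight.
  // Destructor freeing the Tensors omitted for brevity.

private:
  Tensor *parameter_;
  Tensor *accum_gradient_;
  Tensor *accum_delta_;
  Tensor *update_delta_;
  double rho_, epsilon_, decay_;
};

}  // namespace optimizer
}  // namespace paddle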

Contributor Author:

That's true in practice. Thanks a lot! Will fix it.

accum_delta_ = new Tensor(size);
update_delta_ = new Tensor(size);
}

void AdadeltaOptimizer::Update(const Tensor* gradient) {
num_sample_passed_ += 1;
double learning_rate = lr_policy_->LearningRate(num_sample_passed_);
Tensor& param = *parameter_;
const Tensor& grad = *gradient;
Tensor& accum_g = *accum_gradient_;
Tensor& accum_d = *accum_delta_;
Tensor& update_d = *update_delta_;
for (size_t i = 0; i < param.size(); ++i) {
accum_g[i] = rho_ * accum_g[i] + (1.0 - rho_) * grad[i] * grad[i];

update_d[i] = std::sqrt(accum_d[i] + epsilon_) /
std::sqrt(accum_g[i] + epsilon_) * grad[i];

accum_d[i] = rho_ * accum_d[i] + (1.0 - rho_) * update_d[i] * update_d[i];

param[i] -= learning_rate * update_d[i] + learning_rate * decay_ * param[i];
}
}
} // namespace optimizer
} // namespace paddle
32 changes: 32 additions & 0 deletions paddle/optimizer/adadelta_optimizer.h
@@ -0,0 +1,32 @@
#pragma once

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

class AdadeltaOptimizer : public ParameterOptimizer {
public:
AdadeltaOptimizer(double rho, double epsilon, double decay, LrPolicy *lr)
: ParameterOptimizer(lr), rho_(rho), epsilon_(epsilon), decay_(decay) {}
~AdadeltaOptimizer() {
if (accum_gradient_) delete accum_gradient_;
if (accum_delta_) delete accum_delta_;
if (update_delta_) delete update_delta_;
}
void Update(const Tensor *gradient);
void set_weight(Tensor *p);
real *get_weight() const;

private:
Tensor *accum_gradient_;
Tensor *accum_delta_;
Tensor *update_delta_;

double rho_;
double epsilon_;
double decay_;
};

} // namespace optimizer
} // namespace paddle
28 changes: 28 additions & 0 deletions paddle/optimizer/adagrad_optimizer.cc
@@ -0,0 +1,28 @@
#include <cmath>

#include "adagrad_optimizer.h"

namespace paddle {
namespace optimizer {

void AdagradOptimizer::set_weight(Tensor* p) {
parameter_ = p;
size_t size = p->size();
Contributor:

It seems the content of p is not used.

Contributor Author:

That's a horrible mistake...
Fix done.

accum_gradient_ = new Tensor(size);
}

void AdagradOptimizer::Update(const Tensor* gradient) {
num_sample_passed_ += 1;
double learning_rate = lr_policy_->LearningRate(num_sample_passed_);
Tensor& param = *parameter_;
Tensor& accum_g = *accum_gradient_;
const Tensor& grad = *gradient;
for (size_t i = 0; i < param.size(); ++i) {
accum_g[i] += grad[i] * grad[i];
param[i] += learning_rate * grad[i] / std::sqrt(accum_g[i] + epsilon_) +
learning_rate * decay_ * param[i];
}
}

} // namespace optimizer
} // namespace paddle
26 changes: 26 additions & 0 deletions paddle/optimizer/adagrad_optimizer.h
@@ -0,0 +1,26 @@
#pragma once

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

class AdagradOptimizer : public ParameterOptimizer {
public:
AdagradOptimizer(double epsilon, double decay, LrPolicy *lr)
: ParameterOptimizer(lr), epsilon_(epsilon), decay_(decay) {}
~AdagradOptimizer() {
if (accum_gradient_) delete accum_gradient_;
}
void Update(const Tensor *gradient);
void set_weight(Tensor *p);
real *get_weight() const;

private:
Tensor *accum_gradient_;
double epsilon_;
double decay_;
};

} // namespace optimizer
} // namespace paddle
32 changes: 32 additions & 0 deletions paddle/optimizer/adam_optimizer.cc
@@ -0,0 +1,32 @@
#include "adam_optimizer.h"
#include <cmath>

namespace paddle {
namespace optimizer {

void AdamOptimizer::set_weight(Tensor *p) {
parameter_ = p;
size_t size = p->size();
momentums_ = new Tensor(size);
velocitys_ = new Tensor(size);
}

void AdamOptimizer::Update(const Tensor *gradient) {
num_sample_passed_ += 1;
double learning_rate = lr_policy_->LearningRate(num_sample_passed_);
double coef1 = 1.0 - std::pow(beta_1_, num_sample_passed_);
double coef2 = 1.0 - std::pow(beta_2_, num_sample_passed_);
learning_rate *= std::sqrt(coef2) / coef1;
Tensor &param = *parameter_;
const Tensor &grad = *gradient;
Tensor &m = *momentums_;
Tensor &v = *velocitys_;
for (size_t i = 0; i < param.size(); ++i) {
m[i] = beta_1_ * m[i] + (1.0 - beta_1_) * grad[i];
v[i] = beta_2_ * v[i] + (1.0 - beta_2_) * grad[i] * grad[i];
param[i] -=
learning_rate * (m[i] / std::sqrt(v[i] + epsilon_) + decay_ * param[i]);
}
}
} // namespace optimizer
} // namespace paddle
35 changes: 35 additions & 0 deletions paddle/optimizer/adam_optimizer.h
@@ -0,0 +1,35 @@
#pragma once

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

class AdamOptimizer : public ParameterOptimizer {
public:
AdamOptimizer(
double beta_1, double beta_2, double epsilon, double decay, LrPolicy *lr)
: ParameterOptimizer(lr),
beta_1_(beta_1),
beta_2_(beta_2),
epsilon_(epsilon),
decay_(decay) {}
~AdamOptimizer() {
if (momentums_) delete momentums_;
if (velocitys_) delete velocitys_;
}
void Update(const Tensor *gradient);
void set_weight(Tensor *p);
real *get_weight() const;

private:
Tensor *momentums_;
Tensor *velocitys_;
double beta_1_;
double beta_2_;
double epsilon_;
double decay_;
};

} // namespace optimizer
} // namespace paddle
42 changes: 42 additions & 0 deletions paddle/optimizer/lr_policy.h
@@ -0,0 +1,42 @@
#pragma once

#include <algorithm>
#include "OptimizerConfig.pb.h"

namespace paddle {
namespace optimizer {

class LrPolicy {
public:
virtual ~LrPolicy() {}
virtual double LearningRate(const uint64_t num_sample_passed) = 0;
};

// constant learning rate policy
class ConstLr final : public LrPolicy {
public:
ConstLr(double lr) : learning_rate(lr){};
double LearningRate(const uint64_t num_sample_passed) {
return learning_rate;
}

protected:
double learning_rate;
Contributor:

Why is learning_rate private in LinearLr but protected here? Consider changing it to private.

Contributor Author:

fix done.

};

class LinearLr final : public LrPolicy {
public:
LinearLr(double lr, double lr_decay_a, double lr_decay_b)
: learning_rate(lr), lr_decay_a(lr_decay_a), lr_decay_b(lr_decay_b) {}
double LearningRate(const uint64_t num_sample_passed) {
return std::max(learning_rate - lr_decay_a * num_sample_passed, lr_decay_b);
}

private:
double learning_rate;
double lr_decay_a;
double lr_decay_b;
};

} // namespace optimizer
} // namespace paddle