Fix and improve multiple places about random number generation and shuffling #848

Merged 1 commit on Aug 4, 2014
30 changes: 27 additions & 3 deletions include/caffe/util/rng.hpp
@@ -4,16 +4,40 @@
 #define CAFFE_RNG_CPP_HPP_

 #include <boost/random/mersenne_twister.hpp>
+#include <boost/random/uniform_int.hpp>
+#include <iterator>
+#include <algorithm>
 #include "caffe/common.hpp"

 namespace caffe {

-typedef boost::mt19937 rng_t;
+typedef boost::mt19937 rng_t;

-inline rng_t* caffe_rng() {
-  return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
+inline rng_t* caffe_rng() {
+  return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
 }

+// Fisher–Yates algorithm
Contributor:

Isn't the Fisher–Yates algorithm already covered by std::random_shuffle?

Contributor (Author):

@bhack:
The two-argument std::random_shuffle typically uses rand() internally as its random number generator, whose randomness quality is horrible. Besides, the original call to std::random_shuffle is not preceded by srand(), so it produces the same order on every run.

C++11 actually has std::shuffle, but sadly we cannot use it.

Contributor:

Doesn't std::random_shuffle accept a random number generator as its last (optional) parameter?

Contributor (Author):

@bhack

See http://stackoverflow.com/questions/19219726/what-is-the-difference-between-shuffle-and-random-shuffle-c

You cannot pass caffe::rng_t, a.k.a. boost::mt19937, as the third parameter to std::random_shuffle: that parameter must be callable as r(n) and return an index in [0, n), whereas boost::mt19937 is called with no arguments. Instead, you need to write a functor class, instantiate it, and then pass it to std::random_shuffle, which probably involves more code than reimplementing the Fisher–Yates algorithm.
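For illustration, the functor mentioned in the reply above might look roughly like the following sketch. It is not code from this PR: BoundedRandom is a hypothetical name, and std::mt19937 with std::uniform_int_distribution (C++11 <random>) stand in for boost::mt19937 and boost::uniform_int so that the sketch compiles without Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical adapter, not part of Caffe: wraps a pointer to an engine and
// exposes the r(n) call shape that the three-argument std::random_shuffle
// expects, returning a value in [0, n).
template <class Engine>
class BoundedRandom {
 public:
  explicit BoundedRandom(Engine* engine) : engine_(engine) {
    assert(engine_ != nullptr);
  }

  // Returns a uniformly distributed index in [0, n). Requires n > 0.
  std::ptrdiff_t operator()(std::ptrdiff_t n) {
    assert(n > 0);
    std::uniform_int_distribution<std::ptrdiff_t> dist(0, n - 1);
    return dist(*engine_);
  }

 private:
  Engine* engine_;  // not owned; must outlive this adapter
};
```

An instance could then be passed as the third argument, as in std::random_shuffle(v.begin(), v.end(), pick), on toolchains that still provide std::random_shuffle (it was deprecated in C++14 and removed in C++17). The adapter alone is already about as long as the Fisher–Yates loop in the patch, which is the point the reply makes.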

+template <class RandomAccessIterator, class RandomGenerator>
+inline void shuffle(RandomAccessIterator begin, RandomAccessIterator end,
+                    RandomGenerator* gen) {
+  typedef typename std::iterator_traits<RandomAccessIterator>::difference_type
+      difference_type;
+  typedef typename boost::uniform_int<difference_type> dist_type;
+
+  difference_type length = std::distance(begin, end);
+  if (length <= 0) return;
+
+  for (difference_type i = length - 1; i > 0; --i) {
+    dist_type dist(0, i);
+    std::iter_swap(begin + i, begin + dist(*gen));
+  }
+}
+
+template <class RandomAccessIterator>
+inline void shuffle(RandomAccessIterator begin, RandomAccessIterator end) {
+  shuffle(begin, end, caffe_rng());
+}
 }  // namespace caffe

 #endif  // CAFFE_RNG_HPP_
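The shuffle added to rng.hpp walks the range backwards and swaps element i with a uniformly chosen element in [0, i]. A minimal standalone sketch of the same loop follows, assuming C++11 and using std::uniform_int_distribution in place of boost::uniform_int; fisher_yates_shuffle and shuffled_range are hypothetical names that do not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <vector>

// Backward Fisher-Yates loop mirroring the one in the patch, written against
// the standard library instead of Boost (an assumption of this sketch).
template <class RandomAccessIterator, class RandomGenerator>
void fisher_yates_shuffle(RandomAccessIterator begin,
                          RandomAccessIterator end, RandomGenerator* gen) {
  typedef typename std::iterator_traits<RandomAccessIterator>::difference_type
      difference_type;
  assert(gen != nullptr);

  const difference_type length = std::distance(begin, end);
  // For i = length - 1 down to 1, swap element i with an element chosen
  // uniformly from [0, i]. Empty and one-element ranges skip the loop.
  for (difference_type i = length - 1; i > 0; --i) {
    std::uniform_int_distribution<difference_type> dist(0, i);
    std::iter_swap(begin + i, begin + dist(*gen));
  }
}

// Usage helper: builds 0..n-1 and shuffles it with an engine seeded by `seed`.
inline std::vector<int> shuffled_range(int n, unsigned int seed) {
  std::vector<int> values;
  for (int i = 0; i < n; ++i) values.push_back(i);
  std::mt19937 engine(seed);
  fisher_yates_shuffle(values.begin(), values.end(), &engine);
  return values;
}
```

Because the generator is passed in explicitly, two calls with equally seeded engines produce the same permutation, which keeps shuffling reproducible when a fixed seed is configured.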
13 changes: 12 additions & 1 deletion src/caffe/common.cpp
@@ -26,6 +26,17 @@ shared_ptr<Caffe> Caffe::singleton_;
 // random seeding
 int64_t cluster_seedgen(void) {
   int64_t s, seed, pid;
+  FILE* f = fopen("/dev/urandom", "rb");
+  if (f && fread(&seed, 1, sizeof(seed), f) == sizeof(seed)) {
+    fclose(f);
+    return seed;
+  }
+
+  LOG(INFO) << "System entropy source not available, "
+      "using fallback algorithm to generate seed instead.";
+  if (f)
+    fclose(f);
+
   pid = getpid();
   s = time(NULL);
   seed = abs(((s * 181) * ((pid - 83) * 359)) % 104729);
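The hunk above first tries to read the seed from the system entropy source and only falls back to the pid/time formula when that fails. The read step can be sketched in isolation as below; read_seed_from_file is a hypothetical helper, not a function in Caffe, and taking the path as a parameter is an assumption made so the failure branch can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: tries to fill *seed with sizeof(*seed) bytes read from
// the file at `path`. Returns true only if the file opened and every byte was
// read. *seed is left untouched if the file cannot be opened.
inline bool read_seed_from_file(const char* path, std::int64_t* seed) {
  assert(path != nullptr && seed != nullptr);
  std::FILE* f = std::fopen(path, "rb");
  if (f == nullptr) return false;
  const std::size_t bytes_read = std::fread(seed, 1, sizeof(*seed), f);
  std::fclose(f);  // closed exactly once on every path after a successful open
  return bytes_read == sizeof(*seed);
}
```

A caller would try read_seed_from_file("/dev/urandom", &seed) and compute the fallback seed only when it returns false, which matches the control flow of the patch while keeping the fclose call in one place.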
@@ -75,7 +86,7 @@ Caffe::RNG::RNG() : generator_(new Generator()) { }
 Caffe::RNG::RNG(unsigned int seed) : generator_(new Generator(seed)) { }

 Caffe::RNG& Caffe::RNG::operator=(const RNG& other) {
-  generator_.reset(other.generator_.get());
+  generator_ = other.generator_;
   return *this;
 }
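The one-line change to operator= presumably addresses an ownership problem. Assuming generator_ is a shared_ptr (which the assignment in the new line suggests), reset(other.generator_.get()) would create a second, independent owner of the same raw pointer, so the generator could be deleted twice. Plain assignment shares ownership instead. The sketch below shows the shared-ownership behaviour with std::shared_ptr; Rng and Generator here are simplified stand-ins, not the Caffe classes.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for the generator held by Caffe::RNG (hypothetical).
struct Generator {
  explicit Generator(unsigned int s) : seed(s) {}
  unsigned int seed;
};

// Simplified stand-in for Caffe::RNG (hypothetical).
class Rng {
 public:
  explicit Rng(unsigned int seed)
      : generator_(std::make_shared<Generator>(seed)) {}

  Rng& operator=(const Rng& other) {
    assert(other.generator_ != nullptr);
    // Copying the shared_ptr shares one control block, so the generator is
    // destroyed only when the last owner goes away.
    generator_ = other.generator_;
    return *this;
  }

  long owners() const { return generator_.use_count(); }
  const Generator* get() const { return generator_.get(); }

 private:
  std::shared_ptr<Generator> generator_;
};
```

After b = a, both objects point at the same Generator and the owner count is 2, so destroying either one leaves the other valid.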

11 changes: 3 additions & 8 deletions src/caffe/layers/image_data_layer.cpp
@@ -238,14 +238,9 @@ void ImageDataLayer<Dtype>::CreatePrefetchThread() {

 template <typename Dtype>
 void ImageDataLayer<Dtype>::ShuffleImages() {
-  const int num_images = lines_.size();
-  for (int i = 0; i < num_images; ++i) {
-    const int max_rand_index = num_images - i;
-    const int rand_index = PrefetchRand() % max_rand_index;
-    pair<string, int> item = lines_[rand_index];
-    lines_.erase(lines_.begin() + rand_index);
-    lines_.push_back(item);
-  }
+  caffe::rng_t* prefetch_rng =
+      static_cast<caffe::rng_t*>(prefetch_rng_->generator());
+  shuffle(lines_.begin(), lines_.end(), prefetch_rng);
 }


3 changes: 2 additions & 1 deletion tools/convert_imageset.cpp
@@ -28,6 +28,7 @@

 #include "caffe/proto/caffe.pb.h"
 #include "caffe/util/io.hpp"
+#include "caffe/util/rng.hpp"

 using namespace caffe;  // NOLINT(build/namespaces)
 using std::pair;
@@ -60,7 +61,7 @@ int main(int argc, char** argv) {
   if (argc >= (arg_offset+5) && argv[arg_offset+4][0] == '1') {
     // randomly shuffle data
     LOG(INFO) << "Shuffling data";
-    std::random_shuffle(lines.begin(), lines.end());
+    shuffle(lines.begin(), lines.end());
   }
   LOG(INFO) << "A total of " << lines.size() << " images.";
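The replaced call relied on std::random_shuffle without any seeding. Where that function draws from rand() (typical, though implementation-defined), the order repeats on every run, because the C standard specifies that rand() called before any srand() behaves as if srand(1) had been called. The sketch below only demonstrates the underlying property that an identical seed yields an identical rand() sequence; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Returns the next `count` values produced by rand().
inline std::vector<int> next_rand_values(int count) {
  assert(count >= 0);
  std::vector<int> values;
  values.reserve(static_cast<std::size_t>(count));
  for (int i = 0; i < count; ++i) values.push_back(std::rand());
  return values;
}

// Reseeds rand() with `seed`, then returns the first `count` values.
inline std::vector<int> rand_values_after_seed(unsigned int seed, int count) {
  std::srand(seed);
  return next_rand_values(count);
}
```

Calling rand_values_after_seed(1, n) twice returns the same values both times, which is the sequence an unseeded program would see at startup. This is why the tool shuffled into the same order on each run before this change.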
