Always create CAFFE:RNG for prefetch thread to simplify logic #558

Closed
sguada wants to merge 1 commit

Conversation

sguada (Contributor) commented Jun 28, 2014

To avoid untested cases in DataLayer, ImageDataLayer, and WindowDataLayer, always create the random number generator.
This should fix #553 and #508, and should help with the data source redesign. @Yangqing
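
A minimal sketch of the idea, assuming the existing member and helper names (prefetch_rng_, Caffe::RNG, caffe_rng_rand) rather than quoting the actual patch: instead of creating the prefetch RNG only when a transformation such as mirroring or cropping happens to need random numbers, SetUp would always create it.

  // Sketch, not the verbatim patch: create the prefetch RNG unconditionally
  // in SetUp so every code path in the prefetch thread has a generator.
  const unsigned int prefetch_rng_seed = caffe_rng_rand();  // seed drawn from Caffe's RNG
  prefetch_rng_.reset(new Caffe::RNG(prefetch_rng_seed));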

sguada (Contributor, Author) commented Jun 28, 2014

For some reason I don't yet understand, test_data_layer.bin now fails.
@jeffdonahue Do you know why always creating the RNG makes these tests fail?

sguada (Contributor, Author) commented Jun 29, 2014

It seems to be a problem related to setting the seed in the test.

kloudkl (Contributor) commented Jul 1, 2014

build/test/test_data_layer.testbin --gtest_filter="-*GPU*"
Tests passed:

[       OK ] DataLayerTest/0.TestReadLevelDBCPU (45 ms)
[       OK ] DataLayerTest/0.TestReadCropTrainSequenceUnseededLevelDBCPU (48 ms)
[       OK ] DataLayerTest/0.TestReadCropTestLevelDBCPU (31 ms)
[       OK ] DataLayerTest/0.TestReadLMDBCPU (10 ms)
[       OK ] DataLayerTest/0.TestReadCropTrainLMDBCPU (4 ms)
[       OK ] DataLayerTest/0.TestReadCropTrainSequenceSeededLMDBCPU (6 ms)
[       OK ] DataLayerTest/0.TestReadCropTrainSequenceUnseededLMDBCPU (11 ms)
[       OK ] DataLayerTest/1.TestReadLevelDBCPU (66 ms)
[       OK ] DataLayerTest/1.TestReadCropTrainLevelDBCPU (40 ms)
[       OK ] DataLayerTest/1.TestReadCropTrainSequenceUnseededLevelDBCPU (53 ms)
[       OK ] DataLayerTest/1.TestReadLMDBCPU (15 ms)
[       OK ] DataLayerTest/1.TestReadCropTrainSequenceUnseededLMDBCPU (7 ms)
[       OK ] DataLayerTest/1.TestReadCropTestLMDBCPU (7 ms)

Tests failed:

[----------] Global test environment tear-down
[==========] 20 tests from 2 test cases ran. (537 ms total)
[  PASSED  ] 13 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] DataLayerTest/0.TestReadCropTrainLevelDBCPU, where TypeParam = float
[  FAILED  ] DataLayerTest/0.TestReadCropTrainSequenceSeededLevelDBCPU, where TypeParam = float
[  FAILED  ] DataLayerTest/0.TestReadCropTestLMDBCPU, where TypeParam = float
[  FAILED  ] DataLayerTest/1.TestReadCropTrainSequenceSeededLevelDBCPU, where TypeParam = double
[  FAILED  ] DataLayerTest/1.TestReadCropTestLevelDBCPU, where TypeParam = double
[  FAILED  ] DataLayerTest/1.TestReadCropTrainLMDBCPU, where TypeParam = double
[  FAILED  ] DataLayerTest/1.TestReadCropTrainSequenceSeededLMDBCPU, where TypeParam = double

kloudkl (Contributor) commented Jul 1, 2014

The test cases try to cover the combinations of cropping, scaling, and mirroring. It is almost impossible to debug all of the entangled interactions.

The bad smells in these intricate test cases suggest that refactoring the data layers is urgent.

kloudkl (Contributor) commented Jul 9, 2014

diff --git a/include/caffe/data_layers.hpp b/include/caffe/data_layers.hpp
index 06c90ea..d3c5273 100644
--- a/include/caffe/data_layers.hpp
+++ b/include/caffe/data_layers.hpp
@@ -50,6 +50,11 @@ class DataLayer : public Layer<Dtype> {
   virtual inline int MinTopBlobs() const { return 1; }
   virtual inline int MaxTopBlobs() const { return 2; }

+  virtual inline Caffe::Phase phase() const { return phase_; }
+  virtual inline void set_phase(const Caffe::Phase phase) {
+    phase_ = phase;
+  }
+
  protected:
   virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
       vector<Blob<Dtype>*>* top);
@@ -204,6 +209,11 @@ class ImageDataLayer : public Layer<Dtype> {
   virtual inline int ExactNumBottomBlobs() const { return 0; }
   virtual inline int ExactNumTopBlobs() const { return 2; }

+  virtual inline Caffe::Phase phase() const { return phase_; }
+  virtual inline void set_phase(const Caffe::Phase phase) {
+    phase_ = phase;
+  }
+
  protected:
   virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
       vector<Blob<Dtype>*>* top);
diff --git a/src/caffe/test/test_data_layer.cpp b/src/caffe/test/test_data_layer.cpp
index 8ba5f29..b3364a8 100644
--- a/src/caffe/test/test_data_layer.cpp
+++ b/src/caffe/test/test_data_layer.cpp
@@ -156,7 +156,9 @@ class DataLayerTest : public ::testing::Test {
     data_param->set_crop_size(1);
     data_param->set_source(filename_->c_str());
     data_param->set_backend(backend_);
+    LOG(ERROR) << "Caffe::TRAIN " << Caffe::TRAIN;
     DataLayer<Dtype> layer(param);
+    layer.set_phase(Caffe::phase());
     layer.SetUp(blob_bottom_vec_, &blob_top_vec_);
     EXPECT_EQ(blob_top_data_->num(), 5);
     EXPECT_EQ(blob_top_data_->channels(), 2);
@@ -208,6 +210,7 @@ class DataLayerTest : public ::testing::Test {
     vector<vector<Dtype> > crop_sequence;
     {
       DataLayer<Dtype> layer1(param);
+      layer1.set_phase(Caffe::phase());
       layer1.SetUp(blob_bottom_vec_, &blob_top_vec_);
       for (int iter = 0; iter < 2; ++iter) {
         layer1.Forward(blob_bottom_vec_, &blob_top_vec_);
@@ -229,6 +232,7 @@ class DataLayerTest : public ::testing::Test {
     // Check that the sequence is the same as the original.
     Caffe::set_random_seed(seed_);
     DataLayer<Dtype> layer2(param);
+    layer2.set_phase(Caffe::phase());
     layer2.SetUp(blob_bottom_vec_, &blob_top_vec_);
     for (int iter = 0; iter < 2; ++iter) {
       layer2.Forward(blob_bottom_vec_, &blob_top_vec_);

kloudkl (Contributor) commented Jul 9, 2014

@sguada, @shelhamer, @jeffdonahue, the bug is due to the fact that DataLayer::phase_ was never explicitly initialized or set. ImageDataLayer is not tested as thoroughly but has the same bug. WindowDataLayer doesn't have a phase_ field. Please help fix them.
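
A minimal sketch of one possible fix along the lines of the diff above, assuming DataLayer keeps its phase in a phase_ member: initialize it explicitly from the process-wide Caffe::phase() during SetUp so it never holds an indeterminate value (tests can still override it through set_phase()).

  template <typename Dtype>
  void DataLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
    // Take the current global phase instead of leaving phase_ uninitialized;
    // cropping and mirroring decisions that branch on the phase then behave
    // deterministically in the tests.
    phase_ = Caffe::phase();
    // ... existing SetUp logic (open the data source, reshape tops, start prefetch) ...
  }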

@escorciav

Just to report, sguada's branch was able to work around the following error:

I0804 19:06:36.845106 22451 solver.cpp:85] Solving CaffeNet
I0804 19:06:36.845149 22451 solver.cpp:138] Iteration 0, Testing net (#0)
F0804 19:06:36.898147 52654 data_layer.cpp:338] Check failed: prefetch_rng_
*** Check failure stack trace: ***
@ 0x2b2e9e2beabd google::LogMessage::Fail()
@ 0x2b2e9e2c0b97 google::LogMessage::SendToLog()
@ 0x2b2e9e2be6ac google::LogMessage::Flush()
@ 0x2b2e9e2c145d google::LogMessageFatal::~LogMessageFatal()
@ 0x4e3cb5 caffe::DataLayer<>::PrefetchRand()
@ 0x4e1f8b caffe::DataLayerPrefetch<>()
@ 0x36556077f1 (unknown)
@ 0x3654ee570d (unknown)

If your error points to the prefetch_rng_ issue, it is feasible to use this branch. However, I have no idea whether it produces other errors. @sguada, have you used it in your projects?
Thank you for this patch, and please don't delete your prefetch_rng branch.
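
For context, the fatal check in the trace above corresponds to a guard of roughly this shape inside DataLayer<Dtype>::PrefetchRand() (reconstructed from the "Check failed: prefetch_rng_" message, not quoted from the source); creating the RNG unconditionally, as this PR proposes, keeps the prefetch thread from ever hitting it.

  template <typename Dtype>
  unsigned int DataLayer<Dtype>::PrefetchRand() {
    // Aborts with the fatal check seen in the trace when the RNG was never created.
    CHECK(prefetch_rng_);
    caffe::rng_t* rng = static_cast<caffe::rng_t*>(prefetch_rng_->generator());
    return (*rng)();
  }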

@shelhamer
Member

Closing, since I don't think this is still needed after #710 and #954. Let me know if I'm mistaken!

shelhamer closed this Aug 22, 2014