
Non-deterministic data reading of image_data_layer in parallel training #4590

Closed · AIROBOTAI opened this issue Aug 14, 2016 · 3 comments
@AIROBOTAI
Hi all, I have a question about deterministic batch input in image_data_layer when doing parallel training. Suppose we have a dataset that contains only four batches, named A, B, C, D, and we train with 4 solvers (S1, S2, S3, S4) on 4 GPUs. Suppose also that the dataset is not randomly shuffled during training. I have checked the implementation of BasePrefetchingDataLayer and found that it only guarantees that the solvers get their input batches sequentially, not in a fixed order. So I wonder whether we may encounter the following problem: at the T-th iteration, the input batches for S1–S4 may be A, B, C, D, respectively, but at the next iteration the input batches for S1–S4 might well become B, C, A, D or something else. Such non-deterministic behavior may be dangerous in some cases. Could anyone kindly tell me whether my doubt above is correct?
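To make the scenario concrete, here is a minimal sketch (plain C++ threads, not Caffe's actual code) of the race described above: several prefetch threads pull from one shared, ordered source, so the batches come out sequentially, but which solver receives which batch depends only on which thread happens to lock the source first.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
  const char* batches = "ABCD";   // the four batches A, B, C, D
  int next = 0;                   // index of the next unread batch
  std::mutex source_mutex;        // guards the shared data source

  auto prefetch = [&](int solver_id) {
    char batch;
    {
      std::lock_guard<std::mutex> lock(source_mutex);
      batch = batches[next++ % 4];  // sequential, but not tied to solver_id
    }
    std::printf("solver S%d got batch %c\n", solver_id + 1, batch);
  };

  std::vector<std::thread> solvers;
  for (int i = 0; i < 4; ++i) solvers.emplace_back(prefetch, i);
  for (auto& t : solvers) t.join();  // the printed mapping varies run to run
  return 0;
}
```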

Besides, could anyone please explain why the declarations "using Params::size_; using Params::data_; using Params::diff_;" appear in the definitions of the classes GPUParams and P2PSync (defined in parallel.hpp)? As I understand it, using-declarations are generally used to resolve base-class members being shadowed in a derived class, which does not seem to be the case for GPUParams and P2PSync. Therefore, I wonder whether these declarations are necessary.

Thanks in advance!

@cypof (Member) commented Aug 15, 2016

Currently only the LMDB and LevelDB layers are deterministic, by going through data_reader. We are trying to fix the other layers by switching to a skip approach like in #4563. The using statements are only there to avoid having to type this-> every time: C++ allows direct access to protected fields of a base class, but in a class template, unqualified name lookup does not search a base class that depends on a template parameter.
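As a stripped-down illustration of that lookup rule (hypothetical class bodies, not the actual parallel.hpp definitions): without the using-declarations, the unqualified name size_ below would not compile, because the base Params&lt;Dtype&gt; is a dependent base class.

```cpp
#include <cstddef>

template <typename Dtype>
class Params {
 protected:
  size_t size_;
  Dtype* data_;
};

template <typename Dtype>
class GPUParams : public Params<Dtype> {
 protected:
  using Params<Dtype>::size_;  // without these, the members below would
  using Params<Dtype>::data_;  // have to be spelled this->size_, etc.

 public:
  size_t bytes() const { return size_ * sizeof(Dtype); }
};

int main() {
  GPUParams<float> p;  // instantiation forces the name lookup to resolve
  (void)p;
  return 0;
}
```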

@AIROBOTAI (Author)

@cypof Thanks a lot for your kind help! Got it now.

@shelhamer (Member)

The determinism of data loading is fixed by the new parallelism in #4563.
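For reference, a minimal sketch of the skip idea (not the actual #4563 implementation): each solver advances through the same ordered source but keeps only every N-th item starting at its own rank, so the batch-to-solver mapping is fixed regardless of thread scheduling.

```cpp
#include <cstdio>

int main() {
  const char* data = "ABCDABCD";  // the dataset, cycled in a fixed order
  const int num_solvers = 4;

  for (int rank = 0; rank < num_solvers; ++rank) {
    // Solver `rank` reads positions rank, rank + N, rank + 2N, ...
    std::printf("solver S%d reads:", rank + 1);
    for (int pos = rank; pos < 8; pos += num_solvers)
      std::printf(" %c", data[pos]);
    std::printf("\n");
  }
  return 0;  // every run prints the same batch-to-solver assignment
}
```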
