-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch #2892
base: master
Are you sure you want to change the base?
Conversation
It might be best if @erictzeng takes a look at this given his recent hdf5 work. |
2b7c2e4
to
70168ba
Compare
If I understand correctly, shuffling is still preserved in |
This still shuffles the order of hdf5 files but no longer shuffles rows within hdf5. Generally rows >> files so shuffling is limited by this change. This is no different than lmdb / leveldb however. |
70168ba
to
11d0d74
Compare
Adapt HDF5DataLayer Prefetch to BVLC#2836
@shelhamer OK, I see. This may be a serious drawback of this PR. Instead of this PR, we can also keep the current shuffle behavior and just implement the prefetch (using additional prefetch memory blob, like in other prefetch data layers). I am hacking that directly based upon #2870. |
Another disadvantage of the PR (which I didn't realize until I started using HDF5DataLayer myself after writing the initial version of this PR) is the optimization in the case of the entire dataset being a single HDF5 file -- with the current HDF5DataLayer, the entire file is loaded into memory initially and nothing is ever read from disk again. I'm not sure an HDF5 prefetching PR should be merged until the optimization of this important special case is somehow brought back. |
Cleaned up #2271 to adapt to #2836. Original authors are @jeffdonahue and @pclove1
Note: this adds prefetching but disables shuffling rows for hdf5, and still preserve shuffling files.
DO NOT MERGE