[Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch #2892

ronghanghu · 2015-08-09T22:17:17Z

Cleaned up #2271 and adapted it to #2836. Original authors are @jeffdonahue and @pclove1.

Note: this adds prefetching but disables shuffling of rows within an HDF5 file; shuffling of the files themselves is still preserved.

DO NOT MERGE

shelhamer · 2015-08-09T22:24:16Z

It might be best if @erictzeng takes a look at this given his recent hdf5 work.

ronghanghu · 2015-08-10T00:33:43Z

If I understand correctly, shuffling is still preserved in FillHDF5FileData:

https://github.com/ronghanghu/caffe/blob/hdf5-prefetch/src/caffe/layers/hdf5_data_layer.cpp#L61-L64

shelhamer · 2015-08-10T00:41:10Z

This still shuffles the order of the hdf5 files but no longer shuffles rows within each hdf5 file. Generally rows >> files, so this change limits how much shuffling actually happens. This is no different from lmdb / leveldb, however.
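To make the difference concrete, here is a minimal sketch of the two visiting orders. It is not the Caffe implementation: the function names and the `(file, row)` representation are assumptions made up for illustration.

```python
import random


def file_shuffled_order(rows_per_file, rng):
    """Shuffle the file order only; rows inside a file stay sequential."""
    file_ids = list(range(len(rows_per_file)))
    rng.shuffle(file_ids)
    return [(f, r) for f in file_ids for r in range(rows_per_file[f])]


def file_and_row_shuffled_order(rows_per_file, rng):
    """Shuffle the file order, then shuffle the rows within each file."""
    file_ids = list(range(len(rows_per_file)))
    rng.shuffle(file_ids)
    order = []
    for f in file_ids:
        rows = list(range(rows_per_file[f]))
        rng.shuffle(rows)  # hypothetical per-file row permutation
        order.extend((f, r) for r in rows)
    return order
```

With file-only shuffling, the number of distinct epoch orderings depends only on the number of files, so a dataset with a few large files sees nearly the same row sequence every epoch.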

ronghanghu · 2015-08-10T02:13:05Z

@shelhamer OK, I see. This may be a serious drawback of this PR.

Instead of this PR, we could keep the current shuffle behavior and implement only the prefetch (using an additional prefetch memory blob, as in the other prefetching data layers). I am hacking on that now, based directly on #2870.
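As a rough sketch of that idea (not the code in #2870), a background thread can fill a small bounded buffer while the consumer works on the previous batch. `prefetching_batches`, `load_batch`, and `depth` are hypothetical names; any row shuffling would live inside `load_batch`.

```python
import queue
import threading


def prefetching_batches(load_batch, num_batches, depth=2):
    """Yield batches in order while a worker thread loads the next ones."""
    buf = queue.Queue(maxsize=depth)  # bounded: caps extra memory use
    sentinel = object()               # marks the end of the stream

    def worker():
        try:
            for i in range(num_batches):
                buf.put(load_batch(i))  # blocks while the buffer is full
        finally:
            buf.put(sentinel)  # always unblock the consumer, even on error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()  # blocks until the worker has a batch ready
        if item is sentinel:
            return
        yield item
```

The buffer depth trades memory for tolerance to slow reads; this sketch ends the stream early if `load_batch` raises rather than propagating the error to the consumer.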

jeffdonahue · 2015-08-10T21:29:05Z

Another disadvantage of this PR (which I didn't realize until I started using HDF5DataLayer myself, after writing the initial version of this PR) is that it loses an optimization for the case where the entire dataset is a single HDF5 file: with the current HDF5DataLayer, the entire file is loaded into memory initially and nothing is ever read from disk again. I'm not sure an HDF5 prefetching PR should be merged until the optimization for this important special case is somehow brought back.
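A minimal sketch of the special case, assuming a hypothetical `FileCache` helper and a caller-supplied `read_file` function (neither is part of Caffe): a file is re-read only when the requested file differs from the one already in memory, so a single-file dataset touches disk once.

```python
class FileCache:
    """Keep the most recently loaded file in memory."""

    def __init__(self, filenames, read_file):
        self.filenames = list(filenames)
        self.read_file = read_file  # hypothetical loader: path -> data
        self.current_index = None
        self.current_data = None
        self.reads = 0              # number of actual disk reads

    def get(self, index):
        index %= len(self.filenames)  # wrap around at the end of an epoch
        if index != self.current_index:
            self.current_data = self.read_file(self.filenames[index])
            self.current_index = index
            self.reads += 1
        return self.current_data
```

With one file, every epoch maps to index 0, so `reads` stays at 1 however many epochs are run; with several files, each switch costs a read, which is where prefetching helps.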

jeffdonahue and others added 2 commits August 9, 2015 14:32

New HDF5 read interface

de90f6e

fix BVLC#1362. prefetch HDF5DataLayer.

87b27d1

shelhamer added the JD label Aug 9, 2015

ronghanghu force-pushed the hdf5-prefetch branch 5 times, most recently from 2b7c2e4 to 70168ba Compare August 9, 2015 23:24

ronghanghu mentioned this pull request Aug 9, 2015

fix #1362. prefetch HDF5DataLayer #2271

Closed

ronghanghu force-pushed the hdf5-prefetch branch from 70168ba to 11d0d74 Compare August 10, 2015 01:52

rebase & clean up HDF5DataLayer Prefetch

11d0d74

Adapt HDF5DataLayer Prefetch to BVLC#2836

ronghanghu changed the title ~~Rebase and Clean up Hdf5DataLayer Prefetch~~ [Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch Aug 10, 2015

This was referenced Sep 3, 2015

Caffe cannot handle HDF5 files larger as large as 20GB? #2953

Closed

Add support for HDF5 datasets NVIDIA/DIGITS#226

Merged
