Allow H5T_INTEGER in HDF5 files #2978

lukeyeager · 2015-08-26T02:07:17Z

Caffe is currently hardcoded to only support the H5T_FLOAT datatype class (originally added by @sergeyk in #203). All I had to do was allow the H5T_INTEGER class, and now I can use HDF5 datasets created with dtype='uint8'. That lets me create datasets with much smaller file sizes, comparable to LMDB sizes with uncompressed data (see the comparison in NVIDIA/DIGITS#226 (comment)).

Was there a particular reason that H5T_INTEGER was disallowed?
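A rough way to see where the size difference comes from is to compare raw element sizes. The sketch below is illustrative only: the dataset shape and the helper name `raw_dataset_bytes` are made up for this example, and the numbers ignore HDF5 metadata, chunking, and compression, so real file sizes will differ.

```python
import struct


def raw_dataset_bytes(shape, fmt):
    """Return the uncompressed payload size in bytes for an array of `shape`.

    `fmt` is a struct format string; '<B' is uint8 and '<f' is float32.
    """
    count = 1
    for dim in shape:
        count *= dim
    return count * struct.calcsize(fmt)


# Hypothetical dataset: 10000 images, 3 channels, 32x32 pixels.
shape = (10000, 3, 32, 32)
uint8_bytes = raw_dataset_bytes(shape, '<B')
float32_bytes = raw_dataset_bytes(shape, '<f')

print(uint8_bytes, float32_bytes, float32_bytes // uint8_bytes)
```

Under these assumptions the float32 payload is four times the uint8 payload, since each element takes 4 bytes instead of 1.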

jeffdonahue · 2015-09-24T19:23:57Z

src/caffe/util/hdf5.cpp

-  CHECK_EQ(class_, H5T_FLOAT) << "Expected float or double data";
+  switch (class_) {
+  case H5T_FLOAT:
+    LOG(INFO) << "Datatype class: H5T_FLOAT";


Could you remove this and the below LOG(INFO)s (or change to VLOG, or one of the LOG variations that only happens the first time it's hit, if you prefer)? I think this creates too much noise while training nets with an HDF5DataLayer, especially if using relatively small HDF5 files.

jeffdonahue · 2015-09-24T19:24:28Z

This LGTM except as commented above. Thanks @lukeyeager!

lukeyeager · 2015-09-24T20:00:32Z

Updated LOG to LOG_FIRST_N. Neat trick!
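For readers unfamiliar with glog's LOG_FIRST_N, it emits a message only the first N times a given log statement is executed. The Python sketch below mimics that behavior and is not Caffe or glog code. The helper `log_first_n` and its explicit `key` argument are assumptions made for this illustration; glog tracks the count per call site automatically.

```python
import logging
from collections import Counter

_hit_counts = Counter()  # number of times each key has been hit so far


def log_first_n(logger, level, n, key, message):
    """Log `message` only for the first `n` calls made with this `key`.

    Returns True if the message was emitted, False if it was suppressed.
    """
    _hit_counts[key] += 1
    if _hit_counts[key] > n:
        return False
    logger.log(level, message)
    return True


logger = logging.getLogger("hdf5_sketch")

# Simulate loading five small HDF5 files in a row: only the first load logs.
emitted = [
    log_first_n(logger, logging.INFO, 1, "hdf5_class", "Datatype class: H5T_FLOAT")
    for _ in range(5)
]
```

With N set to 1, repeated loads of small HDF5 files produce a single log line instead of one per load, which is the noise reduction requested in the review.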

jeffdonahue · 2015-09-24T20:11:02Z

Great, thanks again @lukeyeager!

Allow H5T_INTEGER in HDF5 files

jeffdonahue · 2015-09-24T20:16:55Z

src/caffe/test/test_data/generate_sample_data.py

-    f.write(script_dir + '/sample_data.h5\n')
-    f.write(script_dir + '/sample_data_2_gzip.h5\n')
+    f.write('src/caffe/test/test_data/sample_data.h5\n')
+    f.write('src/caffe/test/test_data/sample_uint8_gzip.h5\n')


Oops -- I missed this in my review: we should have kept the script_dir-relative paths here and below. I'll fix this in the near future (or feel free to send a patch, @lukeyeager or anyone else).

lukeyeager · 2015-09-24

I feel like I had a good reason for doing this, but I can't remember what it was now...

lukeyeager · 2015-09-24

Oh right, this lets you run the generate_sample_data.py script from the src/caffe/test/test_data/ directory and still generate the correct paths in the text file. Otherwise, if you run the script again, the paths in the text files change from:

src/caffe/test/test_data/sample_data.h5

to:

/home/lyeager/caffe/caffe/src/caffe/test/test_data/sample_data.h5

I chose to fix it this way. The other way to fix it would be to remove the os.path.abspath() call on line 8. Would you rather I fix it that way?
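The difference between the two path styles can be sketched as follows. This is an illustration rather than the actual contents of generate_sample_data.py. The function names are hypothetical, and the script path is passed in as a parameter instead of being read from `__file__`.

```python
import os


def entry_from_abspath(script_path, filename):
    """Build a list-file entry from the script's absolute directory.

    The result depends on where the checkout lives on the machine that
    runs the script, so regenerating the list file changes its contents.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return script_dir + '/' + filename


def entry_repo_relative(filename, test_data_dir='src/caffe/test/test_data'):
    """Build a list-file entry relative to the Caffe root.

    The result is identical on every machine and from every working
    directory, but it is only valid when read relative to the Caffe root.
    """
    return test_data_dir + '/' + filename


print(entry_from_abspath('generate_sample_data.py', 'sample_data.h5'))
print(entry_repo_relative('sample_data.h5'))
```

The first function yields a machine-specific absolute path like the /home/lyeager/... example above, while the second always yields src/caffe/test/test_data/sample_data.h5.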

jeffdonahue · 2015-09-24

Ah...thanks for the explanation, I didn't realize that was an issue when I merged #2274. In retrospect I think we should have just stuck with the existing behavior which required this script to be run from Caffe root, unless we were going to move to running this script to generate the data on-the-fly at test time (rather than tracking the test data files as we do now), which I think is probably a better way to do it (but might be rather involved w.r.t. the potential payoff...). Anyway, I agree with you & retract my previous suggestion, and I think we should probably revert most or all of #2274 since we can't use the script_dir-relative paths everywhere.

ih4cku · 2015-10-27T15:36:48Z

Is it ok to use uint8 for hdf5 data when caffe only uses H5LTread_dataset_float and H5LTread_dataset_double?

lukeyeager · 2015-10-27T16:18:58Z

Try it out and let me know. It works for me, so I'm assuming H5LTread_dataset_float is able to read uint8 (or really H5T_INTEGER) data and automatically convert it to float32, which makes sense.

ih4cku · 2015-10-27T16:57:35Z

I made a simple test, no problem.
More specifically, I created the h5 file with h5py filled with uint8 data, read it in C++ into a float array, and checked each value.

I had been lazy in not verifying this myself. Thanks for adding this feature, @lukeyeager.
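One reason the conversion can be lossless is that every uint8 value is exactly representable as a 32-bit float. The sketch below checks only that representability claim using Python's standard library. It does not exercise HDF5, h5py, or H5LTread_dataset_float, and the helper names are made up for this example.

```python
import struct


def roundtrip_float32(value):
    """Store `value` as an IEEE 754 float32 and read it back as a Python float."""
    return struct.unpack('<f', struct.pack('<f', float(value)))[0]


def all_uint8_exact():
    """Return True if every uint8 value (0..255) survives a float32 round trip."""
    return all(roundtrip_float32(v) == v for v in range(256))


print(all_uint8_exact())
```

Integers stop being exactly representable in float32 above 2**24, so the same argument would not automatically cover wide integer types such as int32 or int64.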

lukeyeager mentioned this pull request Aug 26, 2015

Add support for HDF5 datasets NVIDIA/DIGITS#226

Merged

3 tasks

lukeyeager mentioned this pull request Sep 2, 2015

Caffe cannot handle HDF5 files larger as large as 20GB? #2953

Closed

jeffdonahue reviewed Sep 24, 2015

lukeyeager and others added 2 commits September 24, 2015 12:35

Allow H5T_INTEGER in HDF5 files

84e390c

Modify HDF5DataLayerTest to test H5T_INTEGER data

ebc9963

lukeyeager force-pushed the h5t_integer branch from e1da824 to ebc9963 Compare September 24, 2015 19:36

jeffdonahue added a commit that referenced this pull request Sep 24, 2015

Merge pull request #2978 from lukeyeager/h5t_integer

349ff65

Allow H5T_INTEGER in HDF5 files

jeffdonahue merged commit 349ff65 into BVLC:master Sep 24, 2015

jeffdonahue reviewed Sep 24, 2015

lukeyeager deleted the h5t_integer branch September 24, 2015 20:20

lukeyeager added a commit to lukeyeager/caffe that referenced this pull request Sep 24, 2015

Fix generate_sample_data.py - bug from BVLC#2978

859f938

lukeyeager mentioned this pull request Sep 24, 2015

Fix generate_sample_data.py #3115

Merged

aidangomez pushed a commit to aidangomez/caffe that referenced this pull request Sep 28, 2015

Fix generate_sample_data.py - bug from BVLC#2978

876c6a8

lukeyeager mentioned this pull request Sep 30, 2015

The glog outputs recently introduced in hdf5.cpp break compatibility … #3130

Closed

acmiyaguchi pushed a commit to acmiyaguchi/caffe that referenced this pull request Nov 13, 2017

Fix generate_sample_data.py - bug from BVLC#2978

4e75f3b
