Snapshot model weights/solver state to HDF5 files #2836
Conversation
Force-pushed from 0241c9f to 86cdba4.
This satisfies part of #1211.
Can I access the network's blobs using HDF5? If so, please show an example.
I've skimmed through this and it mostly looks good, thanks @erictzeng. My one piece of feedback right now is that …
<< "Error reading weights from " << trained_filename; | ||
// Check that source layer doesn't have more params than target layer | ||
int num_source_params = hdf5_get_num_links(layer_hid); | ||
CHECK_LE(num_source_params, target_blobs.size()) |
Should this check equality? You might want to know, for instance, that the source layer has a bias but the target does not. Sorry, the check in lines 799-808 covers the rest.
This will be a good switch, and the backward compatibility saves a lot of heartache, but we might consider bringing the documentation and examples along with us, as there are references to the current extensions here and there. This looks good to me code-wise (once Jeff's comment is addressed), but you could squash related changes and fixes when you're done. Since the weight sharing tests don't cover save and restore, it could be worth adding a test for that too. Thanks @erictzeng!
The tests I added in #2866 do cover this (though they're less unit tests and more integration tests than what you propose, as they also rely on the solver snapshot/restore correctness).
@jeffdonahue oh sweet, …
@bhack this lets us keep the same dependencies and interface for defining models. Migrating away from protobuf to a new format needs a good argument and its own issue, since model definitions would change.
@shelhamer FlatBuffers supports .proto parsing for easier migration from Protocol Buffers.
Force-pushed from 5511688 to 6799ddc.
That should be all comments addressed! The constant has been lowered to 32 as requested, and history has been squashed. Let me know if anything else seems off. @Yeongtae I'm not sure I fully understand what you're asking, but this PR allows you to access network parameters via HDF5, if that's what you want. The parameters are stored in a fairly simple structure. Here's how you'd peek at the conv1 parameters in lenet:
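A minimal h5py sketch along those lines (the snapshot filename is a placeholder; parameters live under `/data/<layer name>/<param index>` per this PR's layout):

```python
import h5py

# Placeholder filename; any weights snapshot produced by this PR should work.
with h5py.File('lenet_iter_10000.caffemodel.h5', 'r') as f:
    conv1 = f['data']['conv1']    # one group per layer under /data
    weights = conv1['0'][...]     # dataset 0: the layer's weights
    biases = conv1['1'][...]      # dataset 1: the layer's biases
    print(weights.shape, biases.shape)
```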
The datasets 0 and 1 correspond to the weights and biases of the layer, respectively.
```cpp
                                  H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
CHECK_GE(layer_data_hid, 0)
    << "Error saving weights to " << filename << ".";
hid_t layer_diff_hid = H5Gcreate2(diff_hid, layer_name.c_str(),
```
Shouldn't the `diff` dataset only be created if `write_diff` is set?
Force-pushed from 6799ddc to 73058c8.
Summary of changes:

- HDF5 helper functions were moved into a separate file, util/hdf5.cpp
- hdf5_save_nd_dataset now saves N-D blobs and can save diffs instead of data
- Minor fix for a memory leak in the HDF5 functions (delete instead of delete[])
- Extra methods have been added to both Net and Solver, enabling snapshotting and restoring from HDF5 files
- snapshot_format was added to SolverParameter, with possible values HDF5 or BINARYPROTO (default HDF5)
- kMaxBlobAxes was reduced to 32 to match the limitations of HDF5
Force-pushed from 73058c8 to c9b333e.
Everything looks good, thanks Eric!
My vote still goes to FlatBuffers as a natural Google successor to protobuf. But with this merge, HDF5 is the de facto standard for Caffe models now, and nobody replied about the evaluation process for a protobuf substitute.
Adapt HDF5DataLayer Prefetch to BVLC#2836
What about a Python interface for saving a net to HDF5? This could be useful for "net surgery".
However, I got this error: …
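In the meantime, a rough pycaffe sketch of what such saving could look like, mirroring the `/data/<layer name>/<param index>` layout the snapshots use (the prototxt and weights paths are placeholders, and no dedicated save-to-HDF5 method is assumed to exist):

```python
import h5py
import caffe  # assumes pycaffe is built and on PYTHONPATH

# Placeholder paths; substitute your own model definition and weights.
net = caffe.Net('lenet.prototxt', 'lenet.caffemodel', caffe.TEST)

# Write one group per layer, with datasets "0", "1", ... for its params.
with h5py.File('surgery.caffemodel.h5', 'w') as f:
    data = f.create_group('data')
    for layer_name, blobs in net.params.items():
        g = data.create_group(layer_name)
        for i, blob in enumerate(blobs):
            g.create_dataset(str(i), data=blob.data)
```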
@shelhamer There seem to be some "hiccups" with snapshotting to the HDF5 format.
@shaibagon I'm not aware of any issue, so could you post an issue with details to reproduce the problem with Caffe master? I don't know anything about the OpenCV DNN package mentioned at that SO link. Please mention @erictzeng in the issue as the author of this PR.
Snapshot model weights/solver state to HDF5 files

* erictzeng/hdf5_snapshot: (29 commits)
  Update example bash scripts to expect .h5, new extensions in .gitignore
  TestSnapshot expects .h5 snapshots, explicitly checks history.
  Snapshot model weights/solver state to HDF5 files.
  TestGradientBasedSolver: add TestSnapshot to verify behavior when restoring net/solver from snapshot
  add double_data, double_diff to BlobProto for weights/snapshots saved when using Dtype == double
  Fix typo
  PythonLayer takes parameters by string
  [pytest] open exception file with mode for python3
  [pycaffe,build] include Python first in caffe tool
  ImageData layer default batch size of 1, and check for zero batch size
  Change log levels in upgrade_proto
  [docs] add CONTRIBUTING.md which will appear on GitHub new Issue/PR pages
  [docs] fix contrastive loss eq
  [docs] fix lmdb fetch url and path
  [docs] clear up PYTHONPATH confusion
  Fix path to mnist_autoencoder.prototxt
  [docs] set lmdb url to github mirror
  [docs] matlab 2015a compatible
  Travis scripts for python3 and pytest for cmake. Also fixes CUDA CMake build issue BVLC#2722.
  [examples] fix link to point to new tutorial notebook
  ...

Conflicts:
  .travis.yml
  include/caffe/python_layer.hpp
  scripts/travis/travis_build_and_test.sh
  scripts/travis/travis_install.sh
  src/caffe/proto/caffe.proto
  src/caffe/solver.cpp
  src/caffe/test/test_gradient_based_solver.cpp
  tools/caffe.cpp
This pull request enables Caffe to snapshot model weights and solver states to HDF5 files and makes this format the default. This format provides a number of advantages.

To avoid confusion with the old snapshotting methods, snapshotting to HDF5 files adopts new file extensions, namely `.caffemodel.h5` and `.solverstate.h5`. When restoring either weights or solver history from a file, the extension of the file is checked. If the extension is `.h5`, it is loaded as an HDF5 file. All other extensions are treated as a binary protobuf file and loaded as before.

The default snapshot format is switched to HDF5 in this PR. If you prefer the old method, you can add `snapshot_format: BINARYPROTO` to your solver prototxt to restore binary protobuf snapshotting, as in the sketch below.
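A hypothetical solver prototxt fragment (the snapshot interval and prefix are placeholders; only `snapshot_format` is introduced by this PR):

```
# Snapshot every 5000 iterations to files named snapshots/lenet_iter_*.
snapshot: 5000
snapshot_prefix: "snapshots/lenet"
# Revert to the pre-PR binary protobuf snapshots; omit for the new HDF5 default.
snapshot_format: BINARYPROTO
```

Training can then be resumed from either format with the usual `caffe train --solver=... --snapshot=...` invocation, since the file extension determines how the snapshot is loaded.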
A few miscellaneous details:

- Snapshot/restore behavior is verified by a new `TestSnapshot` test for gradient-based solvers.
- The HDF5 helper functions formerly in `util/io.cpp` have been moved out to their own file, `util/hdf5.cpp`, and additional helper functions have been added.
- The interfaces of the `Net` and the `Solver` have changed, since we now have methods for both BinaryProto and HDF5. Everything in Caffe checks out, but downstream users who have implemented their own non-SGD solvers/solvers with nonstandard snapshotting may have a bad time.

Potential caveats:

- This PR changes the behavior of `hdf5_save_nd_dataset`. Previously, said function always saved 4-D blobs. It has since been changed to save N-D blobs instead. This could potentially break people's workflows if they were relying on HDF5OutputLayers to output 4-D blobs.
- There aren't any tests that compare the loaded solver history, though the saved state can be inspected by hand, as sketched below.
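A hedged h5py sketch for that manual inspection (the filename is a placeholder, and the dataset names assume the layout the SGD solver writes in this PR: `iter`, `learned_net`, and `current_step` at the root, plus a `history` group):

```python
import h5py

# Placeholder filename for a solver state snapshot produced by this PR.
with h5py.File('lenet_iter_5000.solverstate.h5', 'r') as f:
    print('iter:', f['iter'][...])                # iteration of the snapshot
    print('learned_net:', f['learned_net'][...])  # path to the paired weights file
    for name, ds in f['history'].items():         # one dataset per history blob
        print('history', name, ds.shape)
```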
Possible extensions

These extensions won't end up in this PR, but possible things to do after this wraps up: