On-the-fly net resizing, without reallocation (where possible) #594
Conversation
Does this mean to partially resolve #557? The reshapable net still requires the images of a batch to be of the same size. To be more general, the involved blobs would have to reshape themselves for each image on demand. Therefore the reshaping should not be initiated by the Net or the Layer.
@kloudkl, I believe this is orthogonal to both #108 and #557. Let me clarify. #108 is a particular kind of layer, where reshaping is applied to data during a forward pass; this is about reshaping between passes, applied to entire nets. #557, if I understand correctly, is about data layers that read variously sized images from disk, but still produce fixed-size top blobs. This patch does not address that issue, and actually cannot be used with data layers. It's really meant for data supplied through the Python or matlab wrapper (or custom C++ code). One might imagine a data layer that produces variously sized top blobs, fed into a network with blobs without definite sizes. That should be a straightforward extension of this patch, but I won't do that right away.
@longjon thanks, it is a nice PR; you even cleaned up the code quite a bit, especially all the temporary data :) I will check it with the Matlab wrapper. #557 is only meant to allow different-size images as inputs, but data layers will still produce fixed-size blobs, so it is complementary to this PR. @kloudkl #108 is a layer, and therefore doesn't change sizes dynamically as this PR does.
```diff
@@ -272,6 +272,11 @@ struct CaffeNet {
     return output_blob_names;
   }

+  void reshape(int num, int channels, int height, int width) {
+    net_->input_blobs()[0]->Reshape(num, channels, height, width);
+  }
```
If there is more than one input blob, how could it be resized?
For the Python wrapper, I just implemented the common case where one wants to resize the first input blob (usually there is only one input blob, or the second blob is a fixed-size label). If people feel it should be included, I can implement the general case, or it can be done in a future PR.
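For illustration, a minimal sketch of what that general case might look like, in the same wrapper style as the diff above; `reshape_input`, its index parameter, and the bounds checks are my own assumptions, not part of this patch:

```cpp
// Hypothetical generalization (not in this PR): reshape any input blob
// by index instead of hard-coding input_blobs()[0].
void reshape_input(int index, int num, int channels, int height, int width) {
  CHECK_GE(index, 0) << "input blob index must be non-negative";
  CHECK_LT(index, static_cast<int>(net_->input_blobs().size()))
      << "net has no input blob with this index";
  net_->input_blobs()[index]->Reshape(num, channels, height, width);
}
```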
@longjon nice job on a long-wished-for feature. I'm a little uncomfortable about the restriction to a single (resizable) input blob; I'd vote for the general case now instead of later. Since Caffe does DAGs, I'd rather not have some state where some features are only supported for certain classes of models.
(@shelhamer) Here's my plan for this PR:
I expect/hope all that will happen by the end of next week. [ha!]
(Force-pushed from 4278286 to c01f07a.)
@longjon please resurrect and complete this as outlined in #594 (comment). This'll settle plenty of workarounds for varying inputs, aspect ratios, and sequences.
@shelhamer, yes, I'll rebase this soon and add the promised features, which I already use quite a bit. There is one design issue that has kept me from hastily pushing this forward, but I'm ready with a proposed solution, coming soon.
(Force-pushed from bfead64 to 4c259d0.)
Rebased! As mentioned above, I've also switched to an implementation where Reshape is supported for all layers. (Of course, layers with parameters will error out if you try to reshape to a size incompatible with their parameters.) The layer interface is changed slightly: to make use of net reshaping, one resizes the input blobs and then calls Net::Reshape; it is up to the user to keep the resulting shapes sensible. Reshaping can be done in pycaffe with the same interface as in C++. This is ready for (re-)review.
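For concreteness, a hedged sketch of the C++ usage this implies, following the Net::Reshape flow described in the PR summary below; the file names are placeholders, and the surrounding Net API calls are assumptions about caffe of this era rather than code from the patch:

```cpp
#include "caffe/net.hpp"

// Sketch: run a trained net at a new input size without rebuilding it.
// "deploy.prototxt" and "net.caffemodel" are placeholder paths.
void run_at_new_size() {
  caffe::Net<float> net("deploy.prototxt");
  net.CopyTrainedLayersFrom("net.caffemodel");
  // First resize the input blob manually ("breaking" the net momentarily)...
  net.input_blobs()[0]->Reshape(1, 3, 384, 512);
  // ...then call Reshape with no arguments to propagate the new shape
  // through every layer, bottom-up.
  net.Reshape();
  net.ForwardPrefilled();
}
```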
(Force-pushed from 4c259d0 to ec8a49c.)
This is in keeping with BVLC#742.
This allows nets to be reshaped very quickly (essentially for free) as long as sufficient memory has been allocated. Calling Blob::Reshape in order to free up memory becomes impossible; however, this is not a normal use case (and deleting blobs does free memory).
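A simplified sketch of the idea this commit message describes; the member names approximate caffe's Blob, and the body is my paraphrase rather than the patch itself:

```cpp
// Reshape only reallocates when the new element count exceeds what has
// already been allocated; shrinking reuses the old storage and never frees.
template <typename Dtype>
void Blob<Dtype>::Reshape(int num, int channels, int height, int width) {
  num_ = num; channels_ = channels; height_ = height; width_ = width;
  count_ = num_ * channels_ * height_ * width_;
  if (count_ > capacity_) {  // grow: allocate fresh, larger storage
    capacity_ = count_;
    data_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
    diff_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
  }
  // When count_ <= capacity_ this is effectively a no-op, so reshaping
  // an already-large-enough blob is essentially free.
}
```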
Note that calling Reshape when no reshape is necessary should be effectively a no-op, so this is not a performance regression.
This will make it possible to add reshaping to cuDNN layers.
Thanks Jon! This is not only a long-awaited improvement but a model PR, with an orderly and clear description and history.
It seems that this PR does not support the matlab wrapper, am I right? @longjon
As far as I know, support for manual reshaping of the input has not been added to the matlab wrapper, though it might or might not work fine with layers that produce tops of various sizes. @sguada, do you know the full story?
On-the-fly net resizing, without reallocation (where possible)
This PR allows nets to change their input sizes in-place, reusing allocated memory for blobs and buffers. This allows, for example, running the same net on inputs of different sizes without rebuilding it.

`Net` gets a new method `Reshape` that provides this functionality. One first resizes input blobs manually, then calls `Net::Reshape` with no arguments. (This has the unfortunate property that one has to momentarily "break" the net before calling `Reshape`, but it avoids the awkwardness of typing `Reshape` to accept a vector of 4-tuples.) `Net::Reshape` just calls `Layer::Reshape` for each layer in turn, bottom-up. Since reshaping doesn't make sense for all layers, and layers may constrain acceptable new sizes, `Layer::Reshape` is opt-in; only layers that implement it can be reshaped. (Neuron layers can all be reshaped, most of them in a trivial way, so an implementation of `NeuronLayer::Reshape` is provided.)

Note that reshaping is intended only for cases where the existing parameters can continue to be used without modification. Reshaping is not intended for use with data layers. This PR provides reshapability for essentially only the layers needed for a Krizhevsky-style net.
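To illustrate the opt-in pattern for the trivial neuron case, a sketch along these lines; the exact layer interface of this era is approximated, so treat the signature as an assumption rather than the PR's code:

```cpp
// Trivial reshape for a neuron layer: the top blob simply mirrors
// whatever new shape the bottom blob has taken on.
template <typename Dtype>
void NeuronLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
                                 vector<Blob<Dtype>*>* top) {
  (*top)[0]->Reshape(bottom[0]->num(), bottom[0]->channels(),
                     bottom[0]->height(), bottom[0]->width());
}
```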
Many layers use internal buffers, which are sometimes implemented as `Blob`s, sometimes `shared_ptr<Blob>`s, and sometimes `SyncedMemory`s. This PR uniformizes some of these to be just `Blob`s in order to simplify implementation. As far as I can tell, no `shared_ptr`s were removed that needed to be `shared_ptr`s; someone let me know if this is not true (@jeffdonahue?)

This PR includes a simple implementation of part of #355 in 33959e5f9a4c4342ee797fca71d607d74f792483. Unlike #355, the implementation is entirely within `Blob::Reshape`, and does not touch `SyncedMemory`. The disadvantage is that it doesn't call `realloc` when enlarging blobs; this could be added in a later patch if desired. (This PR does not address sharing blobs between train and test nets.)

Although I haven't used it a lot yet, this PR is fully-baked: tests are included, it's usable from Python, it builds everything with `-Wall -Werror`, and passes tests and lint. There is a clean, linear history; if the reader feels overwhelmed by the changes, I suggest reading it one commit at a time (in commit order, not github order).