Gradient accumulation is untested #3036

Open
seanbell opened this issue Sep 5, 2015 · 0 comments

I often run into the same problem as #2532, where it is necessary to zero-initialize the bottom diff, but not the parameter diffs. This is unintuitive, but acceptable as long as it's both documented and tested. The problem is that there are no tests to check each layer for this behavior.
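For concreteness, here is a minimal sketch of the convention in question, using a hypothetical `ScalarScaleLayer` (top = w * bottom with a single scalar weight w). The Blob/Layer/math-function names follow Caffe's existing API, but the layer itself is illustrative only:

```cpp
// Hypothetical ScalarScaleLayer: the bottom diff is reset before being
// written, while the parameter diff is only added to, so that gradients
// accumulate correctly across iter_size sub-iterations.
template <typename Dtype>
void ScalarScaleLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  const int count = bottom[0]->count();
  if (this->param_propagate_down_[0]) {
    // dE/dw = dot(top_diff, bottom_data), ADDED to the existing diff;
    // the solver is responsible for clearing it between parameter updates.
    this->blobs_[0]->mutable_cpu_diff()[0] +=
        caffe_cpu_dot(count, top_diff, bottom[0]->cpu_data());
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    // dE/d(bottom) = w * top_diff. The bottom diff may hold stale values,
    // so it is zeroed before contributions are accumulated into it.
    caffe_set(count, Dtype(0), bottom_diff);
    caffe_axpy(count, this->blobs_[0]->cpu_data()[0], top_diff, bottom_diff);
  }
}
```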

If you are opposed to fixing the test as per tnarihi@7d45526 (the "+1, -1" trick), an alternative would be to add a method to every layer that reports whether it supports gradient accumulation (default false). The gradient checker would then apply the "+1, -1" trick only to layers that claim support. When someone uses iter_size > 1, the net would check all of its layers and raise an exception if any layer has parameters but doesn't support gradient accumulation.
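A rough sketch of what that alternative could look like; the `AllowsGradientAccumulation()` method and the Net-side check are hypothetical, not existing Caffe API:

```cpp
// In include/caffe/layer.hpp (hypothetical addition):
template <typename Dtype>
class Layer {
 public:
  // Returns true if Backward adds into parameter diffs rather than
  // overwriting them. Default false; accumulation-safe layers override it.
  virtual bool AllowsGradientAccumulation() const { return false; }
  // ... existing Layer interface ...
};

// In Net<Dtype>::Init (hypothetical check, run when iter_size > 1):
for (int i = 0; i < layers_.size(); ++i) {
  if (!layers_[i]->blobs().empty() &&
      !layers_[i]->AllowsGradientAccumulation()) {
    LOG(FATAL) << "Layer " << layer_names_[i] << " has parameters but does "
               << "not declare support for gradient accumulation, so "
               << "iter_size > 1 would compute incorrect gradients.";
  }
}
```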

I'm opening this issue because I think it needs to be addressed one way or another.
