Gradient accumulation is untested #3036

Open
seanbell opened this issue Sep 5, 2015 · 0 comments

I often run into the same problem as #2532, where it is necessary to zero-initialize the bottom diff, but not the parameter diffs. This is unintuitive, but acceptable as long as it's both documented and tested. The problem is that there are no tests to check each layer for this behavior.
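For concreteness, here is a minimal sketch of the convention in question, using a hypothetical `ScalarScaleLayer` (top = w * bottom with a single scalar weight w). The Blob/Layer/math-function names follow Caffe's existing API, but the layer itself is illustrative only:

```cpp
// Hypothetical ScalarScaleLayer: the bottom diff is reset before being
// written, while the parameter diff is only added to, so that gradients
// accumulate correctly across iter_size sub-iterations.
template <typename Dtype>
void ScalarScaleLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  const int count = bottom[0]->count();
  if (this->param_propagate_down_[0]) {
    // dE/dw = dot(top_diff, bottom_data), ADDED to the existing diff;
    // the solver is responsible for clearing it between parameter updates.
    this->blobs_[0]->mutable_cpu_diff()[0] +=
        caffe_cpu_dot(count, top_diff, bottom[0]->cpu_data());
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    // dE/d(bottom) = w * top_diff. The bottom diff may hold stale values,
    // so it is zeroed before contributions are accumulated into it.
    caffe_set(count, Dtype(0), bottom_diff);
    caffe_axpy(count, this->blobs_[0]->cpu_data()[0], top_diff, bottom_diff);
  }
}
```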

If you are opposed to fixing the test as per tnarihi@7d45526 (the "+1, -1" trick), an alternative would be to add a method to every layer that reports whether it supports gradient accumulation (default false). The gradient checker would then apply the "+1, -1" trick only to layers that claim support. When someone uses iter_size > 1, the net would check all of its layers and raise an exception if any layer has parameters but doesn't support gradient accumulation.
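A rough sketch of what that alternative could look like; the `AllowsGradientAccumulation()` method and the Net-side check are hypothetical, not existing Caffe API:

```cpp
// In include/caffe/layer.hpp (hypothetical addition):
template <typename Dtype>
class Layer {
 public:
  // Returns true if Backward adds into parameter diffs rather than
  // overwriting them. Default false; accumulation-safe layers override it.
  virtual bool AllowsGradientAccumulation() const { return false; }
  // ... existing Layer interface ...
};

// In Net<Dtype>::Init (hypothetical check, run when iter_size > 1):
for (int i = 0; i < layers_.size(); ++i) {
  if (!layers_[i]->blobs().empty() &&
      !layers_[i]->AllowsGradientAccumulation()) {
    LOG(FATAL) << "Layer " << layer_names_[i] << " has parameters but does "
               << "not declare support for gradient accumulation, so "
               << "iter_size > 1 would compute incorrect gradients.";
  }
}
```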

I'm opening this issue because I think it needs to be addressed one way or another.
