Fix weight sharing #2866

Merged: 3 commits merged from fix-weight-sharing into BVLC:master on Aug 7, 2015
Conversation

jeffdonahue
Contributor

This fixes a couple issues with weight sharing:

  • Unnecessary memory use and computation, because shared parameters currently don't share diffs. Having shared parameters share diffs is possible thanks to #1977 (Decouple the computational batch size and minibatch size by accumulating gradients), which made layers accumulate parameter gradient diffs rather than overwriting them.
  • Momentum and other solver computations were incorrect due to the separation of shared parameters in the params_ vector. Someone can do the math to figure out exactly how it was incorrect if they want to; I'll just say that the added tests definitely fail without this fix.

The one possible downside is that you can no longer specify different lr_mults for shared parameters; but using this "feature" was probably a bad idea before, and if you were also using momentum or weight decay it was probably behaving in an incorrect/unexpected way.

This is based on @erictzeng's PR #2836; adding just the last commit on top of that.
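
For readers following along, a minimal sketch of the diff-sharing idea is below. It is illustrative only, not the PR's actual Net::AppendParam changes, and assumes just Caffe's Blob<Dtype>::ReshapeLike, ShareData, and ShareDiff methods, which make one blob alias another blob's buffers instead of allocating its own:

#include "caffe/blob.hpp"

// Illustrative only: make `mirror` share both the data and the diff of
// `owner`, so every layer using the shared parameter reads the same weights
// and accumulates its gradient into the same diff buffer.
template <typename Dtype>
void ShareParameter(caffe::Blob<Dtype>* owner, caffe::Blob<Dtype>* mirror) {
  mirror->ReshapeLike(*owner);
  mirror->ShareData(*owner);  // previously, only the data was shared
  mirror->ShareDiff(*owner);  // diffs now alias the owner's diff, which is
                              // safe because layers accumulate into param
                              // diffs (#1977) rather than overwrite them
}

With a single diff buffer per shared parameter, the solver sees one accumulated gradient and applies momentum and weight decay once, which is also why different lr_mult values for the same underlying parameter can no longer be honored.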

@shelhamer
Member

This looks good to me once #2836 is merged. The switch for test data to accommodate the accumulation checks is fine by me.

Thanks for sorting this out Jeff!

shelhamer mentioned this pull request on Aug 6, 2015
@shelhamer
Member

This makes progress on #1211.

@jeffdonahue I think this addresses the issue raised at https://github.com/BVLC/caffe/pull/546/files#r16817721 via the learnable_params update loop. Could you check?

@@ -458,7 +461,7 @@ template <typename Dtype>
 void SGDSolver<Dtype>::ClipGradients() {
   const Dtype clip_gradients = this->param_.clip_gradients();
   if (clip_gradients < 0) { return; }
-  const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+  const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params();
   Dtype sumsq_diff = 0;
   for (int i = 0; i < net_params.size(); ++i) {
@shelhamer
Member

This could loop over learnable_params now, as could the loop on line 477. Now that this is a loop over learnable_params, the condition on param_owners()[i] is wrong.

@jeffdonahue
Contributor Author

agreed... see the above diff :)

@shelhamer
Member

In which we learn I have a short term memory of 1 line... sorry haha.

@jeffdonahue
Contributor Author

@shelhamer later pointed out that the remaining check on param_owners below was wrong -- just fixed that, thanks!
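
To make the indexing pitfall from this thread concrete, here is a hypothetical sketch, not code from the PR, of a solver-side gradient-norm loop after the switch; it assumes Caffe's Net<Dtype>::learnable_params() accessor and Blob<Dtype>::sumsq_diff():

#include <vector>

#include "caffe/net.hpp"

// Hypothetical example: sum of squared gradients over the parameters the
// solver actually updates. learnable_params() lists each shared parameter
// exactly once, so no ownership filtering is needed in this loop.
template <typename Dtype>
Dtype SumSqParamDiff(caffe::Net<Dtype>* net) {
  const std::vector<caffe::Blob<Dtype>*>& params = net->learnable_params();
  Dtype sumsq = 0;
  for (int i = 0; i < params.size(); ++i) {
    // A check like `if (net->param_owners()[i] >= 0) continue;` would be
    // wrong here: param_owners() is indexed by the full params() vector,
    // so its indices do not line up with learnable_params().
    sumsq += params[i]->sumsq_diff();
  }
  return sumsq;
}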


raingo commented Aug 6, 2015

This is very nice. Does the learnable_params change depend on the HDF5 snapshot work in #2836? If not, could learnable_params be separated out and merged independently?

Thanks!

@jeffdonahue
Contributor Author

@raingo if you want, you should be able to cherry-pick my last commit (you probably also need my first two commits to avoid conflicts in test_gradient_based_solver.cpp). I'm going to leave it based on #2836 as I expect that will be merged first and wanted to make sure the added tests still pass with it.

… params

- Params now share diffs as well as data (works due to layers accumulating gradients into param diffs, rather than overwriting)
- It's now required that any shared params with specified lr_mult's, decay_mult's match
- TestGradientBasedSolver checks that behavior remains correct with shared weights
@jeffdonahue
Contributor Author

Rebased now that #2836 is merged, and unrelated changes to TestGradientBasedSolver are now in separate commits. @shelhamer let me know if you want to take another look, otherwise this should be good to go.

@shelhamer
Member

I took a last glance and didn't catch anything to change, so go ahead and merge. Thanks Jeff!

jeffdonahue added a commit that referenced this pull request on Aug 7, 2015
jeffdonahue merged commit 32ced4f into BVLC:master on Aug 7, 2015
@jeffdonahue
Contributor Author

Great, thanks for the review @shelhamer!

jeffdonahue deleted the fix-weight-sharing branch on August 7, 2015
jeffdonahue mentioned this pull request on Aug 8, 2015
@shelhamer
Member

On further thought I think that learnable_params is what params really ought to be / have been so that we don't need to keep both. The only instances of the old params() left are in tests and could be replaced.

The harm in changing it would be to downstream code, such as solvers or serializers, that made use of params().

What's your take, @jeffdonahue?

@jeffdonahue
Contributor Author

Yeah, I think I agree. Downstream code that is actually aware of weight sharing and conditions on param_owners would break, but I'd guess most downstream code isn't weight-sharing-aware and was treating params as learnable_params already. To be safe, we could also remove the public param_owners() accessor as all that logic, I think, should now be handled by Net.
