Subnetwork Laplace #58
Conversation
**1.-7. Minor comments**
Looks quite nice already. I made two comments on the code directly. Here are the answers/discussion for the more minor points:
**8. Choice of subnetwork**
One question regarding the choice of subnetwork in your paper: would it be possible to use the diagonal Laplace-GGN to choose the subnetwork? In fact, I thought that's what you did. Is there any reason that you (need to) use SWAG instead of Laplace to choose the subnetwork? Because I thought that the subnetwork would be chosen using the diagonal GGN, I was thinking of a base class along these lines:

```python
class SubnetMask:
    def __init__(self, model, likelihood, **hyperparams):
        self.model = model
        self.likelihood = likelihood
        self.hyperparams = hyperparams

    def __call__(self, train_loader, **kwargs):
        # Subclasses select the mask based on the train loader, the model,
        # and other hyperparameters, similar to a subclass of CurvatureInterface.
        raise NotImplementedError
```

Then, different policies can subclass this mask, and one could potentially separately implement a SWAG mask, a random mask, and a last-layer mask (for testing?). Also, the method using the diagonal GGN would be easy to implement. What do you think about this?
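A random-mask policy could then look roughly like this; a minimal sketch building on the base class above, where `RandomSubnetMask` and `n_params_subnet` are illustrative names rather than anything from this PR:

```python
import torch


class RandomSubnetMask(SubnetMask):
    """Hypothetical policy: select a uniformly random subnetwork."""

    def __init__(self, model, likelihood, n_params_subnet):
        super().__init__(model, likelihood)
        self.n_params_subnet = n_params_subnet

    def __call__(self, train_loader, **kwargs):
        n_params = sum(p.numel() for p in self.model.parameters())
        # Return indices (into the flattened parameter vector) of the
        # randomly selected subnetwork parameters.
        return torch.randperm(n_params)[:self.n_params_subnet]
```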
I now addressed most of the remaining issues discussed. Most notably, I followed your suggestion of adding a `SubnetMask` base class.
I also finished implementing subnet selection based on the largest marginal variances (using diagonal Laplace for variance estimation).
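The selection rule itself is simple. A minimal sketch, assuming a flat tensor of per-parameter marginal variances from a diagonal Laplace fit is available (the function name is illustrative):

```python
import torch


def largest_variance_indices(marginal_variances, n_params_subnet):
    # marginal_variances: flat tensor of per-parameter posterior variances,
    # e.g. the diagonal of a diagonal Laplace posterior covariance.
    # Returns the indices of the n_params_subnet largest-variance parameters.
    return torch.topk(marginal_variances, n_params_subnet).indices
```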
Looks good, I just left some minor comments. I can help with them.
I have now implemented the remaining things we had discussed. Let me know what you think and if there is anything else that I should take care of!

I also just added an example for using `SubnetLaplace`.
Great work, looks good to me now. I just left a few comments/questions. Other than that:
- It would be great if you could make sure that the line width does not exceed the limit (90? 100?). Maybe we haven't agreed on a line width yet; I'd suggest going with 100 for now.
- Did you recompile the documentation as indicated in the readme? Otherwise I can do that, if you want.
laplace/utils/swag.py (outdated):

```python
    return parameters_to_vector(model.parameters()).detach()


def fit_diagonal_swag(model, train_loader, criterion, n_snapshots_total=40,
                      snapshot_freq=1, lr=0.01, momentum=0.9,
                      weight_decay=3e-4, min_var=1e-30):
```
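For context, diagonal SWAG maintains running first and second moments of the flattened weights over periodic SGD snapshots and returns the per-parameter variance E[theta^2] - E[theta]^2. The following is a simplified sketch of that procedure, not the exact implementation in this file:

```python
import torch
from torch.nn.utils import parameters_to_vector


def diagonal_swag_var_sketch(model, train_loader, criterion, n_snapshots_total=40,
                             snapshot_freq=1, lr=0.01, momentum=0.9,
                             weight_decay=3e-4, min_var=1e-30):
    # Simplified sketch: run SGD and keep running first and second moments of
    # the flattened parameter vector, snapshotted every snapshot_freq epochs.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum,
                          weight_decay=weight_decay)
    mean, sq_mean, n_snapshots = 0.0, 0.0, 0
    for epoch in range(n_snapshots_total * snapshot_freq):
        for X, y in train_loader:
            opt.zero_grad()
            criterion(model(X), y).backward()
            opt.step()
        if (epoch + 1) % snapshot_freq == 0:
            theta = parameters_to_vector(model.parameters()).detach()
            mean = (n_snapshots * mean + theta) / (n_snapshots + 1)
            sq_mean = (n_snapshots * sq_mean + theta ** 2) / (n_snapshots + 1)
            n_snapshots += 1
    # Diagonal SWAG variance: E[theta^2] - E[theta]^2, clamped for stability.
    return torch.clamp(sq_mean - mean ** 2, min=min_var)
```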
Just out of interest: is the mean just discarded and not useful for the final LA, or could it help to get a better mean, following the idea of standard SWA? The function name currently indicates that it does the full SWAG (mean + var). If you just need the variance, maybe that can be indicated in the function name, like `fit_diagonal_swag_var`. But it's not necessary, just a suggestion.
Good point! IIRC I have tried using the SWA mean with Laplace, which indeed improved performance. But I haven't done extensive experiments with it, as I thought it's not as principled (although perhaps one could view the SWA mean as some kind of approximation to the MAP?). It would be interesting to revisit the idea (and perhaps even add it as a feature if it works well). The only downside I see is that it requires us to update the BatchNorm parameters of the model, which is one of the main reasons why SWA(G) is so slow.

I've changed the name to `fit_diagonal_swag_var` for clarity, thanks!
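For reference, reusing the SWA mean would roughly amount to the following sketch, where `swa_mean`, `model`, and `train_loader` are placeholders; `torch.optim.swa_utils.update_bn` recomputes the BatchNorm statistics with a full pass over the data, which is the cost mentioned above:

```python
from torch.nn.utils import vector_to_parameters
from torch.optim import swa_utils

# Load the (flattened) SWA mean back into the model...
vector_to_parameters(swa_mean, model.parameters())
# ...and refresh the BatchNorm running statistics, which requires a full
# pass over the training data.
swa_utils.update_bn(train_loader, model)
```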
Looks great! See my last comment for my only concern.
Minor notes:
- Regarding the line length, I also think 100 is fine for now. I think we have always used brackets (`([{`) for line breaks, so it would be nice to be consistent with that.
- We could also consider moving `marglik_training.py` to the new `utils` directory (or just leave it where it is, or in the future add something like a `methods` directory).
Thanks a lot for your comments, Alex and Runa, which I have now addressed! I also recompiled the documentation. Let me know if there is anything else that needs to be resolved.
Just realised that, for some reason, Travis has not been running the tests here anymore for the latest commits (I ran the tests locally, where everything worked). Any idea why it's not doing this check anymore?
We have limited compute available on Travis, so we only run the tests when merging PRs from now on. If you ran the tests locally, it should be fine.
Ah I see, makes sense! I just merged the base branch into `subnetlaplace`.
This PR implements subnetwork Laplace, addressing issue #16.
As you can see from the code, this is fairly straightforward. We basically just need to ask the user to specify/pass a definition of a subnetwork. Internally, we then store the subnetwork as a vector of indices into the flattened/vectorized model parameters that form the subnetwork (for the user's convenience, we also allow passing the subnetwork as a binary mask, i.e. a vector of the size of the parameter vector, where 1s indicate the subnetwork parameters). All we then need to do is index the Jacobians/gradients to extract the part corresponding to the subnetwork, both at inference time (i.e. for constructing the GGN/Fisher) and at prediction time (i.e. for constructing the predictive covariance matrix).
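To illustrate the indexing, here is a hedged sketch; the helper name and shapes are illustrative and not the code in this PR:

```python
import torch


def indices_from_binary_mask(mask):
    # Convert a binary mask over the flattened parameter vector
    # (1s mark subnetwork parameters) into a vector of indices.
    return mask.nonzero(as_tuple=True)[0]


mask = torch.tensor([1, 0, 0, 1, 1])
subnetwork_indices = indices_from_binary_mask(mask)  # tensor([0, 3, 4])
Js = torch.randn(2, 5)                 # full Jacobian (n_outputs x n_params)
Js_subnet = Js[:, subnetwork_indices]  # (n_outputs x n_params_subnet)
```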
Some comments/questions:
- One part I'm not yet fully confident about is the `subnetwork_mask` setter of `SubnetLaplace`, which needs to correctly check that the passed `subnetwork_mask` is in the right format (we currently support the two formats mentioned above), as the code would break otherwise (see the sketch after this list). I'll double-check the logic and also add sufficient test cases for this.
- We could rename some attributes to make explicit that they refer to the subnetwork (e.g. `parameters` would become `parameters_subnet`). It's not hard to change, but it might make the documentation more cluttered / harder to read, as we'd have to include conditional clauses, I guess.
- Is there a reason why `jacobians()` is static? Just wondering, as it makes the code and documentation a bit uglier/longer with the subnetwork option, since we always have to pass the `subnetwork_indices` when calling it. Compare this with `gradients()`, which is not static and can therefore simply access `self.subnetwork_indices`.

Thanks a lot for your efforts in looking at this (there's no rush, of course)!
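As referenced in the first point above, the format check could look roughly like this; a sketch rather than the actual setter, which also has to deal with the inherent ambiguity of an index vector of length `n_params` that contains only 0s and 1s:

```python
import torch


def check_subnetwork_mask(subnetwork_mask, n_params):
    # Sketch of a validation that accepts either a binary mask over the full
    # parameter vector or a vector of parameter indices, returning indices.
    mask = torch.as_tensor(subnetwork_mask)
    if mask.dim() != 1:
        raise ValueError('subnetwork_mask must be a 1D tensor.')
    is_binary = (mask.numel() == n_params
                 and torch.all((mask == 0) | (mask == 1)))
    if is_binary:
        # Binary mask: 1s mark the subnetwork parameters.
        return mask.nonzero(as_tuple=True)[0]
    if torch.any(mask < 0) or torch.any(mask >= n_params):
        raise ValueError('subnetwork_mask contains invalid parameter indices.')
    return mask.long()
```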