Keep initialization of H for all-weights and last-layer separate #72

runame · 2021-12-15T10:40:25Z

This is an alternative to #71: it addresses a bug resulting from #62, first reported here (note: links to PR discussion in private repository). For the last-layer LA flavors the Hessian approximation H was first initialized for all-weights, which leads to out-of-memory errors for larger models.
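To make the out-of-memory issue concrete, here is a rough back-of-the-envelope sketch. The parameter counts are hypothetical (a 10M-parameter network with a 512-to-10 linear output layer), and a dense float32 Hessian is assumed. None of these figures come from the PR itself.

```python
def dense_hessian_bytes(n_params: int, bytes_per_entry: int = 4) -> int:
    """Memory needed to store a dense n_params x n_params matrix."""
    return n_params * n_params * bytes_per_entry


# Hypothetical sizes, chosen only for illustration.
n_all = 10_000_000        # all weights of the network
n_last = 512 * 10 + 10    # weights and biases of the last linear layer

full_bytes = dense_hessian_bytes(n_all)
last_bytes = dense_hessian_bytes(n_last)

print(f"all-weights H: {full_bytes / 1e12:.0f} TB")
print(f"last-layer H:  {last_bytes / 1e6:.1f} MB")
```

Under these assumptions, allocating H for all weights is infeasible even though the last-layer H that is actually needed fits comfortably in memory.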

The advantage of the previous fix (#71) is that it keeps the classes for all-weights and last-layer flavors more strictly separated, which might make it harder to introduce similar bugs in the future. However, it requires many more classes. @aleximmer and I agreed that this additional complexity is probably not worth it.

Changes:

Add better tests for the initialization of all Laplace classes, including one with a large model (Wide ResNet 50-2), as suggested in #69 (Add tests with larger model architectures).
Fix the H initialization bug. In most cases, posterior_precision now falls back to the prior precision before fit() is called for the first time. Exceptions: a last-layer flavor that is not passed last_layer_name as an argument, and low-rank Laplace. In these two cases H is None, and trying to access posterior_precision raises a descriptive error.
Remove redundant code + minor fixes.
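The fallback behavior described above could look roughly like the following toy sketch. `ToyDiagLaplace`, its constructor arguments, the list-based diagonal H, and the choice of `RuntimeError` are all assumptions made for illustration. This is not the library's actual implementation.

```python
class ToyDiagLaplace:
    """Toy stand-in for a diagonal Laplace approximation (hypothetical class)."""

    def __init__(self, n_params=None, prior_precision=1.0):
        self.prior_precision = prior_precision
        # n_params is None when the size of H is not known up front,
        # e.g. a last-layer flavor that was not given last_layer_name.
        self.H = None if n_params is None else [0.0] * n_params

    @property
    def posterior_precision(self):
        if self.H is None:
            # Assumed error type; the PR only states that a descriptive error is raised.
            raise RuntimeError(
                "H is not initialized. Call fit() first or pass last_layer_name."
            )
        # With H still all zeros, this is just the prior precision.
        return [h + self.prior_precision for h in self.H]


la = ToyDiagLaplace(n_params=3, prior_precision=2.0)
print(la.posterior_precision)  # prior precision on every entry before fit()
```

Keeping H as None in the two exceptional cases avoids allocating anything for all weights, at the cost of posterior_precision being unavailable until H has been set.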

edaxberger · 2021-12-15T13:08:54Z

Looks great, and much simpler than the other solution indeed! I also tested it with WILDS and it works well.

Is it a problem that for last-layer with no last_layer_name passed and low-rank, posterior_precision is not defined? Does this mean that doing continual learning would be more difficult with these flavours (I guess even if so, we wouldn't want people to use those for CL anyways)?

runame · 2021-12-15T14:22:26Z

Thanks for testing it with WILDS!

I don't think the two exceptions are a problem:

As you say, last-layer flavors should most likely not be used for continual learning anyway. Also, it is still possible to use them by passing the last_layer_name argument or by adding a few more lines of code in the actual continual learning script.
Low-rank does not even support fitting repeatedly (without overriding H), hence it is not an option for continual learning anyway.

edaxberger · 2021-12-15T16:57:35Z

Yes, good points!

aleximmer · 2021-12-17T13:54:20Z

We could additionally prohibit trying to do CL with these classes by adjusting the fit method to have no override argument and default to override=True.

runame · 2021-12-17T15:22:59Z

I think that's also ok. I don't really have any use case in mind where one might want to use override=False with the last-layer flavors. And if that changes, we can easily enable the option again. Alternatively, we can raise an error like we currently do for low-rank Laplace, to avoid confusing the user (in principle there is no reason why there should be no override argument for last-layer flavors).

runame · 2021-12-18T15:03:13Z

Now a descriptive error gets raised when override=False for low-rank or last-layer Laplace approximations.
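A minimal sketch of such a check, assuming a toy class: `ToyLastLayerLaplace`, its placeholder curvature computation, and the use of `ValueError` are hypothetical and only illustrate the idea of rejecting override=False.

```python
class ToyLastLayerLaplace:
    """Toy stand-in for a last-layer Laplace flavor (hypothetical class)."""

    def __init__(self):
        self.H = None

    def fit(self, data, override=True):
        if not override:
            # Assumed error type and message; the real ones may differ.
            raise ValueError(
                "Last-layer Laplace approximations do not support override=False."
            )
        # Placeholder for the actual curvature computation: H is always
        # recomputed from scratch rather than accumulated across calls.
        self.H = [float(len(data))]
        return self


la = ToyLastLayerLaplace().fit([1, 2, 3])
print(la.H)
```

Keeping the override argument in the signature and raising on override=False means the option can be enabled later without changing the interface.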

edaxberger · 2021-12-20T08:41:52Z

Great, I agree that a descriptive error is more useful/clear than just not offering the option at all (and we might still add the feature at some point if we think it's useful at all). Happy to merge this in.

runame · 2021-12-20T09:05:26Z

I think @aleximmer wanted to take a closer look today. After that we can merge it.

aleximmer · 2021-12-21T09:22:35Z

lgtm

runame added 3 commits December 15, 2021 10:50

Add AsdlHessian to init

9538615

Improve init tests (also with WRN50-2)

f143278

Fix H init bug

53fa9b1

runame added the bug Something isn't working label Dec 15, 2021

runame added this to the NeurIPS Prerelease milestone Dec 15, 2021

runame requested review from aleximmer and edaxberger December 15, 2021 10:40

edaxberger approved these changes Dec 15, 2021

Raise error when override=False for last-layer LAs

5cc0384

aleximmer approved these changes Dec 21, 2021

runame merged commit 7e42de8 into main Dec 21, 2021

runame deleted the fix-H-init-alt branch December 21, 2021 09:27

edaxberger mentioned this pull request Dec 21, 2021

Subnetwork Laplace #58

Merged

runame mentioned this pull request Jan 13, 2022

Add tests with larger model architectures #69

Closed
