
auto_lr_find does not work if there is a BackboneFinetuning callback #14674

Open
ejm714 opened this issue Sep 12, 2022 · 2 comments
Labels: bug (Something isn't working), callback: finetuning, help wanted (Open to be worked on), tuner

Comments


ejm714 commented Sep 12, 2022

🐛 Bug

auto_lr_find does not properly restore the model for training if there is a BackboneFinetuning callback.

To Reproduce

Specify a BackboneFinetuning callback, set auto_lr_find to True, and then run tune and fit.

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import BackboneFinetuning

trainer = Trainer(
    auto_lr_find=True,
    callbacks=[BackboneFinetuning()],
)
trainer.tune(model, train_dataloaders=train_data, val_dataloaders=val_data)
trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)

which yields the following error:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/finetuning.py in on_fit_start(self, trainer, pl_module)
    103             for opt_idx, optimizer in enumerate(trainer.optimizers):
    104                 param_groups = self._apply_mapping_to_param_groups(
--> 105                     self._internal_optimizer_metadata[opt_idx], named_parameters
    106                 )
    107                 optimizer.param_groups = param_groups

KeyError: 0

See notebook example: https://colab.research.google.com/drive/1ajrSRge90RM8Rlcwk0HyEosLLpOpyvg-

Expected behavior

After auto_lr_find runs, the model should be restored to its initial state and training should proceed with the learning rate that was found.

Environment

See bottom cell of colab notebook.

Additional context

I think the culprit is that BackboneFinetuning.on_fit_start now calls BaseFinetuning.on_fit_start, which then assumes the model is being restored from a checkpoint.
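To illustrate the failure mode described above, here is a minimal pure-Python sketch (the class and method names are hypothetical stand-ins, not the actual pytorch_lightning internals): the restore path indexes saved per-optimizer metadata, but after lr_find no metadata was ever saved, so looking up optimizer 0 raises KeyError: 0.

```python
# Hypothetical sketch of why the restore path raises KeyError: 0.
# The callback's on_fit_start assumes a restart means "resume from checkpoint"
# and tries to read saved param-group metadata that was never populated.

class FinetuningCallbackSketch:
    def __init__(self):
        # Only populated when restoring from a real checkpoint.
        self._internal_optimizer_metadata = {}

    def on_fit_start(self, restarting):
        if restarting:
            # Mirrors the failing line: metadata for optimizer 0 was never saved.
            return self._internal_optimizer_metadata[0]
        return None


cb = FinetuningCallbackSketch()
try:
    # After tuner.tune(), the trainer looks "restarted" even though no
    # checkpoint metadata exists, reproducing the reported failure.
    cb.on_fit_start(restarting=True)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 0
```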

It looks like the bug was introduced in this PR: 07635d0#diff-ac96be7ba54bac4d7dc79ee012a211498fb97689e37026fe8a1b06a359079224R410

The fix will need to both support the finetuning callbacks when training is resumed from a checkpoint and support using auto_lr_find when a BackboneFinetuning callback is attached to the trainer.
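One possible shape for such a fix is to guard the restore branch on whether metadata for that optimizer was actually saved, falling back to normal setup otherwise. This is a hypothetical sketch with illustrative names, not the actual Lightning implementation:

```python
# Hypothetical sketch of a guarded on_fit_start: only take the
# checkpoint-restore branch when saved param-group metadata exists for
# that optimizer; otherwise do a fresh setup (e.g. right after lr_find).

def on_fit_start_sketch(internal_metadata, optimizers, restarting):
    actions = []
    for opt_idx, _optimizer in enumerate(optimizers):
        if restarting and opt_idx in internal_metadata:
            # Genuine checkpoint resume: saved metadata exists, restore it.
            actions.append((opt_idx, "restore"))
        else:
            # No saved metadata (fresh fit or post-tuner run): set up normally.
            actions.append((opt_idx, "setup"))
    return actions


# After lr_find the metadata dict is empty, so fit sets up instead of crashing:
print(on_fit_start_sketch({}, ["opt0"], restarting=True))  # [(0, 'setup')]
```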

cc @akihironitta @Borda @rohitgr7

@ejm714 ejm714 added the needs triage Waiting to be triaged by maintainers label Sep 12, 2022
@ejm714 ejm714 changed the title auto_lr_find does not work is there is a BackboneFinetuning callback auto_lr_find does not work if there is a BackboneFinetuning callback Sep 12, 2022
@carmocca carmocca added this to the pl:1.7.x milestone Sep 15, 2022
@carmocca carmocca added bug Something isn't working tuner callback: finetuning and removed needs triage Waiting to be triaged by maintainers labels Sep 15, 2022
@carmocca carmocca modified the milestones: pl:1.7.x, v1.8.x Oct 13, 2022
@Borda Borda modified the milestones: v1.8.x, v1.9 Jan 6, 2023
@Borda Borda modified the milestones: v1.9, v1.9.x Jan 16, 2023
@awaelchli awaelchli added the help wanted Open to be worked on label Dec 31, 2023
@awaelchli awaelchli removed this from the v1.9.x milestone Dec 31, 2023
@granthamtaylor

I came across the same issue 1.5 years later.

I cannot use BaseFinetuning and LearningRateFinder at the same time; doing so raises KeyError: 0.

@patrontheo

Same for me.
@Borda @awaelchli Any plans to fix this?


6 participants