
Cannot use torch.jit.trace to trace LightningModule in Lightning v1.7 #14036

Closed
J-shang opened this issue Aug 5, 2022 · 18 comments
Labels: bug (Something isn't working), lightningmodule (pl.LightningModule)

Comments

J-shang commented Aug 5, 2022

🐛 Bug

When I use torch.jit.trace to trace a LightningModule,
I get RuntimeError: XXX (LightningModule class name) is not attached to a Trainer.

This is because in Lightning 1.7.0 the trainer property raises a RuntimeError if the module is not attached to a Trainer:

https://github.com/Lightning-AI/lightning/blob/12a061f2aaefaa9ed9ccf81ab6f378835b675a7e/src/pytorch_lightning/core/module.py#L179

but torch.jit probes each attribute with hasattr:

https://github.com/pytorch/pytorch/blob/de0e03001d31523ef86c3d7852c87cdad6d96632/torch/_jit_internal.py#L749

and per the hasattr docstring: "Return whether the object has an attribute with the given name. This is done by calling getattr(obj, name) and catching AttributeError."
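This interaction can be shown with a minimal, stdlib-only sketch (the class name here is hypothetical, standing in for a LightningModule under v1.7.0): hasattr only swallows AttributeError, so a property that raises RuntimeError crashes the probe instead of making hasattr return False.

```python
# hasattr() calls getattr() and catches only AttributeError, so a
# property that raises RuntimeError propagates out of the check
# instead of making hasattr() return False.

class Probed:
    """Hypothetical stand-in for a LightningModule in v1.7.0."""
    @property
    def trainer(self):
        raise RuntimeError("Probed is not attached to a `Trainer`.")

try:
    hasattr(Probed(), "trainer")
except RuntimeError as e:
    # The exception escapes hasattr() and crashes torch.jit's probing
    print("hasattr propagated:", e)
```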

To Reproduce

Initialize any LightningModule under Lightning v1.7.0 and trace it with torch.jit.trace, without attaching a trainer to the module.

To Fix

Replace the RuntimeError with an AttributeError.

This fix works for me, but I don't know whether it would cause other problems.
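A sketch of the proposed change, using a hypothetical stand-in class rather than the actual Lightning source: raising AttributeError makes the property look absent to hasattr-based probing while still signaling the error to ordinary callers.

```python
class ModuleSketch:
    """Hypothetical stand-in for LightningModule's `trainer` property."""
    _trainer = None

    @property
    def trainer(self):
        if self._trainer is None:
            # AttributeError instead of RuntimeError, so torch.jit's
            # hasattr()-based attribute probing returns False instead
            # of crashing the trace.
            raise AttributeError(
                f"{type(self).__qualname__} is not attached to a `Trainer`."
            )
        return self._trainer

m = ModuleSketch()
print(hasattr(m, "trainer"))  # False — safe for hasattr-based probing
```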

Environment

  • Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): LightningModule
  • PyTorch Lightning Version (e.g., 1.5.0): 1.7.0
  • PyTorch Version (e.g., 1.10): 1.10
  • Python version (e.g., 3.9): 3.7.12

cc @carmocca @justusschock @awaelchli @Borda @ananthsub @ninginthecloud @jjenniferdai @rohitgr7

@J-shang J-shang added the needs triage Waiting to be triaged by maintainers label Aug 5, 2022
@J-shang J-shang changed the title Cannot use torch.jit.trace to trace LightningModule in Lightning v1.7 Cannot use torch.jit.trace to trace LightningModule in Lightning v1.7 Aug 5, 2022
@carmocca carmocca added bug Something isn't working lightningmodule pl.LightningModule and removed needs triage Waiting to be triaged by maintainers labels Aug 5, 2022
@carmocca carmocca added this to the pl:1.7.x milestone Aug 5, 2022
@carmocca carmocca self-assigned this Aug 5, 2022

carmocca commented Aug 5, 2022

Hi! Unfortunately, this is caused by a bug in PyTorch where properties are not correctly ignored: pytorch/pytorch#67146

As a workaround, you can use model.to_torchscript(method="trace")

@carmocca carmocca removed this from the pl:1.7.x milestone Aug 5, 2022

J-shang commented Aug 9, 2022

Hi! Unfortunately, this is caused by a bug in PyTorch where properties are not correctly ignored: pytorch/pytorch#67146

As a workaround, you can use model.to_torchscript(method="trace")

Thank you for your reply; model.to_torchscript(method="trace") works for me.

@Animesh081005

Hi @carmocca, I am getting the same error when using torch.jit.trace to trace a LightningModule. Unfortunately, I cannot use the above workaround, as torch.jit.trace is called internally by a library I am using with my LightningModule. Do you have any suggestions for making torch.jit.trace work with a LightningModule?

FYI, I am using pytorch-lightning version 1.7.3


rohitgr7 commented Sep 1, 2022

@Animesh081005 can you open an issue with a reproducible script?


kyoungrok0517 commented Sep 26, 2022

UPDATE: Solved in 1.7.7

The same happens with me.

@Erland366

I am still having this issue. I am using Lightning Flash ObjectDetector with a YOLOv5 backbone, and neither scripting nor tracing the model works. The error says

ModelAdapter is not attached to a Trainer.


Stack-Attack commented Nov 24, 2022

I am also still seeing this error on the most recent versions:
pytorch-lightning 1.8.3.post0
pytorch 1.13.0

@carmocca

Can you share the full error stacktrace?


kishcs commented Dec 16, 2022

I am also having the same issue with pytorch_lightning version 1.8.4.post0.
I am trying to convert the parseq torch model (https://github.com/baudm/parseq/releases/download/v1.0.0/parseq-bb5792a6.pt) to trt_ts with the sample code given here (https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html).

Error:

Traceback (most recent call last):
  File "parseq_to_trt.py", line 18, in <module>
    trt_ts_module = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions=enabled_precisions)
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/_compile.py", line 124, in compile
    ts_mod = torch.jit.script(module)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_script.py", line 1286, in script
    return torch.jit._recursive.create_script_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 473, in create_script_module
    concrete_type = get_module_concrete_type(nn_module, share_types)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 424, in get_module_concrete_type
    concrete_type = concrete_type_store.get_or_create_concrete_type(nn_module)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 365, in get_or_create_concrete_type
    concrete_type_builder = infer_concrete_type_builder(nn_module)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 273, in infer_concrete_type_builder
    overloads.update(get_overload_name_mapping(get_overload_annotations(nn_module, ignored_properties)))
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 639, in get_overload_annotations
    item = getattr(mod, name, None)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/module.py", line 179, in trainer
    raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")
RuntimeError: PARSeq is not attached to a `Trainer`.
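The failing frame in the traceback is item = getattr(mod, name, None). A stdlib-only illustration (the class here is a hypothetical stand-in, not the actual model) of why even the three-argument form of getattr does not help: it falls back to the default only on AttributeError, so Lightning's RuntimeError escapes.

```python
class Mod:
    """Hypothetical stand-in for a LightningModule whose
    `trainer` property raises RuntimeError."""
    @property
    def trainer(self):
        raise RuntimeError("Mod is not attached to a `Trainer`.")

# The default is returned only when the attribute is missing
# (i.e. an AttributeError is raised internally):
print(getattr(Mod(), "missing", None))  # None

# Any other exception from a property propagates past the default:
try:
    getattr(Mod(), "trainer", None)
except RuntimeError as e:
    print("propagated:", e)
```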

@carmocca

You will need to torchscript the model first with model.to_torchscript(), and then pass the result to torch_tensorrt.compile()


naveenkumarkr723 commented Dec 21, 2022

Hi @carmocca,
I converted the parseq model with to_torchscript(). Can you now provide code to convert it to TRT using this link (https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html)?

Duplicated in #16157

@laclouis5

This issue is still not solved. Is a fix to be expected soon?

@carmocca

@laclouis5 No, and most likely never, since PyTorch no longer works on TorchScript now that torch.compile has been released


hjp709394 commented Sep 4, 2023

Can we just create a dummy object and attach it to the LightningModule? @carmocca


eval-dev commented Dec 5, 2023

My workaround is to give it a dummy trainer: model._trainer = pl.Trainer()

@Apolloxyy

I am still having this issue. I am using Lightning Flash ObjectDetector with YOLOv5 backbone and neither script or trace the model works. The error says

ModelAdapter is not attached to a Trainer.

Hi, have you solved this problem? I encountered the same problem when I tried to load a Lightning model.

@Apolloxyy

(quotes @kishcs's comment and traceback above)

Hi, have you solved this problem? I encountered the same problem when I saved the trained Lightning model.

@williamgao07

Adding a dummy trainer addressed my issue, per this post: #17517 (comment)
