Describe the bug
When I train my AE on a training server and then load it on a production server, calling the `embed` function raises an error. The same call works without issues on the training server.
To Reproduce
Steps to reproduce the behavior:
1. Train the AE on the training server.
2. Load the trained model on the production server.
3. Execute `autoencoder_model.embed(someTensorX)`.
4. See error:
`Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.10/dist-packages/pythae/models/base/base_model.py", line 129, in embed
    return self(DatasetOutput(data=inputs)).z
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pythae/models/ae/ae_model.py", line 76, in forward
    z = self.encoder(x).embedding
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "LOCAL_PATH_OF_TRAIN_SERVER_PYTHON_FILE_LOADING_PYTHAE_MODEL", line 40, in forward
TypeError: 'c' not supported between instances of 'NoneType' and 'NoneType'`
Expected behavior
I expect the model to embed the tensor without any errors, irrespective of the server it's being executed on.
Desktop Prod Server:
OS version:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
Kernel version:
5.15.0-76-generic
Desktop Train Server:
OS version:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS (beaver-osp1-gendry X45)
Release: 18.04
Codename: bionic
Kernel version:
5.4.0-150-generic
Additional context
The last frame of the traceback points to the path of the Python file on the training server, so the error seems to be related to that file. The training environment's path appears to be stored in the model when it is saved, which might cause the failure when the model is loaded in a different environment.
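A minimal sketch of how a training-side path can end up in a traceback on another machine. It assumes the custom encoder was serialized by value, code objects included, which is how picklers such as cloudpickle or dill handle user-defined classes; whether that is what happens here is an assumption. The path `/hypothetical/train_server/encoder_def.py` and the toy `forward` function are made up for illustration, and `marshal` only stands in for the pickler. A code object records the filename it was compiled from, and that label travels with the serialized bytes even when the file does not exist on the loading machine:

```python
import marshal
import traceback

# Hypothetical path standing in for the training-server file; it does not exist here.
TRAIN_PATH = "/hypothetical/train_server/encoder_def.py"

# Toy function that fails with a comparison TypeError when called with None.
source = "def forward(x):\n    return x < None\n"

# Compile the source and label the code object with the training-side filename.
code = compile(source, TRAIN_PATH, "exec")

# Round-trip the code object through bytes, roughly as a by-value pickler would.
restored = marshal.loads(marshal.dumps(code))

namespace = {}
exec(restored, namespace)

try:
    namespace["forward"](None)
except TypeError:
    report = traceback.format_exc()

# The failing frame is reported under TRAIN_PATH although no such file was read.
print(report)
```

If this is the mechanism, the path in the traceback is only a label and not necessarily the cause of the failure. Serialized code objects are specific to the Python version that produced them, so a version mismatch between the two servers would be worth ruling out.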
Thanks for reporting this issue. It is a weird bug. Can you share the Python environments of the training server and the production server (the output of `pip freeze`)? In particular, do you have the same version of Python on both servers?
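In addition to `pip freeze`, a short script along these lines could be run on both servers to compare the interpreter and the key packages side by side. It is only a sketch: `describe_environment` is a hypothetical helper, `torch` and `pythae` are taken from the traceback, and `cloudpickle` is included on the assumption that it is involved in saving the model.

```python
import platform
import sys
from importlib import metadata

# torch and pythae appear in the traceback; cloudpickle is an assumed dependency.
PACKAGES = ("torch", "pythae", "cloudpickle")


def describe_environment(packages=PACKAGES):
    """Return interpreter details and installed package versions as a dict."""
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


# Print one "key: value" line per entry so the two outputs can be diffed.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running it on each server and diffing the two outputs should show quickly whether the Python or package versions differ.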