
BUG: TypeError: cannot pickle 'module' object #6022

Closed
3 tasks done
pingsutw opened this issue Apr 19, 2023 · 5 comments · Fixed by #6023
Labels
bug 🦗 Something isn't working External Pull requests and issues from people who do not regularly contribute to modin Integration ➕➕ Issues with integrating Modin into other libraries

Comments

@pingsutw

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import os
import lazy_import
import ray
pandas = lazy_import.lazy_module("pandas")
pyarrow = lazy_import.lazy_module("pyarrow")
from modin import pandas as pd

os.environ["MODIN_ENGINE"] = "ray"
if not ray.is_initialized():
    ray.init(_plasma_directory="/tmp")

df = pd.DataFrame({"col1": [1, 2, 3], "col2": list("abc")})

Issue Description

Creating a DataFrame fails when lazy_import is used in the same file.

Expected Behavior

The DataFrame should be created normally.

Error Logs

    return cls.get_factory()._from_non_pandas(*args, **kwargs)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 113, in get_factory
    Engine.subscribe(cls._update_factory)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/config/pubsub.py", line 217, in subscribe
    callback(cls)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 155, in _update_factory
    cls.__factory.prepare()
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/dispatching/factories/factories.py", line 441, in prepare
    from modin.core.execution.ray.implementations.pandas_on_ray.io import (
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/__init__.py", line 16, in <module>
    from .io import PandasOnRayIO
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/io.py", line 43, in <module>
    from ..dataframe import PandasOnRayDataframe
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/__init__.py", line 16, in <module>
    from .dataframe import PandasOnRayDataframe
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/dataframe.py", line 16, in <module>
    from ..partitioning.partition_manager import PandasOnRayDataframePartitionManager
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/__init__.py", line 17, in <module>
    from .partition_manager import PandasOnRayDataframePartitionManager
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.py", line 25, in <module>
    from .virtual_partition import (
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/virtual_partition.py", line 28, in <module>
    _DEPLOY_AXIS_FUNC = ray.put(PandasDataframeAxisPartition.deploy_axis_func)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/worker.py", line 2375, in put
    object_ref = worker.put_object(value, owner_address=serialize_owner_address)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/worker.py", line 611, in put_object
    serialized_value = self.get_serialization_context().serialize(value)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/serialization.py", line 450, in serialize
    return self._serialize_to_msgpack(value)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/serialization.py", line 428, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/serialization.py", line 390, in _serialize_to_pickle5
    raise e
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/_private/serialization.py", line 385, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/kevin/opt/anaconda3/envs/flytekit-3.9/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 627, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'module' object
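The traceback ends in Python's generic refusal to pickle module objects: cloudpickle, called from ray.put, reaches a module object while serializing PandasDataframeAxisPartition.deploy_axis_func. The standard-library sketch below (no Modin, Ray, or lazy_import involved) raises the same TypeError, which may help confirm that the failure comes from a module object reaching the pickler rather than from the DataFrame data. The helper name try_pickle is hypothetical, and which module object was reached in this report is not stated in the thread.

```python
import pickle
import types


def try_pickle(obj):
    """Hypothetical helper: return None if obj pickles cleanly, else the error message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Plain data, like the dict passed to pd.DataFrame in the report, pickles fine.
print(try_pickle({"col1": [1, 2, 3], "col2": list("abc")}))  # None

# Any module object, real or synthetic, is rejected by the pickler.
fake_module = types.ModuleType("fake_module")
print(try_pickle(fake_module))  # cannot pickle 'module' object
```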

Installed Versions

(flytekit-3.9) ➜ ~ pip list | grep modin
modin 0.18.1

@pingsutw pingsutw added bug 🦗 Something isn't working Triage 🩹 Issues that need triage labels Apr 19, 2023
@anmyachev
Collaborator

@pingsutw thanks for the report! I was able to reproduce the issue.

@anmyachev anmyachev added Integration ➕➕ Issues with integrating Modin into other libraries and removed Triage 🩹 Issues that need triage labels Apr 19, 2023
anmyachev added a commit to anmyachev/modin that referenced this issue Apr 19, 2023
…', '_DRAIN' into Ray virtual partition

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev
Collaborator

@pingsutw for this example, the changes in #6023 should be enough. If there are no unforeseen problems, they will be in the master branch within a few days.

@pingsutw
Author

@anmyachev Thanks for the quick fix!

@anmyachev anmyachev added the External Pull requests and issues from people who do not regularly contribute to modin label Apr 19, 2023
dchigarev pushed a commit that referenced this issue Apr 21, 2023
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev added a commit that referenced this issue Apr 24, 2023
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@pingsutw
Author

@anmyachev we have a test in flytekit that uses lazy_import before creating a Modin DataFrame. This test still fails after upgrading Modin. However, if I run the same code outside pytest, it works fine. Do you know why? Thanks in advance.

@pingsutw
Author

Never mind, we fixed the issue by using importlib.util.LazyLoader instead.
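For reference, the Python importlib documentation includes a recipe for lazily importing a module with importlib.util.LazyLoader; a minimal sketch of that recipe follows. The thread does not show how flytekit applied it, so the helper name load_lazily and the example module colorsys are illustrative assumptions.

```python
import importlib.util
import sys


def load_lazily(name):
    """Hypothetical helper: import `name` lazily via importlib.util.LazyLoader.

    Adapted from the lazy-import recipe in the importlib docs. The module is
    registered in sys.modules right away, but its code only runs on the first
    attribute access.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers execution until first attribute access
    return module


colorsys = load_lazily("colorsys")  # module code has not run yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # first access executes the module
```

Because execution is deferred, any error raised by the module's own code surfaces at the first attribute access rather than at the load_lazily call.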

2 participants