read_csv with Ray engine fails with dialect parameter #2508

Closed
amyskov opened this issue Dec 3, 2020 · 1 comment · Fixed by #5512
Labels
  • bug 🦗: Something isn't working
  • P2: Minor bugs or low-priority feature requests
  • pandas concordance 🐼: Functionality that does not match pandas

Comments

Contributor

amyskov commented Dec 3, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Modin version: ce2bea8
  • Python version: 3.8.6
  • Code we can use to reproduce:
import os

os.environ["MODIN_ENGINE"] = "ray"

import pandas
import modin.pandas as pd
import csv
from modin.pandas.test.utils import df_equals

test_filename = "test.csv"
dialect_name = "test_dialect"


test_csv_dialect_params = {"delimiter": "|"}
csv.register_dialect(dialect_name, **test_csv_dialect_params)
dialect = csv.get_dialect(dialect_name)

kwargs = {
    "filepath_or_buffer": test_filename,
    "dialect": dialect,
}


test_data = """col1,col2,
1,2
3,4
"""

try:
    with open(test_filename, "w") as f:
        f.write(test_data)

    df_pandas = pandas.read_csv(**kwargs)
    print("pandas.read_csv output:\n", df_pandas)
    df_pd = pd.read_csv(**kwargs)
    print("pd.read_csv output:\n", df_pd)
    df_equals(df_pandas, df_pd)
finally:
    os.remove(test_filename)

Describe the problem

Source code / logs

pandas.read_csv output:
   col1,col2,
0        1,2
1        3,4
Traceback (most recent call last):
  File "test.py", line 242, in <module>
    df_pd = pd.read_csv(**kwargs)
  File "/modin/modin/pandas/io.py", line 109, in parser_func
    return _read(**kwargs)
  File "/modin/modin/pandas/io.py", line 127, in _read
    pd_obj = EngineDispatcher.read_csv(**kwargs)
  File "/modin/modin/data_management/factories/dispatcher.py", line 104, in read_csv
    return cls.__engine._read_csv(**kwargs)
  File "/modin/modin/data_management/factories/factories.py", line 87, in _read_csv
    return cls.io_cls.read_csv(**kwargs)
  File "/modin/modin/engines/base/io/file_dispatcher.py", line 29, in read
    query_compiler = cls._read(*args, **kwargs)
  File "/modin/modin/engines/base/io/text/csv_dispatcher.py", line 168, in _read
    partition_id = cls.deploy(cls.parse, num_splits + 2, args)
  File "/modin/modin/engines/ray/task_wrapper.py", line 25, in deploy
    return deploy_ray_func._remote(args=(func, kwargs), num_returns=num_returns)
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/remote_function.py", line 276, in _remote
    return invocation(args, kwargs)
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/remote_function.py", line 262, in invocation
    object_refs = worker.core_worker.submit_task(
  File "python/ray/_raylet.pyx", line 1019, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 1025, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 293, in ray._raylet.prepare_args
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/serialization.py", line 404, in serialize
    return self._serialize_to_msgpack(value)
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/serialization.py", line 384, in _serialize_to_msgpack
    self._serialize_to_pickle5(metadata, python_objects)
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/serialization.py", line 344, in _serialize_to_pickle5
    raise e
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/serialization.py", line 340, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 70, in dumps
    cp.dump(obj)
  File "/miniconda3/envs/modin_new/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 656, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_csv.Dialect' object
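
The last frame of the traceback suggests the failure does not depend on Modin or Ray: the object returned by csv.get_dialect cannot be pickled by the standard library either. A minimal sketch of that check, assuming only the Python standard library (the is_picklable helper is a hypothetical name introduced for illustration, not part of Modin, Ray, or pandas):

```python
import csv
import pickle

# Same dialect as in the reproducer above.
csv.register_dialect("test_dialect", delimiter="|")
dialect = csv.get_dialect("test_dialect")


def is_picklable(obj):
    """Hypothetical helper: True if `obj` survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


print(type(dialect))
print("dialect picklable:", is_picklable(dialect))
print("plain dict picklable:", is_picklable({"delimiter": "|"}))
```

If the dialect is reported as not picklable while the plain dict is, any engine that ships read_csv keyword arguments to worker processes would hit the same error as soon as "dialect" is among them.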
amyskov added the bug 🦗 (Something isn't working) label Dec 3, 2020
Collaborator

pyrito commented Aug 22, 2022

I am seeing a similar error on the latest master:

TypeError: Could not serialize the argument {'callback': <function CSVDispatcher.read_callback at 0x1bb614310>, 'sep': <no_default>, 'delimiter': None, 'header': 'infer', 'names': <no_default>, 'index_col': None, 'usecols': None, 'prefix': <no_default>, 'mangle_dupe_cols': True, 'dtype': None, 'engine': None, 'converters': None, 'true_values': None, 'false_values': None, 'skipinitialspace': False, 'skiprows': None, 'nrows': None, 'na_values': None, 'keep_default_na': True, 'na_filter': True, 'verbose': False, 'skip_blank_lines': True, 'parse_dates': None, 'infer_datetime_format': False, 'keep_date_col': False, 'date_parser': None, 'dayfirst': False, 'cache_dates': True, 'iterator': False, 'chunksize': None, 'compression': None, 'thousands': None, 'decimal': '.', 'lineterminator': None, 'quotechar': '"', 'quoting': 0, 'escapechar': None, 'comment': None, 'encoding': None, 'encoding_errors': 'strict', 'dialect': <_csv.Dialect object at 0x13497e090>, 'error_bad_lines': None, 'warn_bad_lines': None, 'on_bad_lines': None, 'skipfooter': 0, 'doublequote': True, 'delim_whitespace': False, 'low_memory': True, 'memory_map': False, 'float_precision': None, 'storage_options': None, 'fname': '/Users/kvelayutham/Downloads/data/test.csv', 'num_splits': 1, 'header_size': 1, 'start': 11, 'end': 19} for a task or actor modin.core.execution.ray.common.task_wrapper._deploy_ray_func. Check https://docs.ray.io/en/master/serialization.html#troubleshooting for more information.
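
Since the only unserializable entry in that argument dict appears to be 'dialect', one possible user-side workaround is to copy the dialect's settings into plain keyword arguments before calling read_csv. This is a sketch under assumptions, not the fix that landed in #5512: dialect_to_kwargs and DIALECT_KEYS are hypothetical names, and the snippet has not been run against Modin.

```python
import csv
import pickle

# csv dialect attributes that are also accepted as read_csv keywords.
# lineterminator is left out on purpose: csv dialects default to "\r\n",
# while pandas documents that keyword as a single character.
DIALECT_KEYS = (
    "delimiter",
    "doublequote",
    "escapechar",
    "skipinitialspace",
    "quotechar",
    "quoting",
)


def dialect_to_kwargs(dialect):
    """Hypothetical helper: copy dialect settings into a plain dict."""
    return {key: getattr(dialect, key) for key in DIALECT_KEYS}


csv.register_dialect("test_dialect", delimiter="|")
read_csv_kwargs = dialect_to_kwargs(csv.get_dialect("test_dialect"))

# Unlike the dialect object itself, the plain dict can be pickled.
pickle.dumps(read_csv_kwargs)
print(read_csv_kwargs)
```

The resulting dict could then be passed as pd.read_csv(test_filename, **read_csv_kwargs) in place of dialect=dialect, so that only strings, booleans, integers, and None are sent to the workers.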

pyrito added the pandas concordance 🐼 (Functionality that does not match pandas) and P2 (Minor bugs or low-priority feature requests) labels Aug 22, 2022
anmyachev added a commit to anmyachev/modin that referenced this issue Jan 3, 2023
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
YarShev pushed a commit that referenced this issue Jan 4, 2023
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
vnlitvinov pushed a commit that referenced this issue Jan 24, 2023
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>