read_excel fails if usecols and index_cols parameters are provided #3305

Closed
amyskov opened this issue Aug 3, 2021 · 1 comment · Fixed by #5508
Labels: bug 🦗 (Something isn't working), P2 (Minor bugs or low-priority feature requests), pandas.io

Comments

@amyskov
Contributor

amyskov commented Aug 3, 2021

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Modin version: 54898ef
  • Python version: 3.8.10
  • Code we can use to reproduce:
```python
import pandas
import numpy as np
import os

import modin.pandas as pd
from modin.pandas.test.utils import df_equals
from modin.config import NPartitions
NPartitions.put(4)

filename = "test_excel.xlsx"
row_size = 256
# Read only the first sheet column and use it as the row index
kwargs = {
    "usecols": [0],
    "index_col": 0,
}


try:
    # to_excel also writes the frame's index as the first sheet column
    df = pandas.DataFrame(
        {"col1": np.arange(row_size), "col2": np.arange(row_size)}
    )
    df.to_excel(filename)

    df_pd = pd.read_excel(filename, **kwargs)  # Modin (raises on the reported commit)
    df_pandas = pandas.read_excel(filename, **kwargs)  # pandas reference result

    df_equals(df_pandas, df_pd)

finally:
    os.remove(filename)
```
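
For reference, pandas' own answer for these arguments can be checked without Modin. Because `to_excel` writes the index as the first sheet column, `usecols=[0]` keeps only that column and `index_col=0` then moves it into the row index, so the expected result has 256 rows and no data columns. The sketch below swaps Excel for CSV; that swap is an assumption made only so no Excel engine (such as openpyxl) is needed, and it presumes `read_csv` treats `usecols`/`index_col` the same way as `read_excel` for this input.

```python
import io

import numpy as np
import pandas

row_size = 256
df = pandas.DataFrame({"col1": np.arange(row_size), "col2": np.arange(row_size)})

# CSV stand-in for the .xlsx file (assumption): the index is written as column 0
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# usecols=[0] keeps only the written index column; index_col=0 then refers to
# a position inside that selection, so the column becomes the row index
expected = pandas.read_csv(buffer, usecols=[0], index_col=0)

# 256 index labels, zero data columns; pandas reports such a frame as empty
print(expected.shape, expected.empty)
```
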

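Describe the problem

Calling `modin.pandas.read_excel` with both `usecols` and `index_col` fails with the `ValueError` below, whereas `pandas.read_excel` accepts the same arguments.
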
Source code / logs

```
Traceback (most recent call last):
  File "test.py", line 824, in <module>
    df_pd = pd.read_excel(filename, **kwargs)
  File "/modin/modin/pandas/io.py", line 347, in read_excel
    intermediate = FactoryDispatcher.read_excel(**kwargs)
  File "/modin/modin/data_management/factories/dispatcher.py", line 202, in read_excel
    return cls.__factory._read_excel(**kwargs)
  File "/modin/modin/data_management/factories/factories.py", line 256, in _read_excel
    return cls.io_cls.read_excel(**kwargs)
  File "/modin/modin/engines/base/io/file_dispatcher.py", line 67, in read
    query_compiler = cls._read(*args, **kwargs)
  File "/modin/modin/engines/base/io/text/excel_dispatcher.py", line 246, in _read
    dtypes = pandas.Series(dtypes, index=column_names)
  File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/core/series.py", line 355, in __init__
    if is_empty_data(data) and dtype is None:
  File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/core/construction.py", line 794, in is_empty_data
    is_simple_empty = is_list_like_without_dtype and not data
  File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/core/generic.py", line 1534, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

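
The innermost frames show pandas evaluating `not data` inside the `Series` constructor, where `data` is the `dtypes` object built in `excel_dispatcher.py`. pandas refuses to reduce any DataFrame to a single boolean, so the traceback suggests (an inference from the stack, not something stated in the report) that `dtypes` reached `pandas.Series(...)` as a DataFrame rather than a Series. The snippet below reproduces only that final pandas error in isolation; `frame` is an illustrative stand-in, not Modin's actual `dtypes` value.

```python
import pandas

# Illustrative stand-in for a DataFrame arriving where a Series was expected
frame = pandas.DataFrame({"col1": [0, 1], "col2": [0, 1]})

try:
    # Same kind of truth-value test as `not data` in pandas' is_empty_data()
    not frame
except ValueError as err:
    message = str(err)
else:
    message = ""

print(message)
```
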
@RehanSD
Collaborator

RehanSD commented Oct 12, 2022

Confirmed that this bug is still reproducible on master, although the error is different: both `read_excel` calls now succeed, but the Modin and pandas results do not match:

```
AssertionError                            Traceback (most recent call last)
Input In [1], in <cell line: 18>()
     24     df_pd = pd.read_excel(filename, **kwargs)
     25     df_pandas = pandas.read_excel(filename, **kwargs)
---> 27     df_equals(df_pandas, df_pd)
     29 finally:
     30     os.remove(filename)

File ~/Documents/modin/modin/pandas/test/utils.py:583, in df_equals(df1, df2)
    580     df2 = to_pandas(df2)
    582 if isinstance(df1, pandas.DataFrame) and isinstance(df2, pandas.DataFrame):
--> 583     assert_empty_frame_equal(df1, df2)
    585 if isinstance(df1, pandas.DataFrame) and isinstance(df2, pandas.DataFrame):
    586     assert_frame_equal(
    587         df1,
    588         df2,
   (...)
    593         check_categorical=False,
    594     )

File ~/Documents/modin/modin/pandas/test/utils.py:528, in assert_empty_frame_equal(df1, df2)
    513 """
    514 Test if df1 and df2 are empty.
    515
   (...)
    524     If check fails.
    525 """
    527 if (df1.empty and not df2.empty) or (df2.empty and not df1.empty):
--> 528     assert False, "One of the passed frames is empty, when other isn't"
    529 elif df1.empty and df2.empty and type(df1) != type(df2):
    530     assert False, f"Empty frames have different types: {type(df1)} != {type(df2)}"

AssertionError: One of the passed frames is empty, when other isn't
```
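
The new failure comes from the first check in `assert_empty_frame_equal` (line 527 above), which is an exclusive-or on `DataFrame.empty`. `empty` is true when either axis has length zero, so a result with 256 index labels and no columns still counts as empty. The sketch below restates that check with plain pandas frames; `emptiness_mismatch` is a hypothetical helper name, not part of Modin's test utilities, and the sketch does not show which of the two `read_excel` results was the empty one.

```python
import pandas


def emptiness_mismatch(df1, df2):
    # Hypothetical helper, equivalent to the check on line 527:
    # (df1.empty and not df2.empty) or (df2.empty and not df1.empty)
    return df1.empty != df2.empty


# Rows but no columns, i.e. an index-only frame
index_only = pandas.DataFrame(index=range(3))
# The same rows with one data column kept
with_column = pandas.DataFrame({"col1": [0, 1, 2]})

print(index_only.empty, with_column.empty)
print(emptiness_mismatch(index_only, with_column))
```

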

mvashishtha added the P2 and pandas.io labels on Oct 12, 2022
anmyachev added a commit to anmyachev/modin that referenced this issue Dec 26, 2022
…rameters are provided

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
dchigarev pushed a commit that referenced this issue Jan 26, 2023
…s are provided (#5508)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Co-authored-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>