Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#4023: Fall back to pandas in case of MultiIndex columns #5149

Merged
merged 1 commit into from
Oct 31, 2022

Conversation

AndreyPavlenko
Copy link
Collaborator

MultiIndex columns are not currently supported by HDK.

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves dataframe.rename doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci #4023?
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@AndreyPavlenko AndreyPavlenko marked this pull request as ready for review October 21, 2022 18:46
@AndreyPavlenko AndreyPavlenko requested a review from a team as a code owner October 21, 2022 18:46
Comment on lines -450 to -453
@pytest.mark.parametrize(
"column_names", [None, ["level1", None], ["level1", "level2"]]
)
def test_reset_index_multicolumns(self, is_multiindex, column_names):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this was removed? why can't we test multi-index columns for .reset_index and .renme at the same time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, in case of MultiIndex columns, it falls back to Pandas. The test fails, in this mode, due to the forced lazy execution. MultiIndex renaming is not supported for columns, but it's supported for rows. I think, it makes more sense to test this functionality, rather than default pandas functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously this test used to pass successfully despite the columns being MultiIndex? I wonder then whether it's a proper solution to mark the whole frame as has_unsupported_data=True if it's only a thing for the .rename() and maybe some other subset of functions? Shouldn't we only mark .rename() and those other methods that can't handle MultiIndex columns as falling back to pandas and leave the others untouched since they can process this with no problems?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test passed just because it does nothing with columns. In fact, a frame with MultiIndex columns is totally broken due to column names conversion to HDK-supported format. It's not even possible to print such a frame. If we mark only subset of operations, as you suggested, there could be a situation, when some supported operations applied and when an unsupported operation is called, it will not be able to fall back to pandas, because to_pandas() fails for such frames.

After the fix, this test failed due to forced lazy execution. I could set force_lazy=False in the test, but, I think, it does not make any sense. If we have MultiIndex columns, we are testing pandas, but not modin. That's why I've removed the multiindex columns.

…umns

MultiIndex columns are not currently supported by HDK.

Signed-off-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com>
@YarShev YarShev merged commit 6f0ff79 into modin-project:master Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dataframe.rename doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci
3 participants