-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#4023: Fall back to pandas in case of MultiIndex columns #5149
Conversation
modin/experimental/core/execution/native/implementations/hdk_on_native/dataframe/dataframe.py
Outdated
Show resolved
Hide resolved
@pytest.mark.parametrize( | ||
"column_names", [None, ["level1", None], ["level1", "level2"]] | ||
) | ||
def test_reset_index_multicolumns(self, is_multiindex, column_names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this was removed? why can't we test multi-index columns for .reset_index
and .renme
at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because, in case of MultiIndex columns, it falls back to Pandas. The test fails, in this mode, due to the forced lazy execution. MultiIndex renaming is not supported for columns, but it's supported for rows. I think, it makes more sense to test this functionality, rather than default pandas functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So previously this test used to pass successfully despite the columns being MultiIndex? I wonder then whether it's a proper solution to mark the whole frame as has_unsupported_data=True
if it's only a thing for the .rename()
and maybe some other subset of functions? Shouldn't we only mark .rename()
and those other methods that can't handle MultiIndex columns as falling back to pandas and leave the others untouched since they can process this with no problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test passed just because it does nothing with columns. In fact, a frame with MultiIndex columns is totally broken due to column names conversion to HDK-supported format. It's not even possible to print such a frame. If we mark only subset of operations, as you suggested, there could be a situation, when some supported operations applied and when an unsupported operation is called, it will not be able to fall back to pandas, because to_pandas() fails for such frames.
After the fix, this test failed due to forced lazy execution. I could set force_lazy=False in the test, but, I think, it does not make any sense. If we have MultiIndex columns, we are testing pandas, but not modin. That's why I've removed the multiindex columns.
modin/experimental/core/execution/native/implementations/hdk_on_native/dataframe/dataframe.py
Outdated
Show resolved
Hide resolved
…umns MultiIndex columns are not currently supported by HDK. Signed-off-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com>
894cbbd
to
2bffd23
Compare
MultiIndex columns are not currently supported by HDK.
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
dataframe.rename
doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci #4023?docs/development/architecture.rst
is up-to-date