FIX-#4022: Fixed empty data frame with index #4910

Merged 1 commit on Sep 27, 2022

AndreyPavlenko
Collaborator

What do these changes do?

Put the index into the columns when the frame has an index but no columns.

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves dataframe.loc() doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci #4022
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

codecov bot commented Aug 31, 2022

Codecov Report

Merging #4910 (7c260ad) into master (45879a6) will decrease coverage by 15.07%.
The diff coverage is n/a.

❗ Current head 7c260ad differs from pull request most recent head fe3fde3. Consider uploading reports for the commit fe3fde3 to get more accurate results

@@             Coverage Diff             @@
##           master    #4910       +/-   ##
===========================================
- Coverage   84.34%   69.26%   -15.08%     
===========================================
  Files         267      267               
  Lines       19749    19746        -3     
===========================================
- Hits        16657    13678     -2979     
- Misses       3092     6068     +2976     
Impacted Files Coverage Δ
...odin/experimental/core/storage_formats/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...din/experimental/core/execution/native/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...erimental/core/storage_formats/omnisci/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
.../core/execution/native/implementations/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...tive/implementations/omnisci_on_native/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...e/implementations/omnisci_on_native/io/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...mentations/omnisci_on_native/dataframe/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...tations/omnisci_on_native/partitioning/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...mentations/omnisci_on_native/calcite_serializer.py 0.00% <0.00%> (-98.71%) ⬇️
...plementations/omnisci_on_native/calcite_builder.py 0.00% <0.00%> (-96.37%) ⬇️
... and 48 more

@@ -653,7 +653,7 @@ def __getitem__(self, key):
 --------
 pandas.DataFrame.loc
 """
-if self.df.empty:
+if len(self.df.index) == 0 and self.df.empty:
Collaborator

Why do you need these changes?

Collaborator Author

If the frame has only an index, the following lambda fails with a KeyError.
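
For illustration, the difference between the two conditions in the diff can be seen in plain pandas. This is a minimal sketch, not Modin code: the frames and the helper names `original_check` and `stricter_check` are hypothetical. In pandas, `DataFrame.empty` is true when either axis has length zero, so a frame with an index but no columns passes the original check but not the stricter one.

```python
import pandas as pd

# Hypothetical frame with an index but no columns.
index_only = pd.DataFrame(index=[1, 2, 3])
# Hypothetical frame with neither index labels nor columns.
fully_empty = pd.DataFrame()


def original_check(df):
    # Condition before this change.
    return df.empty


def stricter_check(df):
    # Condition proposed in this change.
    return len(df.index) == 0 and df.empty


# Index-only frame: the original check holds, the stricter one does not.
print(original_check(index_only), stricter_check(index_only))
# Fully empty frame: both checks hold.
print(original_check(fully_empty), stricter_check(fully_empty))
```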

Collaborator

Is the problem specific to the OmniSci backend, or does it affect all backends? Perhaps the problem is in the backend's implementation of empty?

Collaborator Author

Actually, I could find only one implementation of this method: https://github.com/modin-project/modin/blob/master/modin/pandas/dataframe.py#L325

Collaborator

I don't understand why making the condition for defaulting to pandas stricter would resolve any issue. We should always be able to default to pandas.

Collaborator Author

You are right. Actually, the problem is not in default_to_pandas() but in loc[], which is called from the lambda. Pandas returns an empty Series for df.loc[1], but Modin with the OmniSci backend fails with KeyError: 'Column F_1 does not exist in schema'. That is the issue that should be fixed instead.
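
The pandas behaviour described above can be reproduced with a small sketch. It assumes a hypothetical frame whose index contains the label 1 and which has no columns, and it uses plain pandas only, not Modin or the OmniSci backend.

```python
import pandas as pd

# Hypothetical frame: three index labels, no columns.
df = pd.DataFrame(index=[1, 2, 3])

# Selecting an existing label yields a Series with no entries,
# named after the selected label.
row = df.loc[1]
print(type(row).__name__, len(row), row.name)
```

A matching result from Modin for the same frame is what the fix needs to achieve, rather than a KeyError.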

ienkovich previously approved these changes Sep 20, 2022
Collaborator

LGTM!

Put index into columns if there are no columns but only index

Signed-off-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com>
YarShev merged commit 027f92a into modin-project:master on Sep 27, 2022