Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interoperability between DataFrames using different query compilers #7308

Closed
YarShev opened this issue Jun 11, 2024 · 6 comments · Fixed by #7376
Closed

Interoperability between DataFrames using different query compilers #7308

YarShev opened this issue Jun 11, 2024 · 6 comments · Fixed by #7376
Labels
new feature/request 💬 Requests and pull requests for new features

Comments

@YarShev
Copy link
Collaborator

YarShev commented Jun 11, 2024

Originally posted by @arunjose696 in #7259.

With the introduction of the small query compiler, we need to test the interoperability between DataFrames using different query compilers. For example, performing a binary operation between a DataFrame with the small query compiler and another with the Pandas query compiler. (Note: This feature is not yet included in this PR.)

This will require modifying or adding new tests. In the current tests in the modin/modin/tests/pandas/dataframe folder, we have the following scenarios where two DataFrames interact:

1)Derived DataFrames: In tests where the second DataFrame is created or derived from the first, egtest_join_empty, we need to refactor these tests so that the second DataFrame is created separately from the first and with MODIN_NATIVE_DATAFRAME_MODE set.

2)Lambda Functions: In tests where the other DataFrame is created within a lambda function, eg test___divmod__, we need to refactor these tests to either create the second DataFrame in the test definition itself or provide an additional wrapper for the lambda functions to ensure the DataFrame is created with a different query compilers.

3)Separate DataFrames: In tests where two separate DataFrames are used, eg test_where, we need to refactor these tests to include flipping the MODIN_NATIVE_DATAFRAME_MODE to None and Native_pandas when creating both the first and second DataFrame. This ensures that both the left and right operands are tested with different query compilers for interoperability. This flipping would also be required in cases mentioned in 1 and 2 after dataframes are separated.

Upon reviewing the modin/modin/tests/pandas/dataframe folder, I found approximately 100 tests involving scenarios where two DataFrames interact. These tests may need refactoring or copying to a different directory and updating to specifically test interoperability.

@YarShev @anmyachev @devin-petersohn, could you please provide suggestions on how to approach testing the interoperability?

@YarShev YarShev added the new feature/request 💬 Requests and pull requests for new features label Jun 11, 2024
@YarShev
Copy link
Collaborator Author

YarShev commented Jun 11, 2024

@arunjose696, thanks for your research. I think we should copy those tests to a different directory (e.g., modin/tests/pandas/native_df_mode) and update them to specifically test interoperability. This way, we would not bloat up existing tests and would make navigation for interoperability tests easier.

@devin-petersohn
Copy link
Collaborator

In terms of testing, I think unit tests are the way to go. We don't need to test every combination of APIs, as long as the conversion is working properly. We can add some canary testing on one or two APIs to ensure that end to end is working properly. Does this make sense?

@YarShev
Copy link
Collaborator Author

YarShev commented Jun 11, 2024

@devin-petersohn, thanks for the suggestion! It does make sense. We can start with unit tests to verify if API works with a single set of parameters.

@arunjose696
Copy link
Collaborator

After the first implementation of small QC is done, I will open a PR with interoperablilty and have unit tests to verify if the API works with single set of parameters.

For the first implementation #7259., would it suffice to go with tests in modin/modin/tests/pandas/dataframe folder for now by setting the MODIN_NATIVE_DATAFRAME_MODE, to verify the query compiler works for dataframes, or should we add unit tests even for the initial implementation?

@YarShev
Copy link
Collaborator Author

YarShev commented Jun 12, 2024

It seems to me we could have unit tests even for the first impl. Just copy tests from the dataframe folder to another one and leave a single set of parameters for every tests. @devin-petersohn, what do you think?

@devin-petersohn
Copy link
Collaborator

That makes sense to me. Thanks @YarShev and @arunjose696 !

YarShev added a commit that referenced this issue Sep 2, 2024
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
Co-authored-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature/request 💬 Requests and pull requests for new features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants