-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#7254: Support right merge/join #7226
Conversation
1f2e4ca
to
79ce728
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
88c9787
to
b380dfd
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
…al reset_index for left merge" Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
9ee3d4b
to
089bcce
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
…rge-swap-operand
@YarShev ready for review |
@@ -140,7 +141,11 @@ def test_join(test_data, test_data2): | |||
lsuffix="_caller", | |||
rsuffix="_other", | |||
) | |||
df_equals(modin_result, pandas_result) | |||
if sorts[j]: | |||
# sorting in `join` is implemented through range partitioning technique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this comment. sorting is just called in df_equals_and_sort. Should we remove this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded the comment a little
modin/experimental/core/execution/native/implementations/hdk_on_native/dataframe/dataframe.py
Outdated
Show resolved
Hide resolved
def map_func(left, right, kwargs=kwargs): # pragma: no cover | ||
return pandas.DataFrame.join(left, right, **kwargs) | ||
if how in ["left", "inner"] or ( | ||
how == "right" and right._modin_frame._partitions.size != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if left has size equals to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty dataframes are processed at a higher level (simply defaulted to pandas) and all implementation logic relies on the fact that the left operand has a non-empty set of partitions. Therefore, to avoid an error, we need to handle this situation as before.
if ( | ||
( | ||
how in ["left", "inner"] | ||
or (how == "right" and right._modin_frame._partitions.size != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
4eff41c
to
981c74e
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
data = np.random.randint(0, 100, size=(64, 64)) | ||
eval_general( | ||
*create_test_dfs(data), | ||
lambda df: df.join(df.iloc[:0], on=1, how=how, lsuffix="_caller"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not use on
parameter for a similar test with merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different threads of execution, for merge
function we can omit this parameter so that a certain part of the code starts executing.
Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>
What do these changes do?
Blocked on #7251Performance: 1.7 sec (the PR) vs 3.1 sec (on main) on Ray (8 cores)
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date