FIX-#6632: Return Series instead of Dataframe for groupby.apply in case of experimental groupby #6649
LGTM!
@dchigarev any comments?
Well, the only thing to notice is that it now always triggers the computation after groupby (because of the
Previously we had a trend of getting rid of such explicit materialization, and now we're adding them back again. The changes are introduced to the general groupby method
If we decide to keep the fix, maybe we should at least move it to the
script to measure
import modin.pandas as pd
from asv_bench.benchmarks.utils.common import execute
import modin.config as cfg
import numpy as np
from timeit import default_timer as timer
# initialize workers
pd.DataFrame([cfg.NPartitions.get() * cfg.MinPartitionSize.get()]).to_numpy()
df = pd.DataFrame({"by_col": np.tile(np.arange(10_000), 100), "a": np.arange(1_000_000), "b": np.arange(1_000_000), "c": np.arange(1_000_000)})
execute(df) # trigger import
results = []
t1 = timer()
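for _ in range(10):
    results.append(df.groupby("by_col").sum())
# materialize all ten results so the first timing includes the actual computation
[execute(df) for df in results]
print(timer() - t1)
# time a single groupby call without explicit materialization
t1 = timer()
res = df.groupby("by_col").sum()
print(timer() - t1)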
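The distinction the script above is probing (how long a call takes to return versus how long the result takes to compute) can be sketched without Modin. This is only an illustration under stated assumptions: `LazyResult`, `slow_sum`, `lazy_groupby_sum`, and `eager_groupby_sum` are hypothetical names made up for this sketch, not Modin APIs, and the sleep stands in for real aggregation work.

```python
from timeit import default_timer as timer
import time


class LazyResult:
    """Hypothetical stand-in for a lazily evaluated result (not a Modin class)."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._done = False

    def materialize(self):
        # Run the deferred work once and cache the outcome.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def slow_sum(values, delay=0.05):
    # Simulated expensive aggregation (the delay is an arbitrary assumption).
    time.sleep(delay)
    return sum(values)


def lazy_groupby_sum(values):
    # Returns immediately; the work is deferred until materialize() is called.
    return LazyResult(lambda: slow_sum(values))


def eager_groupby_sum(values):
    # Forces materialization before returning, so the caller pays the cost up front.
    res = LazyResult(lambda: slow_sum(values))
    res.materialize()
    return res


data = list(range(1000))

t1 = timer()
lazy = lazy_groupby_sum(data)
lazy_call_time = timer() - t1

t1 = timer()
eager = eager_groupby_sum(data)
eager_call_time = timer() - t1

print(f"lazy call:  {lazy_call_time:.4f}s")
print(f"eager call: {eager_call_time:.4f}s")
```

In this toy setup the eager variant spends the simulated computation time inside the call itself, which is the kind of cost the second timing in the script would expose if the fix forces materialization after every groupby.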
@dchigarev Good catch! I was surprised that even
At least yes, you are right.
You mean here: https://github.com/modin-project/modin/blob/master/modin/pandas/groupby.py#L657?
Yes.
Signed-off-by: izamyati <igor.zamyatin@intel.com>
Okay, please look at another attempt.
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
…y.apply in case of experimental groupby (modin-project#6649) Signed-off-by: izamyati <igor.zamyatin@intel.com> Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
#6632
docs/development/architecture.rst is up-to-date