REFACTOR-#1901: Improve performance of groupby.mean
#1902
Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
6c3eec1 to a600f42
Codecov Report
@@             Coverage Diff              @@
##           master    #1902       +/-   ##
===========================================
- Coverage   81.52%   67.63%   -13.89%
===========================================
  Files          79       79
  Lines        9178     9179        +1
===========================================
- Hits         7482     6208     -1274
- Misses       1696     2971     +1275
Local testing shows that the following is the fastest approach:
    df.groupby(by).sum() / df.groupby(by).count()  # 2 groupby and 1 join
This is strange because it is faster than a single groupby on the local testing data. More investigation is needed to understand the implications for data of different shapes.
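A minimal sketch of how the comparison above could be reproduced, assuming plain pandas rather than Modin so that it runs anywhere. The helper names (`mean_single`, `mean_sum_over_count`, `make_frame`) and the frame shape are hypothetical and are not taken from this PR. Timings will vary with data shape and backend, so no result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def mean_single(df, by):
    """Group mean computed with a single groupby call."""
    return df.groupby(by).mean()


def mean_sum_over_count(df, by):
    """Group mean computed as sum / count: two groupbys aligned on the group index."""
    grouped_sum = df.groupby(by).sum()
    grouped_count = df.groupby(by).count()
    return grouped_sum / grouped_count


def make_frame(n_rows, n_cols, n_groups, seed=0):
    """Build a hypothetical test frame of random floats plus an integer group key."""
    rng = np.random.default_rng(seed)
    data = {f"col{i}": rng.random(n_rows) for i in range(n_cols)}
    data["key"] = rng.integers(0, n_groups, size=n_rows)
    return pd.DataFrame(data)


df = make_frame(n_rows=10_000, n_cols=4, n_groups=50)

# Both routes should agree up to floating-point rounding.
expected = mean_single(df, "key")
actual = mean_sum_over_count(df, "key")
pd.testing.assert_frame_equal(actual, expected)

# Time each route; vary n_rows, n_cols and n_groups to probe other shapes.
for name, fn in [("single groupby", mean_single), ("sum / count", mean_sum_over_count)]:
    seconds = timeit.timeit(lambda: fn(df, "key"), number=20)
    print(f"{name}: {seconds:.4f}s for 20 runs")
```

Swapping `import pandas as pd` for `import modin.pandas as pd` would be the natural way to rerun the same comparison against Modin, though `pd.testing.assert_frame_equal` may then need the frames converted to pandas first.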
@devin-petersohn, should this PR be closed since we have #3586?
We can close this.
What do these changes do?
flake8 modin
black --check modin
git commit -s