PERF: Calling df.agg([function]) is much slower than df.agg(function) when there are many columns and few rows. #45658
Comments
Please try on newer pandas, e.g. 1.4.
Updated example using pandas 1.4.
Thanks for the report! One may think of The trouble is that because However we run into more trouble because the code paths for This performance would be improved by #45557, when using the experimental option.
Now that the deprecation @rhshadrach referred to is enforced, it may be viable to improve this.
I've looked into this some more. The issue is that using DataFrame reductions (instead of breaking up the frame into Series and using the Series reductions) requires taking a transpose. E.g. if you have a frame with a mixture of int/float dtypes, using I'd be in favor of not transposing the result, so that
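For anyone following the orientation point, here is a small sketch of the result shapes involved on a mixed int/float frame. The frame, its column names, and the `untransposed` variable are made up for illustration and are not taken from the thread.

```python
import pandas as pd

# Hypothetical frame with one int column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Scalar function: a Series indexed by the original column labels.
reduced = df.agg("sum")

# List of functions: a DataFrame with one row per function
# and one column per original column.
listed = df.agg(["sum"])

# The same numbers laid out without the transpose:
# one row per original column, one column per function.
untransposed = reduced.to_frame("sum")

print(reduced)
print(listed)
print(untransposed)
```

Here `listed` has the function name on the index, while `untransposed` has it on the columns.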
I would not have guessed this. Am I the only one who finds this weird?
I've been thinking lately that something a pattern like
I think it goes back to allowing partial failure - that can only be done by first breaking up the frame into series and operating on each one. Once you've done this, it's a natural result of using concat.
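A rough sketch of the "break up the frame into Series, then concat" pattern described above. This is not the pandas internals, only an illustration of why the function names land on the index; the frame and the `pieces` name are assumptions.

```python
import pandas as pd

# Hypothetical small frame for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
funcs = ["sum", "min"]

# Aggregate each column on its own. Each piece is a Series
# indexed by the function names.
pieces = {name: col.agg(funcs) for name, col in df.items()}

# Gluing the pieces side by side puts the original column labels
# on the columns and the function names on the index.
result = pd.concat(pieces, axis=1)
print(result)
```

Since each column is handled separately, a column that fails could in principle be skipped before the concat, which is the partial-failure behaviour mentioned in the comment.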
+1
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
Calling df.agg([function]) is much slower than df.agg(function) when there are many columns and few rows.

I apologize if this is a known issue; I could not find a reference based on keywords that come to mind. Inspired by this SO question.
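The original code snippet did not survive in this copy of the issue. The following is a minimal sketch of the kind of comparison described, assuming a 3-row, 1000-column frame of random floats and the "sum" aggregation. The sizes, the function, and the repeat count are all assumptions, and actual timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Few rows, many columns (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 1000)))

# Time the scalar form and the list form of the same aggregation.
t_scalar = timeit.timeit(lambda: df.agg("sum"), number=3)
t_list = timeit.timeit(lambda: df.agg(["sum"]), number=3)

print(f"df.agg('sum'):   {t_scalar:.4f} s for 3 calls")
print(f"df.agg(['sum']): {t_list:.4f} s for 3 calls")
```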
Installed Versions
INSTALLED VERSIONS
commit : bb1f651
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-1028-gcp
Version : #32~20.04.1-Ubuntu SMP Wed Jan 12 20:08:27 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.0
numpy : 1.22.1
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.dev0
setuptools : 56.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.4.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.23
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
Prior Performance
No response