sum of a column of an empty dataframe #19813
Comments
There are two issues here.
Given that the behavior now matches |
This is really strange behavior for NumPy. I suspect it's a bug. See numpy/numpy#10639. For pandas, this is slightly complicated by how we use object arrays for different types, including booleans with NA and strings as well as arbitrary Python types. On a string array (with object dtype), sum() concatenates: pd.Series(['foo', 'bar']).sum() returns 'foobar'.
So it's not entirely clear that the right answer is 0 here. I suspect it is, and we should encourage using .str.cat() for string concatenation in favor of .sum().
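To make the comparison above concrete, here is a minimal sketch of the two ways to join an object-dtype string Series. The variable names are illustrative only, and the explicit `dtype=object` is an assumption added so the example does not depend on any string-dtype defaults.

```python
import pandas as pd

# Explicit object dtype (an assumption for this sketch), matching the
# "string array with object dtype" case discussed above.
words = pd.Series(["foo", "bar"], dtype=object)

joined_by_sum = words.sum()      # works only because str + str concatenates
joined_by_cat = words.str.cat()  # explicit string concatenation, no separator

print(joined_by_sum)
print(joined_by_cat)
```

Both calls give the same string here, but `.str.cat()` states the intent directly and does not raise the question of what the "sum" of an empty string column should be.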
Thanks. Returning a bool did seem a little strange.
In that case, I'm not sure what the thing to do here is... Maybe a new subsection in http://pandas-docs.github.io/pandas-docs-travis/basics.html#dtypes specific to empty containers would be helpful, but we have some work to do making those consistent first.
On Wed, Feb 21, 2018 at 10:06 AM, Stephan Hoyer wrote:
This is really strange behavior for NumPy. I suspect it's a bug. See numpy/numpy#10639.
For pandas, this is slightly complicated by how we use object arrays for different types, including booleans with NA and strings as well as arbitrary Python types. On a string array (with object dtype), sum() concatenates:
In [32]: pd.Series(['foo', 'bar']).sum()
Out[32]: 'foobar'
So it's not entirely clear that the right answer is 0 here. I suspect it is, and we should encourage using .str.cat() for string concatenation in favor of .sum().
Closing, since this may be fixed upstream in NumPy 1.15: numpy/numpy#10639
Code Sample, a copy-pastable example if possible
Problem description
In pandas 0.22, the sum of a column of an empty DataFrame is False. In earlier versions (0.18.1 at least), the result would have been 0.
While this is consistent with the default dtype of a DataFrame being object, it isn't consistent with the 0.22 statement that the sum of an empty Series is 0.0.
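The original code sample did not survive on this page, so the following is only a sketch of the situation described, not the reporter's code. It assumes a hypothetical one-column empty frame named `df` with an explicit object dtype, and contrasts it with a numeric cast.

```python
import pandas as pd

# Hypothetical stand-in for the report's frame: one empty column of object dtype.
df = pd.DataFrame({"a": pd.Series([], dtype=object)})

# On an object column the result type depends on the pandas/NumPy versions
# installed (the report saw False under pandas 0.22 with NumPy 1.13.3).
object_sum = df["a"].sum()

# Casting to a numeric dtype first gives the documented empty-sum value.
float_sum = df["a"].astype("float64").sum()
series_sum = pd.Series([], dtype="float64").sum()

print(type(object_sum), object_sum)
print(float_sum, series_sum)
```

Casting to a numeric dtype before reducing is one way to sidestep the ambiguity of summing an empty object column.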
Expected Output
(0.0, 0.0)
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None