
numpy.mean along multiple axis gives wrong result for large arrays #8869

Open · TobiasLe opened this issue Mar 30, 2017 · 16 comments

@TobiasLe commented Mar 30, 2017

A mean over an array containing only ones should obviously return only ones. However,

>>> import numpy as np
>>> test_array = np.ones((10000000, 4, 15), dtype = np.float32)
>>> print(test_array.mean(axis=(0,1)))
[ 0.4194304  0.4194304  0.4194304  0.4194304  0.4194304  0.4194304
  0.4194304  0.4194304  0.4194304  0.4194304  0.4194304  0.4194304
  0.4194304  0.4194304  0.4194304]

This returns the correct result:

>>> print(test_array.mean(axis=0).mean(axis=0))
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]

I guess the reason for this problem is some kind of overflow, because it does not appear when I use a test array with dtype=np.float64. I would expect numpy either to give the correct result or at least to emit a warning whenever such an overflow happens.
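
For what it's worth, the value 0.4194304 is consistent with a naive float32 running sum that stalls at 2**24 = 16777216 (adding 1.0 to it no longer changes it) before being divided by the 40,000,000 elements:

>>> np.float32(2**24) + np.float32(1) == np.float32(2**24)
True
>>> 2**24 / (10000000 * 4)
0.4194304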

Surprisingly, the mean along all axes gives the correct result again:

>>> print(test_array.mean(axis=(0,1,2)))
1.0

(I used numpy-1.12.1 with python 3.5 on Ubuntu 16)

@eric-wieser (Member)

I can confirm the results are the same on master.

test_array.mean(axis=(0,1), dtype=np.float64) works as expected.

@eric-wieser (Member) commented Mar 30, 2017

A more alarming case:

In [18]: test_array[:,:,:1].sum(axis=(0,1))
Out[18]: array([ 40000000.], dtype=float32)

In [19]: test_array[:,:,:2].sum(axis=(0,1))
Out[19]: array([ 16777216.,  16777216.], dtype=float32)

@juliantaylor (Contributor) commented Mar 30, 2017

This again boils down to the fact that the more robust (pairwise) summation algorithm is only applied along the trailing axes.
mean(axis=(1,2)) should give better results.

@TobiasLe (Author) commented Mar 30, 2017

That is true: it works for axes (1,2) but not for (0,1):

>>> test_array = np.ones((4,10000000,4), dtype=np.float32)
>>> test_array.sum(axis=(0,1))
array([ 16777216.,  16777216.,  16777216.,  16777216.], dtype=float32)
>>> test_array.sum(axis=(1,2))
array([ 40000000.,  40000000.,  40000000.,  40000000.], dtype=float32)

@vfdev-5 commented Mar 1, 2018

I wonder whether the following is related to this bug?

>>> aa = np.ones((5001 * 5001, 3), dtype=np.float32)
>>> bb = np.ones((5001 * 5001 * 3), dtype=np.float32)
>>> np.sum(aa, axis=0), np.sum(bb, axis=0), 5001 * 5001 * 3
(array([16777216., 16777216., 16777216.], dtype=float32), 75030000.0, 75030003)

In the second result, 3 is missing compared with the exact count in the third. Checking sums near 2**24:

for i in range(-10, 10):
    s = (4 * 1024) ** 2 + i
    bb = np.ones(s, dtype=np.float32)
    res = int(np.sum(bb, axis=0))
    print(res, s, res == s)

gives

16777206 16777206 True
16777207 16777207 True
16777208 16777208 True
16777209 16777209 True
16777210 16777210 True
16777211 16777211 True
16777212 16777212 True
16777213 16777213 True
16777214 16777214 True
16777215 16777215 True
16777216 16777216 True
16777216 16777217 False
16777218 16777218 True
16777220 16777219 False
16777220 16777220 True
16777220 16777221 False
16777222 16777222 True
16777224 16777223 False
16777224 16777224 True
16777224 16777225 False

@YubinXie

This issue still exists in numpy 1.14.5.

test_array = np.ones((10000000, 4, 15), dtype = np.float32)
print(test_array.mean(axis=(0,1)))
[0.4194304 0.4194304 0.4194304 0.4194304 0.4194304 0.4194304 0.4194304
 0.4194304 0.4194304 0.4194304 0.4194304 0.4194304 0.4194304 0.4194304
 0.4194304]
test_array = np.ones((10000000, 4, 15))
print(test_array.mean(axis=(0,1)))
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

@vikramambrose

Similar issue. Google brought me here. How do we resolve this?

>>> import numpy as np
>>> foo = np.load("data.npy")
>>> foo.shape
(1, 128, 128, 128, 4)
>>> np.mean(foo,axis=(0,1,2,3))
array([-0.54445046, -0.52889204, -0.4197723 , -0.37575784], dtype=float32)
>>> np.mean(foo[:,:,:,:,0])
-0.5486783
>>> np.mean(foo[:,:,:,:,1])
-0.52285635
>>> np.mean(foo[:,:,:,:,2])
-0.4240536
>>> np.mean(foo[:,:,:,:,3])
-0.37483233
>>> np.version.version
'1.14.2'
>>>

@YubinXie commented Jul 29, 2019

Now I think this could be an overflow issue. np.mean very likely sums everything up first, and if that sum goes beyond what np.float32 can represent accurately, the result is no longer accurate.

Using float64 might solve this problem in most cases.
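
For example, a minimal sketch using the documented dtype argument of np.mean to force a float64 accumulator:

import numpy as np

test_array = np.ones((10000000, 4, 15), dtype=np.float32)
# Accumulate in float64, then cast back to float32 if a float32 result is needed:
result = test_array.mean(axis=(0, 1), dtype=np.float64).astype(np.float32)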

@mproszewska (Contributor)

Would it make sense to change the implementation of np.mean so that it is calculated over the given dimensions iteratively? E.g.

>>> import numpy as np
>>> test_array = np.ones((10000000, 4, 15), dtype = np.float32)
>>> test_array.mean(axis=(0,1))

would return the result of test_array.mean(axis=0).mean(axis=0)?
Also, can the array_like passed to np.mean be ragged, e.g. [[1], [2, 3]]?
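
A minimal sketch of that idea (a hypothetical helper, not the code in the PR):

import numpy as np

def mean_multi_axis(a, axes):
    # Reduce one axis at a time; reverse-sorted order keeps the remaining
    # axis indices valid as dimensions disappear.
    for ax in sorted(axes, reverse=True):
        a = a.mean(axis=ax)
    return a

As noted further down in the thread, this only helps when each single-axis reduction is itself accurate.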

@mproszewska (Contributor)

I implemented the approach described above (see the pull request). Happy to get some feedback.

@miccoli (Contributor) commented Jan 15, 2020

@eric-wieser already pointed to the correct cause: the summation algorithm is "smart" only along contiguous memory regions. (No overflow here, @YubinXie, but catastrophic precision loss.)

smart:

>>> np.ones((10**8,2), dtype=np.float32, order="F").sum(axis=(0,))
array([1.e+08, 1.e+08], dtype=float32)
>>> np.ones((2,10**8), dtype=np.float32, order="C").sum(axis=(1,))
array([1.e+08, 1.e+08], dtype=float32)

naive:

>>> np.ones((10**8,2), dtype=np.float32, order="C").sum(axis=(0,))
array([16777216., 16777216.], dtype=float32)
>>> np.ones((2,10**8), dtype=np.float32, order="F").sum(axis=(1,))
array([16777216., 16777216.], dtype=float32)

Please note that

>>> 2 / np.finfo(np.float32).eps
16777216.0

confirming that np.ones((10**8,2), dtype=np.float32, order="C").sum(axis=(0,)) is indeed implemented as a naive summation (s = 0, then s += arr[i] for i in range(n)): once the running sum reaches 2/eps = 16777216, adding another 1.0 no longer changes it.
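
For contrast, here is a toy sketch of the pairwise scheme NumPy uses on its fast path (illustrative only; the real implementation is in C with a much larger unrolled base case):

def pairwise_sum(x):
    # Split recursively so the partial sums stay of comparable magnitude;
    # the rounding error then grows roughly like O(log n) instead of O(n).
    if len(x) <= 8:  # small naive base case
        s = 0.0
        for v in x:
            s += v
        return s
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])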

The approach proposed by @mproszewska does not solve this problem; it only alleviates it in the special case where the sum runs along multiple axes but each single-axis sum does not itself suffer precision loss. In other words, it will not fix

>>> np.ones((10**8,2), dtype=np.float32, order="C").mean(axis=(0,))
array([0.16777216, 0.16777216], dtype=float32)

@miccoli (Contributor) commented Jan 15, 2020

@vfdev-5 I would not call your example a bug: even with a 'smart' summation algorithm, some precision loss is to be expected. Indeed the integer 16777217 has no exact floating point representation in single precision:

>>> np.float32(16777217)
16777216.0
>>> np.float32(16777217) == np.float32(16777216)
True
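
This also explains the pattern in the loop output further up: float32 has a 24-bit significand, so above 2**24 only every second integer is exactly representable:

>>> 2**24
16777216
>>> np.nextafter(np.float32(2**24), np.float32(np.inf))
16777218.0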

@radomirgr

Will this issue be solved at some point? It was opened in 2017 and is still relevant. I have just opened a similar issue here: #25909. I noticed that when you copy the data, the sum goes wrong:

n = 100_000_000
a = np.array((np.full(n, 1, dtype=np.float32), np.full(n, 0.01, dtype=np.float32), np.full(n, -1, dtype=np.float32))).T
print(a.shape) # (100000000, 3)
print(a.sum(axis=0)) # [ 1.0000000e+08  1.0000628e+06 -1.0000000e+08]
print(a.copy().sum(axis=0)) # [ 16777216.    262144. -16777216.]
print(a.copy(order="C").sum(axis=0)) # [ 16777216.    262144. -16777216.]
print(a.copy(order="F").sum(axis=0)) # [ 1.0000000e+08  1.0000628e+06 -1.0000000e+08]

However, there is no such problem with a single column:

n = 100_000_000
a = np.array((np.full(n, 1, dtype=np.float32))).T
print(a.shape) # (100000000,)
print(a.sum(axis=0)) # 100000000.0
print(a.copy().sum(axis=0)) # 100000000.0
print(a.copy(order="C").sum(axis=0)) # 100000000.0
print(a.copy(order="F").sum(axis=0)) # 100000000.0
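
A user-side workaround consistent with the explanation above is to copy the data so that the long axis becomes the trailing, contiguous one before reducing:

import numpy as np

n = 100_000_000
a = np.array((np.full(n, 1, dtype=np.float32),
              np.full(n, 0.01, dtype=np.float32),
              np.full(n, -1, dtype=np.float32))).T
b = a.copy(order="C")             # the layout that sums incorrectly above
rows = np.ascontiguousarray(b.T)  # copy so the length-n axis is contiguous
print(rows.sum(axis=1))           # comparable to the order="F" result above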

@radomirgr

The docs say:

"This improved precision is always provided when no axis is given."

However, in this case it is not true:

n = 1000_000
a = np.array((np.full(n, 1, dtype=np.float32), np.full(n, 0.0000001, dtype=np.float32), np.full(n, -1, dtype=np.float32)))
print(a.copy(order="F").sum()) # 0.11920929
print(a.copy(order="C").sum()) # 0

print(a.copy(order="F").T.sum()) # 0.11920929
print(a.copy(order="C").T.sum()) # 0.0

print(a.T.copy(order="F").sum()) # 0.0
print(a.T.copy(order="C").sum()) # 0.11920929

@radomirgr

Moreover, when you use torch.from_numpy, torch gives the same output in both memory orders, and it is much faster:
[screenshot: timing comparison, numpy vs torch]

@radomirgr commented Mar 1, 2024

Could numpy use the same logic as the torch CPU implementation, which appears to be both faster and correct?
Or at least provide a flag to opt into pairwise summation for better accuracy?

Torch is faster even with a single thread:
[screenshot: single-threaded timing comparison]
