
Specialization for accurate complex summation in sum()? #121149

Closed
skirpichev opened this issue Jun 29, 2024 · 9 comments
Labels
3.14 new features, bugs and security fixes type-feature A feature request or enhancement

Comments

@skirpichev
Member

skirpichev commented Jun 29, 2024

Feature or enhancement

Proposal:

Currently, the sum() builtin lacks any specialization for complex numbers, yet it is still usually faster than more accurate pure-Python alternatives.

benchmark: sum() vs. pure-Python versions
# a.py
from random import random, seed
seed(1)
data = [complex(random(), random()) for _ in range(10)]
def msum(xs):
    it = iter(xs)
    res = next(it)
    for z in it:
        res += z
    return res
def sum2(xs):
    return complex(sum(_.real for _ in xs),
                   sum(_.imag for _ in xs))
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'msum(data)'
200000 loops, best of 11: 1.31e+03 nsec per loop

And using sum() component-wise is hardly an option:

$ ./python -m timeit -r11 -unsec -s 'from a import data, sum2' 'sum2(data)'
50000 loops, best of 11: 8.56e+03 nsec per loop

Unfortunately, using this builtin directly in numeric code doesn't make sense, as the results are (usually) inaccurate. It's not too hard to do summation component-wise with math.fsum(), but that is slow and there might be a better way.
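For illustration, a component-wise math.fsum() wrapper (a hypothetical helper, not part of the proposal) shows what the accurate pure-Python version looks like:

```python
from math import fsum

def fsum_complex(xs):
    # Hypothetical helper: accurate component-wise summation
    # using math.fsum() on the real and imaginary parts.
    return complex(fsum(z.real for z in xs), fsum(z.imag for z in xs))

# Naive repeated addition accumulates rounding error, fsum() does not:
data = [complex(0.1, 0.1)] * 10
assert fsum_complex(data) == complex(1.0, 1.0)
```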

In #100425, a simple algorithm using compensated summation was implemented in sum() for floats. I propose to (1) add a specialization in sum() for complex numbers, and (2) reuse the #100425 code to implement accurate summation of complex numbers.
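For reference, the float path from #100425 is based on Neumaier-style compensated summation; applied per component, the idea looks roughly like this in pure Python (a sketch of the technique, not the actual C implementation):

```python
def compensated_csum(xs):
    # Sketch: Neumaier compensated summation applied independently
    # to the real and imaginary components.
    sr = si = 0.0  # running sums
    cr = ci = 0.0  # compensation terms (accumulated rounding errors)
    for z in xs:
        x, y = z.real, z.imag
        t = sr + x
        cr += (sr - t) + x if abs(sr) >= abs(x) else (x - t) + sr
        sr = t
        t = si + y
        ci += (si - t) + y if abs(si) >= abs(y) else (y - t) + si
        si = t
    return complex(sr + cr, si + ci)

assert compensated_csum([complex(0.1, 0.1)] * 10) == complex(1.0, 1.0)
```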

(1) is simple and straightforward, yet it already gives a measurable performance boost:

benchmark: sum() on main vs. with the added specialization for complex
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index 6e50623caf..da0eed584a 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2691,6 +2691,59 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             }
         }
     }
+
+    if (PyComplex_CheckExact(result)) {
+        Py_complex c_result = PyComplex_AsCComplex(result);
+        Py_SETREF(result, NULL);
+        while(result == NULL) {
+            item = PyIter_Next(iter);
+            if (item == NULL) {
+                Py_DECREF(iter);
+                if (PyErr_Occurred())
+                    return NULL;
+                return PyComplex_FromCComplex(c_result);
+            }
+            if (PyComplex_CheckExact(item)) {
+                Py_complex x = PyComplex_AsCComplex(item);
+                c_result.real += x.real;
+                c_result.imag += x.imag;
+                Py_DECREF(item);  /* complex has no specialized dealloc */
+                continue;
+            }
+            if (PyLong_Check(item)) {
+                long value;
+                int overflow;
+                value = PyLong_AsLongAndOverflow(item, &overflow);
+                if (!overflow) {
+                    c_result.real += (double)value;
+                    c_result.imag += 0.0;
+                    Py_DECREF(item);
+                    continue;
+                }
+            }
+            if (PyFloat_Check(item)) {
+                double value = PyFloat_AS_DOUBLE(item);
+                c_result.real += value;
+                c_result.imag += 0.0;
+                Py_DECREF(item);
+                continue;
+            }
+            result = PyComplex_FromCComplex(c_result);
+            if (result == NULL) {
+                Py_DECREF(item);
+                Py_DECREF(iter);
+                return NULL;
+            }
+            temp = PyNumber_Add(result, item);
+            Py_DECREF(result);
+            Py_DECREF(item);
+            result = temp;
+            if (result == NULL) {
+                Py_DECREF(iter);
+                return NULL;
+            }
+        }
+    }
 #endif
 
     for(;;) {

main:

$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop

with specialization:

$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 606 nsec per loop

(2) also seems to be a no-brainer: a simple refactoring of the PyFloat specialization should allow us to use the same core for the PyComplex specialization.

If there are no objections, I'll work on a patch.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@skirpichev skirpichev added the type-feature A feature request or enhancement label Jun 29, 2024
@Eclips4
Member

Eclips4 commented Jun 29, 2024

cc @rhettinger @mdickinson

@franklinvp

My opinion is that the goal of being accurate should be delegated to a specialized function, and that the built-in function sum should return to its simple-to-explain meaning of "adding from left to right using the addition that the elements define".

The change introduced in #100425 increased the complexity of explaining what sum does. See some examples.

Summation of finite-precision floating-point numbers has its surprises, when one hasn't learned that it is not real numbers that are being added. But in my opinion, the built-in sum is not the right place to mitigate that. In the spirit of "explicit is better than implicit", I think the use of a compensated sum should be done in a function that is explicitly made for that purpose. Didactically, those who haven't learned floating-point arithmetic should be exposed to it in the immediate tools of the language: + and the built-in sum.

@rhettinger
Contributor

rhettinger commented Jun 29, 2024

If there are not objections against - I'll work on a patch.

+1 from me. As a test, the result should match complex(sum(z.real for z in zarr), sum(z.imag for z in zarr))
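That acceptance criterion can be written down directly as a reference function (a sketch of the test idea, not the actual test-suite code):

```python
def reference_csum(zarr):
    # Component-wise reference from the comment above; sum() on floats
    # has itself been compensated since Python 3.12 (#100425).
    return complex(sum(z.real for z in zarr), sum(z.imag for z in zarr))

# Exactly representable values sum exactly, so it is easy to sanity-check:
assert reference_csum([1+2j, 3+4j]) == 4+6j
```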

@skirpichev
Member Author

its simple-to-explain meaning of "adding from left to right using the addition that the elements define".

One reason for a builtin function to be a builtin is, well, to be useful. Just being "simple-to-explain" is not enough.

What is the point of preserving a huge accumulated rounding error? Say, if you want to reproduce an inaccurate sum with some specific order of operations, you can just use something like functools.reduce(operator.add, xs) or an explicit for loop.
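The order dependence of plain left-to-right addition is easy to demonstrate with the functools.reduce() form:

```python
from functools import reduce
from operator import add

xs = [1.0, 1e16, -1e16]
# 1.0 is absorbed when it is added to 1e16 first,
# but survives when the cancellation happens first:
assert reduce(add, xs) == 0.0
assert reduce(add, reversed(xs)) == 1.0
```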

Also, is it "simple-to-explain" for you when sum(xs) and sum(reversed(xs)) are different?

use of a compensated sum should be in done in a function that is explicitly made for that purpose.

There is also math.fsum(); it's mentioned in the Sphinx docs for sum(). But it uses a much more complex algorithm.

(BTW, it looks faster on random data.)

+1 from me.

FYI, pr is ready for review: #121176

@gpshead gpshead added the 3.14 new features, bugs and security fixes label Jul 2, 2024
@franklinvp

Clap, clap, clap.

Everyone is implicitly forced to opt in to a compensated sum for float and complex, instead of doing so explicitly.
So long, explicit is better than implicit.
Thanks for ruining one use case, when this algorithm could perfectly well have been put somewhere else, or made switchable, so that everyone benefitted without anyone being affected.

Please don't ever "enhance" it to subtypes. Those who need a fast and proper sum of float and complex in Python at least still have some hacks like

class R(float): pass
class C(complex): pass
sum(L, start=R(0.0)) # or sum(L, start=C(0.0))

Regarding

Also, is it "simple-to-explain" for you when sum(xs) and sum(reversed(xs)) are different?

That is the type of thing that you want to have to explain.
It is a good thing to have people come over and over asking about it.
That phenomenon is a feature of finite-precision floating-point arithmetic; they get to learn it.
Python, a language used so often for education, now tries to hide this as early as its built-in function sum, catering to ignorance.

@rhettinger
Contributor

Clap, clap, clap.
... [sarcasm] ...
catering to ignorance.

Your abusive posts are not welcome here.

@skirpichev
Member Author

Please don't ever "enhance" it to subtypes.

Probably, this doesn't make sense, if only because __add__() can be overridden in subtypes:

>>> class myint(int):
...     def __add__(self, other):
...         print("Nobody expects the Spanish Inquisition!")
...         
>>> myint()+myint()
Nobody expects the Spanish Inquisition!

I can't imagine many practical cases for doing that... Perhaps implementing some parallel algorithm for addition of big ints?

But that's all. More generic checks (say, PyFloat_Check()) are costly in the case of negative results. But in sum() we stop the specialization for a given type and fall through to a more generic one (say, to complex) or to using PyNumber_Add(). So, I think such generic checks are cheap in this type of workflow.
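The fallback behaviour is observable from Python: with a subclass as the start value, sum() never enters a fast path (the specializations check the exact type), so an overridden __add__() is honoured (a small demonstration using the subclass "hack" from an earlier comment):

```python
class F(float):
    # Float subclass with an overridden __add__.
    def __add__(self, other):
        return F(float(self) + float(other))

# A non-exact-float start value keeps sum() on the generic
# PyNumber_Add path, so the override is called for every addition:
total = sum([F(1.0), F(2.0)], F(0.0))
assert type(total) is F and total == 3.0
```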

@rhettinger, does this extension of the specialization in sum() make sense to you?

If not, please note that after 88bdb92 we have PyLong_Check()'s in some places. I think that was a bug (CC @serhiy-storchaka). This was extended in my PR with one PyFloat_Check(), and I forgot to fix that, sorry :(

a patch to fix a few instances of "soft" type checks
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index a5b45e358d..49675c224c 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2686,7 +2686,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
                 _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
                 continue;
             }
-            if (PyLong_Check(item)) {
+            if (PyLong_CheckExact(item) || PyBool_Check(item)) {
                 long value;
                 int overflow;
                 value = PyLong_AsLongAndOverflow(item, &overflow);
@@ -2735,7 +2735,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
                 Py_DECREF(item);
                 continue;
             }
-            if (PyLong_Check(item)) {
+            if (PyLong_CheckExact(item) || PyBool_Check(item)) {
                 long value;
                 int overflow;
                 value = PyLong_AsLongAndOverflow(item, &overflow);
@@ -2746,7 +2746,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
                     continue;
                 }
             }
-            if (PyFloat_Check(item)) {
+            if (PyFloat_CheckExact(item)) {
                 double value = PyFloat_AS_DOUBLE(item);
                 re_sum.hi += value;
                 im_sum.hi += 0.0;

Python, a language used so often for education, now tries to hide

Not at all!

>>> res, xs = 0.0, [0.1]*10
>>> for _ in xs:
...     res += _
...     
>>> res
0.9999999999999999
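The point above can be made self-contained: an explicit loop still exposes the naive behaviour, while the builtin has used compensated summation for floats since Python 3.12 (the version guard is there because older interpreters return the naive result from sum() too):

```python
import sys

res = 0.0
for x in [0.1] * 10:
    res += x
# The explicit loop accumulates rounding error...
assert res == 0.9999999999999999
# ...while sum() compensates for it on 3.12 and newer:
if sys.version_info >= (3, 12):
    assert sum([0.1] * 10) == 1.0
```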

@serhiy-storchaka
Member

If not, please note that after 88bdb92 we have PyLong_Check()'s in some places. I think that was a bug (CC @serhiy-storchaka).

Do you have an example where this makes difference?

@skirpichev
Member Author

Hmm, you are right, sorry for the false alarm. All such examples look broken.

skirpichev added a commit to skirpichev/mpmath that referenced this issue Aug 3, 2024
After python/cpython#121149 this was broken on CPython 3.14:

_______________ [doctest] mpmath.functions.orthogonal.spherharm ____________
EXAMPLE LOCATION UNKNOWN, not showing all tests of that example
??? >>> fp.chop(fp.quad(lambda t,p: Y1(t,p)*Y2(t,p)*dS(t,p), *sphere))
Expected:
    1.0000000000000007
Got:
    1.0000000000000004

This particular failure could be fixed by:
diff --git a/mpmath/ctx_base.py b/mpmath/ctx_base.py
index 3309bfa..2ca2fac 100644
--- a/mpmath/ctx_base.py
+++ b/mpmath/ctx_base.py
@@ -110,7 +110,8 @@ def fdot(ctx, xs, ys=None, conjugate=False):
             cf = ctx.conj
             return sum((x*cf(y) for (x,y) in xs), ctx.zero)
         else:
-            return sum((x*y for (x,y) in xs), ctx.zero)
+            return ctx.mpc(sum(((x*y).real for (x,y) in xs), ctx.zero),
+                           sum(((x*y).imag for (x,y) in xs), ctx.zero))

     def fprod(ctx, args):
         prod = ctx.one

But this is just one case for the whole test suite...