[deltaE] Speedup of 7th powers in DeltaE2000 #340

dom1n1k · 2023-10-21T17:45:16Z

The current version uses the Math.pow function and the ** operator, which translates into the same function.
We could hope that the JIT optimizes integer powers. But, as practical experiments show, it doesn't at the moment.
Math.pow with a float argument is quite expensive. Much more expensive than e.g. sqrt or sin.

I wrote a small helper function pow7 and applied it in only 2 places.
On my computer it gives a speedup from ~10% (FF 118) to ~20% (Chromium 118, Node 21) — meaning total time of deltaE2000 function, not isolated power.
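
For context, a minimal sketch of the kind of expression involved. The CIEDE2000 formula raises chroma to the 7th power in two terms (the G factor and the R_C rotation term), which matches the "2 places" above. The function and variable names here are illustrative assumptions, not the code from the PR.

```javascript
// Hypothetical names; only the two formulas come from CIEDE2000.
function pow7 (x) {
	const x2 = x * x;
	return x2 * x2 * x2 * x;
}

// 25^7, a constant shared by both terms
const POW_25_7 = pow7(25); // 6103515625

// G factor: 0.5 * (1 - sqrt(C^7 / (C^7 + 25^7))), where C is the mean chroma
function gFactor (meanChroma) {
	const c7 = pow7(meanChroma);
	return 0.5 * (1 - Math.sqrt(c7 / (c7 + POW_25_7)));
}

// R_C term: 2 * sqrt(C'^7 / (C'^7 + 25^7)), where C' is the mean adjusted chroma
function rotationChroma (meanChromaPrime) {
	const c7 = pow7(meanChromaPrime);
	return 2 * Math.sqrt(c7 / (c7 + POW_25_7));
}

console.log(gFactor(0), rotationChroma(0)); // 0.5 0
```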

LeaVerou · 2023-10-21T19:52:21Z

That’s very interesting. What's the cause of the slowdown? E.g. would x * x * x * x * x * x * x be even faster?

facelessuser · 2023-10-21T20:11:34Z

The slowdown is due to the general purpose nature of pow. Most functions of this sort, because they are general purpose, have additional checks and logic which creates overhead and slowness (relatively speaking, for some they are more than fast enough). You will often see people at times who are trying to optimize as much as possible do things like apply the multiplication themselves as the multiplication operation is generally faster as there is no overhead for additional function calls and other checks. So Math.pow(2, 2) would be slower than 2 * 2.

Additionally, if you specialize the operation to reduce how many times you call multiply, you could potentially speed up performance a bit more.

Some people don't bother as they don't require the absolute fastest, most optimized solution, but if it is something you care about, I imagine it would be useful.

Usually, if I do care about performance, I ask people to provide data backing their performance claims, such as how they tested and what the results were.

Generally, I am inclined to believe this claim is faster just because I am familiar with this type of optimization, the question is whether you value the performance or the simplicity of just using Math.pow().

dom1n1k · 2023-10-21T20:47:26Z

> What's the cause of the slowdown?

Math.pow is a universal function for arbitrary powers, including fractional ones.
This is calculated via a Taylor series:
https://en.wikipedia.org/wiki/Taylor_series#Exponential_function
Most likely even two series, because an arbitrary power is calculated through the exponential and the natural logarithm.
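
As a rough illustration of the exponential/logarithm identity mentioned above, here is a sketch for positive x only. It is not how any particular engine implements Math.pow; real implementations handle special cases and rounding far more carefully. The name powViaExpLog is made up for this example.

```javascript
// Illustrative only: x^y = exp(y * ln(x)) for x > 0.
function powViaExpLog (x, y) {
	return Math.exp(y * Math.log(x));
}

const direct = Math.pow(3, 7);          // 2187
const viaExpLog = powViaExpLog(3, 7);   // close to 2187, may differ in the last bits
const relativeError = Math.abs(viaExpLog - direct) / direct;

console.log(direct, viaExpLog, relativeError);
```

Two transcendental calls per power is the kind of cost that a handful of multiplications avoids.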

x * x * x * x * x * x * x is doubtful, because that is 6 multiplications. It can be reduced to 4.
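
One way to get from 6 multiplications down to 4, sketched with a counter so the count can be checked. The mul helper and the function names are made up for this illustration.

```javascript
// Counting wrapper around multiplication (illustration only)
let mulCount = 0;
function mul (a, b) {
	mulCount++;
	return a * b;
}

// Straightforward product: 6 multiplications
function pow7Naive (x) {
	return mul(mul(mul(mul(mul(mul(x, x), x), x), x), x), x);
}

// Reusing intermediate powers: x^2, x^3, x^6, x^7 = 4 multiplications
function pow7Chain (x) {
	const x2 = mul(x, x);
	const x3 = mul(x2, x);
	const x6 = mul(x3, x3);
	return mul(x6, x);
}

mulCount = 0;
const naive = pow7Naive(2);
const naiveMuls = mulCount;

mulCount = 0;
const chain = pow7Chain(2);
const chainMuls = mulCount;

console.log(naive, naiveMuls, chain, chainMuls); // 128 6 128 4
```

Because floating-point multiplication is not associative, the two orderings can differ in the last bit for inputs that are not exactly representable.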

> You will often see people at times who are trying to optimize as much as possible

I'm not suggesting saving nanoseconds.
But DE2000 is a really heavy function, and pow takes a significant amount of time.

Myndex · 2023-10-21T20:51:51Z

Funny, a couple weeks ago I happened to run some benchmarks out of curiosity, regarding the simplified Phi calc, and ran across this difference, and was rather surprised at the difference between Math.pow() and the ** operator.

Safari and Chrome are similar, Safari shown:

Clearly, the x ** y is twice as fast as Math.pow(x,y), and x ** 0.5 also appears twice as fast as Math.sqrt(x)

However, the thing I found most surprising, and I mean shocking, is that not only is the difference between methods not apparent in Firefox, Firefox is overall two orders of magnitude faster in terms of ops per second:

However, in terms of actual performance on web apps, I don't find Firefox to "feel" that much faster. It makes me wonder a bit about the way the operations per second is being reported out of Firefox, which I am guessing is what the benchmark site is using.

In other words, is the Firefox ops per second reporting each individual CPU instruction per second (i.e. a fetch is one, a shift is one, etc etc), and Safari/Chrome are reporting completed JS operations per second (i.e. the start to finish ** operation)...?

I am curious, as that's how it appears...

Myndex · 2023-10-21T20:59:07Z

Hi @dom1n1k

...Math.pow - universal function for arbitrary power, includes fractional....

But the ** is also arbitrary, and includes fractional values. x ** 0.5 is the square root of x for instance.

Myndex · 2023-10-21T21:08:31Z

Hi Issac @facelessuser

...performance or the simplicity of just using 'Math.pow()'...

I would argue that x ** y is simpler than 'Math.pow(x,y)' though ** may be less readable when visually scanning over code. But I prefer the adage "terse code, verbose comments" so I prefer **, particularly in light of the performance shown.

LeaVerou · 2023-10-21T21:18:25Z

> > What's the cause of the slowdown?

> Math.pow is a universal function for arbitrary powers, including fractional ones. This is calculated via a Taylor series: en.wikipedia.org/wiki/Taylor_series#Exponential_function Most likely even two series, because an arbitrary power is calculated through the exponential and the natural logarithm.

> x * x * x * x * x * x * x is doubtful, because that is 6 multiplications. It can be reduced to 4.

I see, so it's about keeping multiplications down. Yeah, I couldn't get it below 4 with any other combination 😁

> > You will often see people at times who are trying to optimize as much as possible

> I'm not suggesting saving nanoseconds. But DE2000 is a really heavy function, and pow takes a significant amount of time.

Yeah, and DE2000 is used iteratively a lot, e.g. in gamut mapping, so I'm totally on board with optimizing it.

Presumably this is a temporary fix until browsers optimize this properly though. I wonder if it would be feasible to do this as a build process plugin?

Myndex · 2023-10-21T21:33:28Z

** is fastest

I just did a benchmark of the function pow7() as written in the PR, and other methods. ** is the fastest, followed by pow7(), and then Math.pow() and surprisingly, x*x*x*x*x*x*x was the slowest.

//The pow7() function

function pow7 (x) {
	const x2 = x * x;
	const x7 = x2 * x2 * x2 * x;
	return x7;
}

The benchmark: https://www.measurethat.net/Benchmarks/Show/28008/0/power-to-the-7

The results:

facelessuser · 2023-10-21T21:47:29Z

> I would argue that x ** y is simpler than 'Math.pow(x,y)' though ** may be less readable when visually scanning over code. But I prefer the adage "terse code, verbose comments" so I prefer **, particularly in light of the performance shown.

I prefer ** generally as well, and I'm speaking generally as to why people have historically applied such an approach. I'm not sure ** is always faster than a targeted, optimized approach; I imagine there are cases where it does great and cases where it does less great. Generally, I don't often worry about optimizing these cases and will just use **, but in cases where I do care, I'm benchmarking to make sure I'm actually improving things.

facelessuser · 2023-10-21T21:49:03Z

> I see, so it's about keeping multiplications down. Yeah, I couldn't get it below 4 with any other combination 😁

It's probably not just about keeping multiplication combinations down. I imagine things like pow and ** optimize as well, but they are either a generic algorithm and/or make optimization decisions on the fly, which adds overhead. But as always, benchmarking will tell you if you are actually getting a gain or not.

Myndex · 2023-10-21T21:56:06Z

Further Benchmarks

Just to evaluate completely, I optimized some of the methods, such as eliminating unneeded assignments. While it helped improve the function speed slightly, ** remains fastest.

function pow7 (x) {
	const x2 = x * x;
	const x7 = x2 * x2 * x2 * x;
	return x7;
}

// Slightly more optimized versions:

function pow27 (x) {
	const x2 = x * x;
	return x2 * x2 * x2 * x;
}

function pow37 (x) {
	const x3 = x * x * x;
	return x3 * x3 * x;
}

function powMult7 (x) {
	return x * x * x * x * x * x * x;
}

** is still fastest. Note also that abstracting the x*x*x*x*x*x*x into a function nearly tripled its performance. Containing multiple operations inside a function allows the compiler to optimize more effectively.

Edit to add: the URL for this bench is https://www.measurethat.net/Benchmarks/Show/28008/1/to-the-7th-power-v2

Myndex · 2023-10-21T22:12:36Z

...On my computer it gives a speedup from ~10% (FF 118) to ~20% (Chromium 118, Node 21) — meaning total time of deltaE2000 function, not isolated power...

So, this is weird, because on my machine, the pow7() is slower than the ** that is in the code. I'm on MacOS, on Intel, MacbookPro. I imagine potential bench differences on a desktop, on M1/iOS, etc.

But I am also wondering re the methodology used that returned the entire ∆E2000 function 20% faster, when this is a small part of the total function, and I'm not seeing anything that would indicate such a savings?

dom1n1k · 2023-10-22T10:34:01Z

I don't recommend trusting microbenchmarks. They often lie because the compiler removes "dead" code.

My code is below.
I tested it on Node v21 right now: 17-18% savings.

import Color from "./src/index.js";
import { deltaE2000 } from "./src/deltaE/index.js";

const k = 11;
const n = k ** 3;
const colors = [];
let accum = 0;

// generate array of 11 ** 3 = 1331 colors
for (let r = 0; r < k; r++) {
    for (let g = 0; g < k; g++) {
        for (let b = 0; b < k; b++) {
            const rgbColor = new Color("prophoto", [
                r / (k - 1),
                g / (k - 1),
                b / (k - 1)
            ]);
            const labColor = rgbColor.to("lab");
            colors.push(labColor);
        }
    }
}

// start
const t0 = performance.now();

// each to each color, 1331 ** 2 = 1771561 pairs
for (let i = 0; i < n; i++) {
    const color1 = colors[i];

    for (let j = 0; j < n; j++) {
        const color2 = colors[j];

        // we must use the result
        // otherwise the compiler will remove the "dead" code
        accum += deltaE2000(color1, color2);
    }
}

// finish
const t1 = performance.now();

// we must use the result
// then the compiler knows it's working code
console.log(`result: ${accum}`);
console.log(`time:   ${t1 - t0} ms`);

Myndex · 2023-10-22T12:42:36Z

Hi @dom1n1k

...I don't recommend trusting microbenchmarks. They often lie because the compiler removes "dead" code...

I had assignments so the code would be exercised... But it does make me wonder if somehow that is what's going on with Firefox. I don't use FF, so I haven't really looked into those differences.

I'm going to try running in Node instead. Node handles things differently than browsers, which could be the reason for the discrepancy. Also, I did not test with all the machines I have, so there are a number of influences to look into.

dom1n1k · 2023-10-22T15:27:46Z

Hi @Myndex

Meanwhile, if you want to see an isolated benchmark, we can do that. But we need to consider a few conditions:

we need to use the results (otherwise the compiler will remove the code);
apply the functions to different numbers (otherwise the compiler will remove duplicates);
tests must be enclosed in functions (the JIT optimizes functions);
do not run tests from the browser console (it can be much slower than usual);
it is nice to have several runs to get a median time.

I got a difference of about 15-25 times in FF and 40-50 times in Chrome/Node

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>pow7() benchmark</title>
    <script type="text/javascript" defer src="pow7-bench.js"></script>
</head>
<body>
    <h1>Math.pow(), ** and pow7() benchmark</h1>
    <p>Please see console</p>
</body>
</html>

const n  = 1e7;  // number of operations
const x1 =  0;
const x2 = 10;
const dx = (x2 - x1) / n;

function pow7 (x) {
    const x2 = x * x;
    const x7 = x2 * x2 * x2 * x;
    return x7;
}

function testStarStar () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += x ** 7;
        x += dx;
    }
    return accum;
}

function testMathPow () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += Math.pow(x, 7);
        x += dx;
    }
    return accum;
}

function testCustomPow7 () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += pow7(x);
        x += dx;
    }
    return accum;
}

function testMuls () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += x * x * x * x * x * x * x;
        x += dx;
    }
    return accum;
}

const funcs = [
    testStarStar,
    testMathPow,
    testCustomPow7,
    testMuls
];

const F = funcs.length;  // number of functions
const R = 5;             // number of runs
const accums = Array(F);
const times  = Array(F);

// benchmarks
for (let f = 0; f < F; f++) {
    const func = funcs[f];

    accums[f] = 0;
    times[f] = [];
    
    for (let r = 0; r < R; r++) {
        const t0 = performance.now();
        const result = func();
        const t1 = performance.now();

        accums[f] += result;
        times[f].push(t1 - t0);
    }
}

// print median times, labelled by test function name
for (let f = 0; f < F; f++) {
    times[f].sort((a, b) => a - b);
    const medianTime = times[f][Math.floor(R / 2)];

    console.log(funcs[f].name);
    console.log(`result: ${accums[f]}`);  // we must use results
    console.log(`times:  ${ Math.round(100 * medianTime) / 100 } ms`);
}

Myndex · 2023-10-22T21:16:21Z

WoWwwww....

@dom1n1k Thank you for your patience and the extra info... I am surprised at the number of variations here...

All done on MacOS, intel macbookpro

Here is Safari

And here indeed your function is fastest.

But the part that is jaw dropping is Chrome

Srsly wut? Here I can see how you had better overall results with the total DE....

Firefox is the slowest

That's not good...

And then Opera, predictably like Chrome:

Surprising variation... and what's the deal with the console? Just not optimized? Because running that script in the console returns very different speeds. This is also true for BBEdit, even when running as the webpage...

Thank you again for your extra time, appreciated.

dom1n1k · 2023-10-22T21:48:25Z

> and what's the deal with the console? Just not optimized?

Most likely, yes. Code in the console often goes through a different pipeline.

Myndex · 2023-10-23T00:01:08Z

Interesting that Math.sqrt(x) appears faster than x ** 0.5 on Safari and Firefox, but the other way around on Chrome. Results are close enough tho...

dom1n1k · 2023-10-23T00:09:54Z

Normally sqrt(x) is much faster than pow(x, 0.5).
This is a special case for which there are more efficient algorithms.
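
A side note on why an engine cannot blindly rewrite x ** 0.5 as Math.sqrt(x): the ECMAScript specification defines different results for a couple of edge cases. A small check, assuming a spec-compliant engine:

```javascript
// Negative infinity: sqrt is NaN, but exponentiation gives +Infinity
console.log(Math.sqrt(-Infinity));            // NaN
console.log((-Infinity) ** 0.5);              // Infinity

// Negative zero: sqrt keeps the sign, exponentiation returns +0
console.log(Object.is(Math.sqrt(-0), -0));    // true
console.log(Object.is((-0) ** 0.5, 0));       // true
```

For ordinary non-negative inputs the two agree in value, so any speed difference comes down to which code path the engine picks.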

Myndex · 2023-10-23T01:39:40Z

> Normally sqrt(x) is much faster than pow(x, 0.5).
> This is a special case for which there are more efficient algorithms.

Interesting, I seem to remember reading something some years ago indicating otherwise, or some ambiguity.

[deltaE] Speedup of power 7

4a2297f

svgeesus approved these changes Nov 14, 2023

svgeesus merged commit 2154c8d into color-js:main Nov 14, 2023
4 checks passed

apiiro-snyk mentioned this pull request Jul 3, 2024

[Snyk] Upgrade colorjs.io from 0.4.5 to 0.5.0 apiiro-snyk/halo#3

Open

Carolinewly mentioned this pull request Aug 4, 2024

[Snyk] Upgrade colorjs.io from 0.4.5 to 0.5.2 Carolinewly/halo#3

Open
