[deltaE] Speedup of 7th powers in DeltaE2000 #340

Merged (1 commit) on Nov 14, 2023


dom1n1k
Contributor

@dom1n1k dom1n1k commented Oct 21, 2023

The current version uses the Math.pow function and the ** operator, which translates into the same function.
We could hope that the JIT optimizes integer powers. But, as practical experiments show, it doesn't at the moment.
Math.pow with a float argument is quite expensive. Much more expensive than sqrt or sin, for example.

I wrote a small helper function pow7 and applied it in only 2 places.
On my computer it gives a speedup from ~10% (FF 118) to ~20% (Chromium 118, Node 21). That is the total time of the deltaE2000 function, not the isolated power.

@LeaVerou
Member

That’s very interesting. What's the cause of the slowdown? E.g. would x * x * x * x * x * x * x be even faster?

@facelessuser
Collaborator

facelessuser commented Oct 21, 2023

The slowdown is due to the general-purpose nature of pow. Because functions of this sort are general purpose, they carry additional checks and logic, which creates overhead and relative slowness (for some uses they are more than fast enough). People who are trying to optimize as much as possible often apply the multiplication themselves, since a plain multiplication avoids the overhead of a function call and the extra checks. So Math.pow(2, 2) would be slower than 2 * 2.

Additionally, if you specialize the operation to reduce how many times you call multiply, you could potentially speed up performance a bit more.
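
To illustrate what "specializing to reduce the number of multiplications" can look like, here is a minimal sketch of integer exponentiation by repeated squaring. The helper name `powInt` is hypothetical and is not part of the PR, which uses a hand-written `pow7` instead; this is only a sketch of the general idea.

```javascript
// Hypothetical helper (not from the PR): integer power by repeated squaring.
// It needs roughly log2(n) squarings plus one multiplication per set bit of n,
// instead of n - 1 multiplications for the naive product.
function powInt (x, n) {
    if (!Number.isInteger(n) || n < 0 || n > 0x7fffffff) {
        throw new RangeError("n must be a non-negative 32-bit integer");
    }
    let result = 1;
    let base = x;
    while (n > 0) {
        if (n & 1) {
            result *= base;  // this bit of the exponent is set
        }
        n >>= 1;
        if (n > 0) {
            base *= base;    // square for the next bit
        }
    }
    return result;
}

console.log(powInt(2, 7));                      // 128
console.log(powInt(1.5, 7), Math.pow(1.5, 7));  // both should agree closely
```

Whether such a helper is actually faster than ** or Math.pow in a given engine is exactly what needs benchmarking.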

Some people don't bother as they don't require the absolute fastest, most optimized solution, but if it is something you care about, I imagine it would be useful.

Usually, if I do care about performance, I ask people to provide data backing their performance claims: how they tested, and the results.

Generally, I am inclined to believe this claim is faster just because I am familiar with this type of optimization. The question is whether you value the performance or the simplicity of just using Math.pow().

@dom1n1k
Contributor Author

dom1n1k commented Oct 21, 2023

What's the cause of the slowdown?

Math.pow is a universal function for arbitrary powers, including fractional ones.
It is calculated via a Taylor series:
https://en.wikipedia.org/wiki/Taylor_series#Exponential_function
Most likely even two series, because an arbitrary power is calculated through the exponential and the natural logarithm.

x * x * x * x * x * x * x - doubtful, because these are 6 muls. Can be reduced to 4.

You will often see people at times who are trying to optimize as much as possible

I'm not suggesting saving nanoseconds.
But DE2000 is a really heavy function, and pow takes a significant share of its time.

@Myndex
Contributor

Myndex commented Oct 21, 2023

Funny, a couple weeks ago I happened to run some benchmarks out of curiosity, regarding the simplified Phi calc, and ran across this difference, and was rather surprised at the difference between Math.pow() and the ** operator.

Safari and Chrome are similar, Safari shown:

[Image: Safari benchmarks of calculating Phi]

Clearly, the x ** y is twice as fast as Math.pow(x,y), and x ** 0.5 also appears twice as fast as Math.sqrt(x)

However, the thing I found most surprising, and I mean shocking, is that not only is the difference between methods not apparent in Firefox, Firefox is overall two orders of magnitude faster in terms of ops per second:

[Image: Firefox bench results are weird]

However, in terms of actual performance on web apps, I don't find Firefox to "feel" that much faster. It makes me wonder a bit about the way the operations per second is being reported out of Firefox, which I am guessing is what the benchmark site is using.

In other words, is the Firefox ops per second reporting each individual CPU instruction per second (i.e. a fetch is one, a shift is one, etc etc), and Safari/Chrome are reporting completed JS operations per second (i.e. the start to finish ** operation)...?

I am curious, as that's how it appears...
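
For reference, a minimal sketch of how an ops-per-second figure is commonly derived: completed calls divided by elapsed seconds. How the benchmark site actually computes its numbers is not stated in this thread, so this is an assumption for illustration only; `opsPerSecond`, `minMs` and `sink` are hypothetical names.

```javascript
// Hypothetical sketch: count completed calls of fn over a minimum time window.
// The results are accumulated into `sink` and returned, so the engine cannot
// discard the calls as dead code.
function opsPerSecond (fn, minMs = 50) {
    let calls = 0;
    let sink = 0;
    let elapsed = 0;
    const start = performance.now();

    do {
        // Run in small batches so the timer is not read on every call
        for (let i = 0; i < 1000; i++) {
            sink += fn(i);
            calls++;
        }
        elapsed = performance.now() - start;
    } while (elapsed < minMs);

    return { opsPerSec: calls / (elapsed / 1000), sink };
}

console.log(opsPerSecond((i) => i ** 7).opsPerSec);
console.log(opsPerSecond((i) => Math.pow(i, 7)).opsPerSec);
```

With this definition one "operation" is a whole call of the tested snippet, not an individual CPU instruction.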

@Myndex
Contributor

Myndex commented Oct 21, 2023

Hi @dom1n1k

...Math.pow - universal function for arbitrary power, includes fractional....

But the ** is also arbitrary, and includes fractional values. x ** 0.5 is the square root of x for instance.
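
A small sketch comparing the three spellings on a few sample values. For ordinary positive inputs they should agree closely; the two edge cases at the end follow from the ECMAScript rules for exponentiation and Math.sqrt, where the results differ for -Infinity and -0. The sample values are arbitrary.

```javascript
// Compare x ** 0.5, Math.pow(x, 0.5) and Math.sqrt(x) on sample inputs
const samples = [0.25, 2, 9, 1e10];

for (const x of samples) {
    console.log(x, x ** 0.5, Math.pow(x, 0.5), Math.sqrt(x));
}

// Edge cases where the power form and sqrt are specified differently
console.log((-Infinity) ** 0.5);            // Infinity
console.log(Math.sqrt(-Infinity));          // NaN
console.log(Object.is((-0) ** 0.5, 0));     // true: the power form gives +0
console.log(Object.is(Math.sqrt(-0), -0));  // true: sqrt keeps -0
```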

@Myndex
Contributor

Myndex commented Oct 21, 2023

Hi Issac @facelessuser

...performance or the simplicity of just using 'Math.pow()'...

I would argue that x ** y is simpler than 'Math.pow(x,y)' though ** may be less readable when visually scanning over code. But I prefer the adage "terse code, verbose comments" so I prefer **, particularly in light of the performance shown.

@LeaVerou
Member

What's the cause of the slowdown?

Math.pow - universal function for arbitrary power, includes fractional. This is calculated via Taylor series: en.wikipedia.org/wiki/Taylor_series#Exponential_function Most likely, even two series, because an arbitrary power is calculated through exponent and natural logarithm.

x * x * x * x * x * x * x - doubtful, because these are 6 muls. Can be reduced to 4.

I see, so it's about keeping multiplications down. Yeah, I couldn't get it below 4 with any other combination 😁
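
The "not below 4" observation can be checked with a small brute-force search. The sketch below assumes each step multiplies two already-computed powers of x (an addition chain on the exponents) and looks for the shortest chain. `minMultiplications` and `search` are hypothetical names used only for this illustration.

```javascript
// Minimum number of multiplications to reach x^target when every step
// multiplies two powers that were already computed (exponents add up).
function minMultiplications (target) {
    if (!Number.isInteger(target) || target < 1) {
        throw new RangeError("target must be a positive integer");
    }
    // Iterative deepening: try 0 multiplications, then 1, then 2, ...
    for (let limit = 0; ; limit++) {
        if (search([1], limit, target)) {
            return limit;
        }
    }
}

// Depth-first search over strictly increasing chains of exponents
function search (chain, stepsLeft, target) {
    const last = chain[chain.length - 1];
    if (last === target) return true;
    if (stepsLeft === 0) return false;
    // Prune: even doubling at every remaining step cannot reach the target
    if (last * 2 ** stepsLeft < target) return false;

    for (let i = chain.length - 1; i >= 0; i--) {
        for (let j = i; j >= 0; j--) {
            const next = chain[i] + chain[j];
            if (next > last && next <= target) {
                chain.push(next);
                if (search(chain, stepsLeft - 1, target)) return true;
                chain.pop();
            }
        }
    }
    return false;
}

console.log(minMultiplications(7));  // 4
```

For the 7th power this confirms that 4 multiplications is the minimum under that assumption, for example x2 = x * x, x4 = x2 * x2, x6 = x4 * x2, x7 = x6 * x.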

You will often see people at times who are trying to optimize as much as possible

I'm not suggesting saving nanoseconds. But DE2000 is really heavy function. And pow takes a significant time.

Yeah, and DE2000 is used iteratively a lot, e.g. in gamut mapping, so I'm totally on board with optimizing it.

Presumably this is a temporary fix until browsers optimize this properly though. I wonder if it would be feasible to do this as a build process plugin?

@Myndex
Contributor

Myndex commented Oct 21, 2023

** is fastest

I just did a benchmark of the function pow7() as written in the PR, and other methods. ** is the fastest, followed by pow7(), and then Math.pow() and surprisingly, x*x*x*x*x*x*x was the slowest.

//The pow7() function

function pow7 (x) {
 	const x2 = x * x;
 	const x7 = x2 * x2 * x2 * x;
 	return x7;
 } 

The benchmark: https://www.measurethat.net/Benchmarks/Show/28008/0/power-to-the-7

The results:

raise to the power of 7

[Image: bar graph of results]

@facelessuser
Collaborator

I would argue that x ** y is simpler than 'Math.pow(x,y)' though ** may be less readable when visually scanning over code. But I prefer the adage "terse code, verbose comments" so I prefer **, particularly in light of the performance shown.

I prefer ** generally as well, and I'm speaking generally as to why people have historically applied such an approach. I'm not sure ** is always faster than a targeted, optimized approach; I imagine there are cases where it does great and cases where it does less well. Generally, I don't often worry about optimizing these cases and will just use **, but in cases where I do care, I'm benchmarking to make sure I'm actually improving things.

@facelessuser
Collaborator

I see, so it's about keeping multiplications down. Yeah, I couldn't get it below 4 with any other combination 😁

It's probably not just about keeping multiplication combinations down. I imagine things like pow and ** optimize as well, but they are either a generic algorithm and/or make optimization decisions on the fly, which adds overhead. But as always, benchmarking will tell you if you are actually getting a gain or not.

@Myndex
Contributor

Myndex commented Oct 21, 2023

Further Benchmarks

Just to evaluate completely, I optimized some of the methods, such as eliminating unneeded assignments. While it helped improve the function speed slightly, ** remains fastest.

function pow7 (x) {
 	const x2 = x * x;
 	const x7 = x2 * x2 * x2 * x;
 	return x7;
 }

// Slightly more optimized versions:

function pow27 (x) {
 	const x2 = x * x;
 	return x2 * x2 * x2 * x
 }

function pow37 (x) {
  const x3 = x * x * x;
  return x3 * x3 * x
}

function powMult7 (x) {
  return x*x*x*x*x*x*x
}

** is still fastest. Note also that abstracting the x*x*x*x*x*x*x into a function nearly tripled its performance. Containing multiple operations inside a function allows the compiler to optimize more effectively.

[Images: bench results; results bar chart]

Edit to add the URL for this bench is https://www.measurethat.net/Benchmarks/Show/28008/1/to-the-7th-power-v2

@Myndex
Contributor

Myndex commented Oct 21, 2023

...On my computer it gives a speedup from ~10% (FF 118) to ~20% (Chromium 118, Node 21) — meaning total time of deltaE2000 function, not isolated power...

So, this is weird, because on my machine, the pow7() is slower than the ** that is in the code. I'm on MacOS, on Intel, MacbookPro. I imagine potential bench differences on a desktop, on M1/iOS, etc.

But I am also wondering re the methodology used that returned the entire ∆E2000 function 20% faster, when this is a small part of the total function, and I'm not seeing anything that would indicate such a savings?
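
One way to reason about this question is Amdahl's law: if a fraction p of the total run time is spent in the power computation and that part becomes s times faster, the overall saving is p * (1 - 1/s). The sketch below uses hypothetical function names and purely illustrative numbers; the real fraction of deltaE2000 time spent in the 7th powers is not measured in this thread.

```javascript
// p: fraction of total run time spent in the sped-up part (0..1)
// s: speedup factor of that part (> 1 means faster)
// Returns the fraction of total run time saved.
function totalTimeSaving (p, s) {
    return p * (1 - 1 / s);
}

// Inverse question: which fraction p would explain an observed overall saving,
// given an isolated speedup factor s?
function impliedFraction (saving, s) {
    return saving / (1 - 1 / s);
}

// Illustrative numbers only
console.log(totalTimeSaving(0.2, 40).toFixed(3));   // about 0.195
console.log(impliedFraction(0.18, 40).toFixed(3));  // about 0.185
```

Under these assumptions, an overall saving near 20% is plausible only if the powers account for roughly a fifth of the function's run time.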

@dom1n1k
Contributor Author

dom1n1k commented Oct 22, 2023

I don't recommend trusting microbenchmarks. They often lie because the compiler removes "dead" code.

My code is below.
I tested it on Node v21 right now: 17-18% savings.

import Color from "./src/index.js";
import { deltaE2000 } from "./src/deltaE/index.js";

const k = 11;
const n = k ** 3;
const colors = [];
let accum = 0;

// generate array of 11 ** 3 = 1331 colors
for (let r = 0; r < k; r++) {
    for (let g = 0; g < k; g++) {
        for (let b = 0; b < k; b++) {
            const rgbColor = new Color("prophoto", [
                r / (k - 1),
                g / (k - 1),
                b / (k - 1)
            ]);
            const labColor = rgbColor.to("lab");
            colors.push(labColor);
        }
    }
}

// start
const t0 = performance.now();

// each to each color, 1331 ** 2 = 1771561 pairs
for (let i = 0; i < n; i++) {
    const color1 = colors[i];

    for (let j = 0; j < n; j++) {
        const color2 = colors[j];

        // we must use the result
        // otherwise the compiler will remove the "dead" code
        accum += deltaE2000(color1, color2);
    }
}

// finish
const t1 = performance.now();

// we must use the result
// then the compiler knows it's working code
console.log(`result: ${accum}`);
console.log(`time:   ${t1 - t0} ms`);

@Myndex
Contributor

Myndex commented Oct 22, 2023

Hi @dom1n1k

...I don't recommend trusting microbenchmarks. They often lie because the compiler removes "dead" code...

I had assignments so the code would be exercised... But it does make me wonder if somehow that is what's going on with Firefox. I don't use FF, so I haven't really looked into those differences.

I'm going to try running in Node instead; Node handles things differently than browsers, which could be the reason for the discrepancy. Also, I did not test with all the machines I have, so there are a number of influences to look into.

@dom1n1k
Contributor Author

dom1n1k commented Oct 22, 2023

Hi @Myndex

Meanwhile, if you want to see an isolated benchmark, we can do that. But we need to respect a few conditions:

  • we need to use the results (otherwise the compiler will remove the code);
  • apply the functions to different numbers (otherwise the compiler will remove duplicates);
  • tests must be enclosed in functions (the JIT optimizes functions);
  • do not run tests from the browser console (it can be much slower than usual);
  • it is nice to have several runs to get a median time.

I got a difference of about 15-25 times in FF and 40-50 times in Chrome/Node.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>pow7() benchmark</title>
    <script type="text/javascript" defer src="pow7-bench.js"></script>
</head>
<body>
    <h1>Math.pow(), ** and pow7() benchmark</h1>
    <p>Please see console</p>
</body>
</html>

// pow7-bench.js
const n  = 1e7;  // number of operations
const x1 =  0;
const x2 = 10;
const dx = (x2 - x1) / n;

function pow7 (x) {
    const x2 = x * x;
    const x7 = x2 * x2 * x2 * x;
    return x7;
}

function testStarStar () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += x ** 7;
        x += dx;
    }
    return accum;
}

function testMathPow () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += Math.pow(x, 7);
        x += dx;
    }
    return accum;
}

function testCustomPow7 () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += pow7(x);
        x += dx;
    }
    return accum;
}

function testMuls () {
    let x = x1;
    let accum = 0;
    while (x < x2) {
        accum += x * x * x * x * x * x * x;
        x += dx;
    }
    return accum;
}

const funcs = [
    testStarStar,
    testMathPow,
    testCustomPow7,
    testMuls
];

const F = funcs.length;  // number of functions
const R = 5;             // number of runs
const accums = Array(F);
const times  = Array(F);

// benchmarks
for (let f = 0; f < F; f++) {
    const func = funcs[f];

    accums[f] = 0;
    times[f] = [];
    
    for (let r = 0; r < R; r++) {
        const t0 = performance.now();
        const result = func();
        const t1 = performance.now();

        accums[f] += result;
        times[f].push(t1 - t0);
    }
}

// print median times
for (let f = 0; f < F; f++) {
    times[f].sort((a, b) => a - b)
    const medianTime = times[f][Math.floor(R / 2)];

    console.log(f);
    console.log(`result: ${accums[f]}`);  // we must use results
    console.log(`times:  ${ Math.round(100 * medianTime) / 100 } ms`);
}
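
The script above takes the middle element of the sorted times, which is the median as long as the number of runs R is odd. A minimal sketch of a helper that also handles an even number of runs; `median` is a hypothetical name and is not part of the benchmark as posted.

```javascript
// Median of a non-empty array of numbers; does not modify the input array
function median (values) {
    if (values.length === 0) {
        throw new RangeError("median of an empty array is undefined");
    }
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);

    return sorted.length % 2 === 1
        ? sorted[mid]                          // odd count: middle element
        : (sorted[mid - 1] + sorted[mid]) / 2; // even count: mean of the two middle ones
}

console.log(median([12.5, 11.9, 30.2, 12.1, 12.0]));  // 12.1
console.log(median([4, 1, 3, 2]));                    // 2.5
```

The median is less sensitive than the mean to a single slow run, such as a first run executed before the JIT has warmed up.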

@Myndex
Contributor

Myndex commented Oct 22, 2023

WoWwwww....

@dom1n1k Thank you for your patience and the extra info... I am surprised at the number of variations here...

All done on MacOS, intel macbookpro

Here is Safari

[Image: Safari benchmarking power functions]

And here indeed your function is fastest.

But the part that is jaw dropping is Chrome

[Image: Chrome benchmarking power functions]

Srsly wut? Here I can see how you had better overall results with the total DE....

Firefox is the slowest

[Image: Firefox benchmarking power functions]

That's not good...

And then Opera, predictably like Chrome:

[Image: Opera benchmarking power functions]

Surprising variation... and what's the deal with the console? Just not optimized? Running that script in the console returns very different speeds. This is also true for BBEdit, even when running as the webpage...

[Image: results when pasted into console]

Thank you again for your extra time, appreciated.

@dom1n1k
Contributor Author

dom1n1k commented Oct 22, 2023

and whats the deal with the console? just not optimized?

Highly likely, yes. Code in the console often goes through a different pipeline.

@Myndex
Contributor

Myndex commented Oct 23, 2023

Interesting that Math.sqrt(x) appears faster than x ** 0.5 on Safari and Firefox, but the other way around on Chrome. The results are close enough, though...

@dom1n1k
Contributor Author

dom1n1k commented Oct 23, 2023

Normally sqrt(x) is much faster than pow(x, 0.5).
This is a special case for which there are more efficient algorithms.
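
As an illustration of a special-purpose square root algorithm, here is a minimal sketch of Newton's (Heron's) iteration. This is a textbook method chosen for illustration, not what any JavaScript engine is confirmed to use; engines commonly rely on a hardware square root instruction where one is available, which is an assumption here rather than something stated in this thread. `heronSqrt` is a hypothetical name.

```javascript
// Newton's (Heron's) iteration for the square root, for illustration only.
// Starting from a guess that is >= sqrt(x), each step moves the guess down
// toward sqrt(x); the loop stops once the guess no longer decreases.
function heronSqrt (x) {
    if (Number.isNaN(x) || x < 0) return NaN;
    if (x === 0 || x === Infinity) return x;

    let guess = Math.max(x, 1);  // always >= sqrt(x) for x > 0
    for (;;) {
        const next = 0.5 * (guess + x / guess);
        if (next >= guess) {
            return guess;
        }
        guess = next;
    }
}

console.log(heronSqrt(2), Math.sqrt(2));
console.log(heronSqrt(1e10), Math.sqrt(1e10));
```

In practice Math.sqrt should be preferred; the sketch only shows that a square root does not need the general exponential and logarithm machinery.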

@Myndex
Contributor

Myndex commented Oct 23, 2023

Normally sqrt(x) is much faster than pow(x, 0.5).
This is a special case for which there are more efficient algorithms.

Interesting, I seem to remember reading something some years ago indicating otherwise, or some ambiguity.

@svgeesus svgeesus merged commit 2154c8d into color-js:main Nov 14, 2023
4 checks passed