Making JPEG decoder faster #944
Making the tables larger isn't free. The returns are rapidly diminishing
(longer codes are rare; after all, that's the whole point of Huffman
coding!) so the wins end up small, while the cost of building a larger
decode table (not a hash table by the way, just a straight array) adds
overhead to every image decode. (Also, if you make them too large, they
don't comfortably fit in the L1 cache anymore, and decode performance tanks
as well.)
An easier way to increase speed is to interleave the passes in a more
fine-grained fashion. The current code batch-decodes DCT blocks for the
whole image and then does the color-space conversion to RGB in a second
pass. This is inefficient on larger images because decoded pixels drop out
of the lower cache levels before they get accessed again. It should be
reasonably easy to modify the code to trigger color-space conversion after
every couple of rows (probably one MCU's height) instead.
Also, progressive JPEG decoding is slow right now, and we haven't really
investigated any ways of making it faster.
There are many other ways of making the JPEG decoder faster, and also
different ways of increasing the speed of the Huffman decoder, but all
those add significant complexity which is at odds with the design goals of
stb_image. Feel free to fork the code if you want to make a decoder that
focuses on speed above all else (or look at existing libraries like
libjpeg_turbo for some hints on what you could do), but the main stb_image
code will always favor simplicity over maximum speed.
I should say that I value simplicity over speed. But maybe it is possible just to rewrite something to get more speed, or to increase the size of the code by only 10-20% while the speed gains are much higher. I think libjpeg_turbo is fast because of CPU-specific code (assembly or SIMD instructions), which is not what I want. I implemented a Deflate decoder once, which is much faster than zlib. My decoder requires 64 kB of RAM, while zlib works in 32 kB of RAM. And my code is much shorter. Could you spend an hour or two implementing your speed-up ideas? Maybe you can make it 2x faster with 20 extra lines of code. I can donate 100-200 USD to this project for that.
It's not a question of money. There are libraries that target maximum
performance, but stb_image very explicitly does not - see "Philosophy"
section near the top of the file.
Any big changes we will make in the foreseeable future are going to be
concerned with priorities #1 and #2, namely ease of use and ease of
maintenance respectively. In particular, there are a few API mistakes we
want to fix (addressing #1), and a bunch of functionality we want to remove
because it's rarely used but causes a disproportionate amount of
maintenance work (addressing #2). Perf work is a distant third on the
priority list right now, and you might think that increasing the amount of
code by "only 10-20%" is not a big deal in that regard but I respectfully
disagree. We would also be unlikely to take a large patch that added said
functionality because the bulk of the work associated with any change is
ongoing maintenance and as soon as something is in a released version of
stb_image, we're stuck with it.
There are a lot of stb libraries, and Sean and I are maintaining them (mostly
Sean). We're both busy and have other hobbies, and for better or for worse,
all stb library maintenance work has to fit into something like 5-6
weekends per year because that's how much time we're willing to spend on
it. With a limited time budget, we have to pick our battles, and we decided
a long time ago that we needed to stop the scope creep that had affected
stb_image in particular up to that point, and instead focus on fixing the
bugs we had (and have) because stb_image usage has grown far beyond its
original intended domain (games and throw-away tools) which makes a big
difference in the impact of things like security bugs.
If you want to work on a faster JPEG decoder based on the stb_image one,
feel free. The license explicitly permits you to use the code whichever way
you want. I'm happy to give you pointers on what areas you could improve if
you wish to. However, right now anyway, adding extra complexity to increase
performance is not something we would accept in a pull request for
stb_image, and even if you had a 2x faster decoder we'd be very unlikely to
take that patch if you got it in any way outside of fixing a stupid bug.
In short, even if you had a 2x faster decoder in a pull request right now
and all I had to do was decide whether to merge a 500-line patch or not,
the answer would almost certainly still be "no".
I was talking about 20 extra lines of code; you are talking about 500 lines, which I never said. I understand your position. I think you are doing a great job, and I wish you good luck. Keep up the good work!
I would like to use your JPEG decoder in a webapp. I compiled it to WebAssembly (13.8 kB .WASM file).
It works well and is quite fast (3x faster than the Javascript decoder from the pdf.js library).
Do you think it could be made even faster? Sadly, WASM does not support SIMD instructions, so we should aim for other ways of optimization. What about reading Huffman codes faster (using different hash tables to decode more bits in one step)? Do you think it is possible?