stb resize 2.08 #1649

jeffrbig2 · 2024-06-13T00:47:05Z

fix for RGB->BGR three channel flips and add SIMD (thanks to Ryan Salsbury)
fix for sub-rect resizes
use pragmas to control unrolling when they are available.

ryanrsrs · 2024-06-13T17:21:14Z

I tested this change on my Raspberry Pi 4B running Raspberry Pi OS in 32-bit mode:
$ uname -a
Linux raspberrypi 6.6.31+rpt-rpi-v7l #1 SMP Raspbian 1:6.6.31-1+rpt1 (2024-05-29) armv7l GNU/Linux

The color bug I noticed in stbir__simple_flip_3ch() is fixed, in both scalar and simd paths. On my platform, stbir__simdf_swiz2 is not defined and it selects the second SIMD code block, using stbir__simdf_swiz().

The change in speed from enabling SIMD is slight (but consistent). I have verified which code paths are executing using printfs.

With gcc, SIMD gave a 15% speedup.
With clang, SIMD gave a 3% slowdown.

GCC build options:
cc -std=gnu11 -Wall -I/usr/include/libdrm -Os -march=native -DSTBIR_USE_FMA -mfpu=neon-vfpv4 -mfp16-format=ieee -Wno-unused-function -c stb_impl.c

Clang build options:
clang -std=gnu11 -Wall -I/usr/include/libdrm -Os -march=native -DSTBIR_USE_FMA -mfpu=neon-vfpv4 -Wno-unused-function -c stb_impl.c

The fastest version, Clang with -DSTBIR_NO_SIMD (lol), performs as follows:
src: 6048 x 8064
dst: 900 x 1200
time: 1.003 seconds

I'm not sure why it's so slow since it's only 150 MB of pixels. Maybe the long scanlines are thrashing the cache in a maximally-bad way?
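As a rough check on the figures above, here is a minimal sketch of the arithmetic, assuming tightly packed 8-bit pixels with 3 channels (the thread mentions three-channel flips but does not state the row stride). The helper names are hypothetical and are not part of stb_image_resize2.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a tightly packed 8-bit image (assumes no row padding). */
static uint64_t packed_image_bytes(uint32_t width, uint32_t height, uint32_t channels)
{
    assert(channels > 0);
    return (uint64_t)width * height * channels;
}

/* Number of source pixels that map onto one output pixel, rounded down.
   This ignores filter support, which widens the real sample window. */
static uint64_t source_pixels_per_output_pixel(uint32_t src_w, uint32_t src_h,
                                               uint32_t dst_w, uint32_t dst_h)
{
    assert(dst_w > 0 && dst_h > 0);
    return ((uint64_t)src_w * src_h) / ((uint64_t)dst_w * dst_h);
}
```

For the 6048 x 8064 source this gives 146,313,216 bytes, which matches the roughly 150 MB quoted, and about 45 source pixels per output pixel at a 6.72x reduction on each axis.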

The speed is fine for my application, and matches the 2.07 non-SIMD speed, so I dunno if there's a problem. But if you expected a bigger difference on this platform, I can poke at it some more.

e: All times mentioned above are for the call to stbir_resize_extended(), which does much more work than just flip_3ch(). But even the core resizer math doesn't speed up with SIMD, really? Maybe I am doing something wrong here.

e2: just rechecked 2.08 times against 2.07, both scalar and SIMD. They're the same. So this does not seem like a regression, just something I noticed now, since I am comparing simd and not-simd back-to-back to see that the color was fixed in both.

jeffrbig2 · 2024-06-13T22:39:25Z

That's a reasonably big downsample (depending on your filter) - 1 second doesn't seem nuts for a 32-bit platform that is reading 150 MB of input with a sample window of 27x20 (each output pixel has to read 27x20 of the input). 32-bit vs 64-bit is a huge hit here, btw. There are a couple things you can do:

1. throw threads at it - this is a linear speed up - 2x cores, half the time.
2. use linear pixel format - STBIR_TYPE_UINT8 instead of STBIR_TYPE_UINT8_SRGB
3. don't use wrap edge mode
4. use a simpler filter, STBIR_FILTER_BOX or STBIR_FILTER_TRIANGLE.
5. to make better cache use, break the resize into vertical stripes (use the stbir_set_pixel_subrect function to do 128 vertical output pixels at a time). This will usually save 25% to 50%.

For option 5, you can also wait for 2.09 which will internally do the cache striping for you.
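The striping idea can be sketched as plain rectangle arithmetic. This is one reading of the suggestion, assuming "vertical stripes" means full-height columns 128 output pixels wide; `stripe_rect` and `make_vertical_stripes` are hypothetical names, not library API. Each rectangle would then be handed to stbir_set_pixel_subrect before the resize call, and that wiring is not shown here.

```c
#include <assert.h>

/* Hypothetical helper type: one output sub-rectangle, in output pixels. */
typedef struct {
    int x, y, w, h;
} stripe_rect;

/* Split an out_w x out_h output into full-height stripes stripe_w pixels wide.
   Fills at most max_rects entries and returns the number of stripes needed,
   so a return value larger than max_rects means the array was too small. */
static int make_vertical_stripes(int out_w, int out_h, int stripe_w,
                                 stripe_rect *rects, int max_rects)
{
    int count = 0;
    assert(out_w > 0 && out_h > 0 && stripe_w > 0);
    for (int x = 0; x < out_w; x += stripe_w) {
        if (count < max_rects) {
            rects[count].x = x;
            rects[count].y = 0;
            /* The last stripe may be narrower than stripe_w. */
            rects[count].w = (out_w - x < stripe_w) ? (out_w - x) : stripe_w;
            rects[count].h = out_h;
        }
        ++count;
    }
    return count;
}
```

For the 900 x 1200 destination in this thread, a 128-pixel stripe width yields eight stripes, the last one only 4 pixels wide. The same list of rectangles could also be divided among threads.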

But yeah, 32-bit arm is just pretty darn pokey in general.

ryanrsrs · 2024-06-13T22:48:12Z

Yep, I'm not complaining about the performance, I just wanted to check that the numbers seemed sensible.

The application I'm testing is decode and display of 45MP iPhone 15 heic files on a Rasp Pi Zero 2 W 512MB. (It works!)

jeffrbig2 · 2024-06-13T22:58:13Z

There's probably some more wins if you want to get fancy. Instead of decoding the HEIC into RGB and then resizing that, decode into YUV (where the U and V planes are smaller), resize those planes, and THEN convert to RGB in the smaller space.
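To put a number on the potential saving, here is a small sketch comparing buffer sizes. It assumes 8-bit samples and 4:2:0 chroma subsampling (U and V at half resolution on each axis); the thread only says the U and V planes are smaller, so the exact ratio depends on the actual file. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for a tightly packed 8-bit RGB image. */
static uint64_t rgb8_bytes(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h * 3;
}

/* Bytes for 8-bit planar YUV with 4:2:0 subsampling (an assumption):
   one full-size Y plane plus two chroma planes at half width and height,
   rounding odd dimensions up. */
static uint64_t yuv420_bytes(uint32_t w, uint32_t h)
{
    assert(w > 0 && h > 0);
    uint64_t chroma_w = ((uint64_t)w + 1) / 2;
    uint64_t chroma_h = ((uint64_t)h + 1) / 2;
    return (uint64_t)w * h + 2 * chroma_w * chroma_h;
}
```

Under those assumptions the 6048 x 8064 source is 73,156,608 bytes as YUV, half of the RGB size, so the resizer reads half as much input before the color conversion happens at the smaller output size.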

nothings · 2024-10-16T21:54:50Z

This was already merged in later updates.

jeffatrad and others added 2 commits June 12, 2024 17:43

stb resize 2.08

75d7fba

Update stb_image_resize2.h

ff9fef2

jeffatrad and others added 2 commits June 19, 2024 10:07

resize 2.09 - fix arm defines for gcc

c671203

Merge branch 'nothings:master' into resize-2.07

07a180a

nothings closed this Oct 16, 2024
