libpng: Enable intrinsics on x86/SSE2, ppc64/VSX, and all arm/NEON #78325

Merged
merged 1 commit into from
Aug 4, 2023

Conversation

akien-mga
Member

@akien-mga akien-mga commented Jun 16, 2023

We've been enabling NEON optimizations conditionally for Android arm32 only, but recent libpng versions now default to enabling SSE2 for x86 and VSX for ppc64, so we should do the same. The NEON optimizations should also be relevant for arm64 and other non-Android arm platforms, so I'm compiling the files unconditionally.

This will likely not work out of the box and needs heavy testing on all platforms that support x86*, arm*, or ppc64, notably around the weird hack to compile the .S file for arm, which apparently didn't work on Windows in the past.

Here's the relevant section of pngpriv.h (only compiled with .c files, so we don't need those defines in the main env) which handles checking what's enabled at compile time:
https://github.com/godotengine/godot/blob/master/thirdparty/libpng/pngpriv.h#L91-L277

Only PNG_INTEL_SSE needs to be passed manually to enable the checks for x86; for other platforms the checks are performed automatically.
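
The per-architecture selection described above can be sketched roughly as follows. This is not the actual drivers/png/SCsub from this PR: the function name `libpng_intrinsics_for`, the arch strings, and the exact file list are assumptions for illustration (the file names follow the layout of upstream libpng's arm/, intel/ and powerpc/ directories).

```python
# Hypothetical sketch, not the SCsub from this PR: map a target architecture
# to the extra libpng sources and defines that enable the SIMD filter code.
# The arch strings and file names below are assumptions for illustration.


def libpng_intrinsics_for(arch):
    """Return (extra_sources, extra_defines) for the given arch string."""
    sources = []
    defines = []
    if arch.startswith("arm"):
        # NEON: the checks run automatically, so no define is passed.
        sources += [
            "arm/arm_init.c",
            "arm/filter_neon_intrinsics.c",
            "arm/filter_neon.S",
            "arm/palette_neon_intrinsics.c",
        ]
    elif arch.startswith("x86"):
        # SSE2: the checks for x86 only run when PNG_INTEL_SSE is defined.
        defines.append("PNG_INTEL_SSE")
        sources += ["intel/intel_init.c", "intel/filter_sse2_intrinsics.c"]
    elif arch == "ppc64":
        # VSX: checked automatically, like NEON.
        sources += ["powerpc/powerpc_init.c", "powerpc/filter_vsx_intrinsics.c"]
    return sources, defines


if __name__ == "__main__":
    for arch in ("x86_64", "arm64", "ppc64", "rv64"):
        print(arch, libpng_intrinsics_for(arch))
```

Architectures outside these three families get no extra sources or defines, so they keep using the plain C filter code.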

This should fix PPC64 build as reported in #56448 (CC @bkeys, could you test?).

Finally, if anyone is interested in checking what this actually optimizes, and doing some before/after tests to compare performance, that'd be great!

To do checks, you can use this test build made with the official buildsystem: downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325

Should be compared with 4.1 beta 2 or the upcoming 4.1 beta 3 (it's from a commit in the middle of the two).

Fixes #56448.

@akien-mga akien-mga changed the title from "libpng: Enables intrinsics on x86 (SSE2), ppc64 (VSX), and all arm (NEON)" to "libpng: Enable intrinsics on x86 (SSE2), ppc64 (VSX), and all arm (NEON)" Jun 16, 2023
@akien-mga akien-mga changed the title from "libpng: Enable intrinsics on x86 (SSE2), ppc64 (VSX), and all arm (NEON)" to "libpng: Enable intrinsics on x86/SSE2, ppc64/VSX, and all arm/NEON" Jun 16, 2023
@akien-mga
Member Author

iOS fails building because of this pretty fishy part of the setup, which tries to use gcc:

    compiler_path = "$IOS_TOOLCHAIN_PATH/usr/bin/${ios_triple}"
    s_compiler_path = "$IOS_TOOLCHAIN_PATH/Developer/usr/bin/"

    ccache_path = os.environ.get("CCACHE")
    if ccache_path is None:
        env["CC"] = compiler_path + "clang"
        env["CXX"] = compiler_path + "clang++"
        env["S_compiler"] = s_compiler_path + "gcc"
    else:
        # there aren't any ccache wrappers available for iOS,
        # to enable caching we need to prepend the path to the ccache binary
        env["CC"] = ccache_path + " " + compiler_path + "clang"
        env["CXX"] = ccache_path + " " + compiler_path + "clang++"
        env["S_compiler"] = ccache_path + " " + s_compiler_path + "gcc"

@bruvzg
Member

bruvzg commented Jun 16, 2023

iOS fails building because of this pretty fishy part of the setup, which tries to use gcc:

Probably should be:

    compiler_path = "$IOS_TOOLCHAIN_PATH/usr/bin/${ios_triple}"

    ccache_path = os.environ.get("CCACHE")
    if ccache_path is None:
        env["CC"] = compiler_path + "clang"
        env["CXX"] = compiler_path + "clang++"
        env["S_compiler"] = compiler_path + "clang"
    else:
        # there aren't any ccache wrappers available for iOS,
        # to enable caching we need to prepend the path to the ccache binary
        env["CC"] = ccache_path + " " + compiler_path + "clang"
        env["CXX"] = ccache_path + " " + compiler_path + "clang++"
        env["S_compiler"] = ccache_path + " " + compiler_path + "clang"
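
The two branches above differ only in the ccache prefix, so they could also be collapsed by building that prefix once. A minimal sketch, assuming a plain dict stands in for the SCons env and using a hypothetical helper name, `configure_ios_compilers`:

```python
import os


def configure_ios_compilers(env, compiler_path, ccache_path=None):
    """Set CC, CXX and S_compiler, prefixing each with ccache when given.

    `env` is any dict-like object; in SCons it would be the build Environment.
    Unlike the `is None` check above, an empty CCACHE value is treated as unset.
    """
    # There are no ccache wrappers for iOS, so the ccache binary is prepended.
    prefix = ccache_path + " " if ccache_path else ""
    env["CC"] = prefix + compiler_path + "clang"
    env["CXX"] = prefix + compiler_path + "clang++"
    # Assemble .S files with clang as well, instead of a separate gcc.
    env["S_compiler"] = prefix + compiler_path + "clang"
    return env


if __name__ == "__main__":
    path = "$IOS_TOOLCHAIN_PATH/usr/bin/${ios_triple}"
    print(configure_ios_compilers({}, path, os.environ.get("CCACHE")))
```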

@akien-mga akien-mga force-pushed the libpng-moar-intrinsics branch from 7a88120 to cf08958 Compare June 16, 2023 11:47
@akien-mga akien-mga marked this pull request as ready for review June 19, 2023 14:39
@akien-mga akien-mga requested review from a team as code owners June 19, 2023 14:39
@akien-mga akien-mga modified the milestones: 4.x, 4.2 Jun 19, 2023
@akien-mga
Member Author

To help validate this PR, I've made a test build with the official buildsystem: https://downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325/

Should be compared with 4.1 beta 2 or the upcoming 4.1 beta 3 (it's from a commit in the middle of the two).

@bkeys
Contributor

bkeys commented Jun 26, 2023

I can't test this out because the source code download isn't available, only binaries, and none of the binaries are for ppc64le.

@akien-mga
Member Author

There's a source tarball on the folder I linked, and the source code is also just this PR which you can check out locally.

@bkeys
Contributor

bkeys commented Jun 27, 2023

I checked it out and tried building it, and I get the same error as always.
I ran scons platform=linuxbsd use_llvm=yes target=template_release production=yes
and the build failed at the link stage:

[Initial build] Linking Program bin/godot.linuxbsd.template_release.ppc64.llvm ...
/usr/bin/ld: /tmp/lto-llvm-f0f753.o: in function `png_read_row':
ld-temp.o:(.text.png_read_row+0x19c): undefined reference to `png_init_filter_functions_vsx'
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
scons: *** [bin/godot.linuxbsd.template_release.ppc64.llvm] Error 1
scons: building terminated because of errors.

@akien-mga
Member Author

Can you confirm that both files in thirdparty/libpng/powerpc got compiled? (i.e. they have a matching .ppc64.o file)

Does it work without production=yes (i.e. without enabling LTO)? To test properly you'd need to clean past build artifacts to make sure it's not wrongly reusing LTO object files.
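
One way to run the first check is to list which sources in that directory have no matching object file. A rough sketch, assuming object files sit next to their sources and are named after the source's base name plus a build-specific suffix ending in .o; the helper name `sources_without_objects` is made up for illustration:

```python
from pathlib import Path


def sources_without_objects(directory):
    """Return names of .c/.S files in `directory` with no '<stem>.*.o' sibling."""
    directory = Path(directory)
    missing = []
    for src in sorted(directory.iterdir()):
        if src.suffix not in (".c", ".S"):
            continue
        # Object names are assumed to look like '<stem>.<build suffix>.o'.
        if not any(directory.glob(src.stem + ".*.o")):
            missing.append(src.name)
    return missing


if __name__ == "__main__":
    target = Path("thirdparty/libpng/powerpc")
    if target.is_dir():
        print(sources_without_objects(target))
    else:
        print(f"{target} does not exist; is this the right branch?")
```

An empty list means every source in the directory was compiled; a missing directory points at a checkout problem, not a build problem.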

@bkeys
Contributor

bkeys commented Jun 27, 2023

I'm not sure about the thirdparty files, here is the output of me ls'ing my thirdparty/libpng directory:

arm                                              pngrio.c
LICENSE                                          pngrio.linuxbsd.template_release.ppc64.llvm.o
png.c                                            pngrtran.c
pngconf.h                                        pngrtran.linuxbsd.template_release.ppc64.llvm.o
pngdebug.h                                       pngrutil.c
pngerror.c                                       pngrutil.linuxbsd.template_release.ppc64.llvm.o
pngerror.linuxbsd.template_release.ppc64.llvm.o  pngset.c
pngget.c                                         pngset.linuxbsd.template_release.ppc64.llvm.o
pngget.linuxbsd.template_release.ppc64.llvm.o    pngstruct.h
png.h                                            pngtrans.c
pnginfo.h                                        pngtrans.linuxbsd.template_release.ppc64.llvm.o
pnglibconf.h                                     pngwio.c
png.linuxbsd.template_release.ppc64.llvm.o       pngwio.linuxbsd.template_release.ppc64.llvm.o
pngmem.c                                         pngwrite.c
pngmem.linuxbsd.template_release.ppc64.llvm.o    pngwrite.linuxbsd.template_release.ppc64.llvm.o
pngpread.c                                       pngwtran.c
pngpread.linuxbsd.template_release.ppc64.llvm.o  pngwtran.linuxbsd.template_release.ppc64.llvm.o
pngpriv.h                                        pngwutil.c
pngread.c                                        pngwutil.linuxbsd.template_release.ppc64.llvm.o
pngread.linuxbsd.template_release.ppc64.llvm.o

I still get a failure to link with the following error:

[Initial build] Linking Program bin/godot.linuxbsd.template_release.ppc64.llvm ...
/usr/bin/ld: drivers/libdrivers.linuxbsd.template_release.ppc64.llvm.a(pngrutil.linuxbsd.template_release.ppc64.llvm.o): in function `png_read_filter_row':
pngrutil.c:(.text+0x6150): undefined reference to `png_init_filter_functions_vsx'
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
scons: *** [bin/godot.linuxbsd.template_release.ppc64.llvm] Error 1
scons: building terminated because of errors.

@akien-mga
Member Author

You're missing the thirdparty/libpng/powerpc directory added by this PR, so you seem to be compiling the wrong branch.

Try with either this tarball: https://downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325/godot-4.1-beta.tar.xz

Or the snapshot from my branch: https://github.com/akien-mga/godot/archive/libpng-moar-intrinsics/godot-libpng-moar-intrinsics.zip

@bkeys
Contributor

bkeys commented Jun 27, 2023

(screenshot: 2023-06-27_16-00)
It built, but now I get this runtime error.

@fire
Member

fire commented Jun 27, 2023

This is a normal error when you run a game template binary without a pck.

@akien-mga
Member Author

Yeah it's as fire says. So it's working! Thanks for testing.

You can build target=editor if you want to test the editor build.

@bkeys
Contributor

bkeys commented Jun 27, 2023

Yes this pull request seems to have solved the issue; it worked out of the box.

@bkeys
Contributor

bkeys commented Jun 27, 2023

I'm not sure if you already know this, but I can only run a project with the Compatibility renderer:

[bkeys@desktop godot-libpng-moar-intrinsics]$ Vulkan API 1.3.246 - Forward Mobile - Using Vulkan Device #0: Unknown - llvmpipe (LLVM 16.0.4, 128 bits)
 
WARNING: Blend file import is enabled in the project settings, but no Blender path is configured in the editor settings. Blend files will not be imported.
     at: _editor_init (modules/gltf/register_types.cpp:73)
ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
   at: propagate_notification (scene/main/node.cpp:2207)

================================================================
handle_crash: Program crashed with signal 11
Engine version: Godot Engine v4.1.beta.custom_build
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] linux-vdso64.so.1(__kernel_sigtramp_rt64+0) [0x7fff9b140464] (??:0)
[2] llvm::CmpInst::Create(llvm::Instruction::OtherOps, llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::BasicBlock*) (??:0)
[3] /lib64/libLLVM-16.so(+0x1f01ecc) [0x7fff7cf01ecc] (??:0)
[4] /lib64/libLLVM-16.so(+0x1f01fcc) [0x7fff7cf01fcc] (??:0)
[5] /lib64/libLLVM-16.so(+0x1e8db74) [0x7fff7ce8db74] (??:0)
[6] /lib64/libLLVM-16.so(+0x1e9406c) [0x7fff7ce9406c] (??:0)
[7] llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (??:0)
[8] /lib64/libLLVM-16.so(+0x4939cfc) [0x7fff7f939cfc] (??:0)
[9] llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (??:0)
[10] /lib64/libLLVM-16.so(+0x36f389c) [0x7fff7e6f389c] (??:0)
[11] llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (??:0)
[12] /lib64/libLLVM-16.so(+0x36f353c) [0x7fff7e6f353c] (??:0)
[13] llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (??:0)
[14] /lib64/libLLVM-16.so(LLVMRunPasses+0x37c) [0x7fff7f9741bc] (??:0)
[15] /usr/lib64/libvulkan_lvp.so(+0x2fa620) [0x7fff830fa620] (??:0)
[16] /usr/lib64/libvulkan_lvp.so(+0x470afc) [0x7fff83270afc] (??:0)
[17] /usr/lib64/libvulkan_lvp.so(+0x475ab0) [0x7fff83275ab0] (??:0)
[18] /usr/lib64/libvulkan_lvp.so(+0x3f8f98) [0x7fff831f8f98] (??:0)
[19] /usr/lib64/libvulkan_lvp.so(+0x3f121c) [0x7fff831f121c] (??:0)
[20] /usr/lib64/libvulkan_lvp.so(+0x3f16d0) [0x7fff831f16d0] (??:0)
[21] /usr/lib64/libvulkan_lvp.so(+0x3f1de8) [0x7fff831f1de8] (??:0)
[22] /usr/lib64/libvulkan_lvp.so(+0x374c80) [0x7fff83174c80] (??:0)
[23] /usr/lib64/libvulkan_lvp.so(+0x2bb330) [0x7fff830bb330] (??:0)
[24] /usr/lib64/libvulkan_lvp.so(+0x2bf4ac) [0x7fff830bf4ac] (??:0)
[25] /usr/lib64/libvulkan_lvp.so(+0x2acc5c) [0x7fff830acc5c] (??:0)
[26] /usr/lib64/libvulkan_lvp.so(+0x10a9c8) [0x7fff82f0a9c8] (??:0)
[27] /usr/lib64/libvulkan_lvp.so(+0x10addc) [0x7fff82f0addc] (??:0)
[28] /usr/lib64/libvulkan_lvp.so(+0xf6de0) [0x7fff82ef6de0] (??:0)
[29] /lib64/libc.so.6(+0xb7c64) [0x7fff9acb7c64] (??:0)
-- END OF BACKTRACE --
================================================================
Godot Engine v4.1.beta.custom_build - https://godotengine.org
OpenGL API 4.5 (Core Profile) Mesa 23.1.2 - Compatibility - Using Device: Mesa - AMD TURKS (DRM 2.50.0 / 6.2.9-300.fc38.ppc64le, LLVM 16.0.5)
 
ERROR: Transient parent has another exclusive child.
   at: set_visible (scene/main/window.cpp:806)
Editing project: /home/bkeys/Devel/Sandbox
Godot Engine v4.1.beta.custom_build - https://godotengine.org
Vulkan API 1.3.246 - Forward+ - Using Vulkan Device #0: Unknown - llvmpipe (LLVM 16.0.4, 128 bits)
[bkeys@desktop godot-libpng-moar-intrinsics]$  
WARNING: Occlusion culling is disabled at build-time.
     at: _print_warning (./servers/rendering/renderer_scene_occlusion_cull.h:165)
WARNING: Blend file import is enabled in the project settings, but no Blender path is configured in the editor settings. Blend files will not be imported.
     at: _editor_init (modules/gltf/register_types.cpp:73)
ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
   at: propagate_notification (scene/main/node.cpp:2207)

================================================================
handle_crash: Program crashed with signal 11
Engine version: Godot Engine v4.1.beta.custom_build
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] linux-vdso64.so.1(__kernel_sigtramp_rt64+0) [0x7fffa2000464] (??:0)
[2] llvm::CmpInst::Create(llvm::Instruction::OtherOps, llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::BasicBlock*) (??:0)
[3] /lib64/libLLVM-16.so(+0x1f01ecc) [0x7fff83d01ecc] (??:0)
[4] /lib64/libLLVM-16.so(+0x1f01fcc) [0x7fff83d01fcc] (??:0)
[5] /lib64/libLLVM-16.so(+0x1e8db74) [0x7fff83c8db74] (??:0)
[6] /lib64/libLLVM-16.so(+0x1e9406c) [0x7fff83c9406c] (??:0)
[7] llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (??:0)
[8] /lib64/libLLVM-16.so(+0x4939cfc) [0x7fff86739cfc] (??:0)
[9] llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (??:0)
[10] /lib64/libLLVM-16.so(+0x36f389c) [0x7fff854f389c] (??:0)
[11] llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (??:0)
[12] /lib64/libLLVM-16.so(+0x36f353c) [0x7fff854f353c] (??:0)
[13] llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (??:0)
[14] /lib64/libLLVM-16.so(LLVMRunPasses+0x37c) [0x7fff867741bc] (??:0)
[15] /usr/lib64/libvulkan_lvp.so(+0x2fa620) [0x7fff89efa620] (??:0)
[16] /usr/lib64/libvulkan_lvp.so(+0x3a8aa4) [0x7fff89fa8aa4] (??:0)
[17] /usr/lib64/libvulkan_lvp.so(+0x3982e8) [0x7fff89f982e8] (??:0)
[18] /usr/lib64/libvulkan_lvp.so(+0x374e28) [0x7fff89f74e28] (??:0)
[19] /usr/lib64/libvulkan_lvp.so(+0x2bb234) [0x7fff89ebb234] (??:0)
[20] /usr/lib64/libvulkan_lvp.so(+0x2bf4ac) [0x7fff89ebf4ac] (??:0)
[21] /usr/lib64/libvulkan_lvp.so(+0x2acc5c) [0x7fff89eacc5c] (??:0)
[22] /usr/lib64/libvulkan_lvp.so(+0x10a9c8) [0x7fff89d0a9c8] (??:0)
[23] /usr/lib64/libvulkan_lvp.so(+0x10addc) [0x7fff89d0addc] (??:0)
[24] /usr/lib64/libvulkan_lvp.so(+0xf6de0) [0x7fff89cf6de0] (??:0)
[25] /lib64/libc.so.6(+0xb7c64) [0x7fffa1ab7c64] (??:0)
-- END OF BACKTRACE --
================================================================

@bkeys
Contributor

bkeys commented Jun 27, 2023

Is there a way you could also backport this change for Godot 3.5.2?

@aaronfranke
Member

aaronfranke commented Jun 27, 2023

@bkeys Patch releases are completely frozen; the best we could possibly do is put it in the 3.5 branch for 3.5.3. But that would only happen if it was first backported to 3.x, which corresponds to the upcoming 3.6 release.

As for your Vulkan errors, it looks like the entire stack trace is inside of the system Vulkan library, not Godot.

@bkeys
Contributor

bkeys commented Jun 27, 2023

@aaronfranke, well, as long as it made its way into 3 that would be great.

@akien-mga akien-mga added the cherrypick:3.x and cherrypick:4.1 labels Jun 28, 2023
@akien-mga
Member Author

akien-mga commented Jun 28, 2023

Yeah I'll probably cherry-pick for 3.6 after confirming that this actually makes things faster, or at least does not regress.

The Vulkan crash is unrelated, but you're using Lavapipe (a software renderer), which isn't officially supported by Godot and has bugs. It might have more bugs on ppc64, as I doubt it's a setup Mesa developers regularly test Vulkan apps on. This should be reported to Mesa upstream.

@akien-mga akien-mga requested a review from bruvzg August 4, 2023 12:55
@akien-mga akien-mga force-pushed the libpng-moar-intrinsics branch from cf08958 to 2c9b7fc Compare August 4, 2023 12:57
@akien-mga
Member Author

To help validate this PR, I've made a test build with the official buildsystem: downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325

Should be compared with 4.1 beta 2 or the upcoming 4.1 beta 3 (it's from a commit in the middle of the two).

I did a quick test with that build and 4.1-beta3 on Linux x86_64.

I'm not clear on how the newly optimized functions are used, but it sounds like it's part of the png_read_* methods which I'm assuming our code must be using somehow when loading PNGs. So I did a quick test with a project with ~80 PNG files, comparing the time it takes to open Godot and import PNGs, then quit.

I didn't see a difference in that crude test, so at least it's not worse :)
I'm not sure I'm actually testing the PNG filter stuff this way.

$ hyperfine -i -w3 -m10 -p 'rm -rf .godot/' 'Godot_v4.1-beta3_linux.x86_64 -e --quit-after 1' 'Godot_v4.1-beta+pr78325_linux.x86_64 -e --quit-after 1'
Benchmark 1: Godot_v4.1-beta3_linux.x86_64 -e --quit-after 1
  Time (mean ± σ):     14.839 s ±  0.510 s    [User: 68.086 s, System: 3.322 s]
  Range (min … max):   14.043 s … 15.380 s    10 runs
 
Benchmark 2: Godot_v4.1-beta+pr78325_linux.x86_64 -e --quit-after 1
  Time (mean ± σ):     14.828 s ±  0.499 s    [User: 66.846 s, System: 3.220 s]
  Range (min … max):   14.039 s … 15.363 s    10 runs
 
Summary
  Godot_v4.1-beta+pr78325_linux.x86_64 -e --quit-after 1 ran
    1.00 ± 0.05 times faster than Godot_v4.1-beta3_linux.x86_64 -e --quit-after 1

@YuriSizov
Contributor

YuriSizov commented Aug 4, 2023

I did some rudimentary testing on Windows, and I do notice some speedup for one of the two images that I used.

reading-pngs.zip

Run          S min   S max  S avg  B min    B max  B avg
beta3 - 1     6273  271146  13764  24133   450469  31811
beta3 - 2     4120  172782  13249  22389   272221  29661
beta3 - 3     3790  148636  12567  22258  1147399  30341
This PR - 1   6319   98875  13016  22325   176818  27736
This PR - 2   5799   49381  11377  19772   217773  27515
This PR - 3   4731  368324  13412  21649   259804  28197
beta3 - 4     5160  200674  13741  23344   441516  30173
beta3 - 5     4274  202503  13027  19626   231817  29739
beta3 - 6     7549  172504  12781  23224   267935  28979
This PR - 5   5698  348105  14465  20706   233835  28683
This PR - 6   6631  125481  13148  22765   194569  27649
This PR - 7   6661  229066  12417  21523    65002  26079

As you can see, there is a lot of deviation between and within the runs. However, the average for the bigger image is consistently lower with this PR. TIWAGOS.

Edit: Okay, deviation is largely due to the reported error. Moving the images to user:// reduces the times and the deviation dramatically. There is still a noticeable improvement in times for the bigger image:

Run      S min  S max  S avg  B min  B max  B avg
This PR    588   2018   1109  14201  18903  14961
This PR    566   2275   1537  14003  21807  14556
This PR    547   2250   1314  13991  16842  14371
beta3      593   3140   1549  15229  17181  15636
beta3      625   2249   1476  15354  34255  15770
beta3      593   2169   1508  15315  19392  15780
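
To put a number on that, the per-run averages in the user:// table can be combined. A small sketch, assuming "B avg" is the average time for the bigger image and "S avg" the average for the smaller one (the units are not stated):

```python
from statistics import mean

# "S avg" and "B avg" columns copied from the user:// table above.
# Reading S/B as smaller/bigger image is an assumption.
runs = {
    "This PR": {"S avg": [1109, 1537, 1314], "B avg": [14961, 14556, 14371]},
    "beta3": {"S avg": [1549, 1476, 1508], "B avg": [15636, 15770, 15780]},
}


def speedup(column):
    """Ratio of beta3's mean to this PR's mean (above 1 means the PR is faster)."""
    return mean(runs["beta3"][column]) / mean(runs["This PR"][column])


if __name__ == "__main__":
    for column in ("S avg", "B avg"):
        print(f"{column}: {speedup(column):.3f}x")
```

For the B avg column this works out to a ratio of roughly 1.075, i.e. about 7.5% in favor of this PR. With only three runs per build, that should be read as indicative and not as a precise measurement.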

@akien-mga
Member Author

Benchmark on Yuri's project after moving the images to user://, on Linux (added a get_tree().quit() so I can run the test and benchmark the full duration):

$ hyperfine -i -w1 -m3 'Godot_v4.1-beta3_linux.x86_64' 'Godot_v4.1-beta+pr78325_linux.x86_64'
Benchmark 1: Godot_v4.1-beta3_linux.x86_64
  Time (mean ± σ):     51.052 s ±  0.584 s    [User: 37.870 s, System: 1.995 s]
  Range (min … max):   50.378 s … 51.391 s    3 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: Godot_v4.1-beta+pr78325_linux.x86_64
  Time (mean ± σ):     42.846 s ±  0.642 s    [User: 29.330 s, System: 1.951 s]
  Range (min … max):   42.106 s … 43.251 s    3 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  Godot_v4.1-beta+pr78325_linux.x86_64 ran
    1.19 ± 0.02 times faster than Godot_v4.1-beta3_linux.x86_64

So yeah this PR is 19% faster on this test.
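
The 19% figure is the ratio of the mean wall-clock times reported by hyperfine. A quick check of the arithmetic, which also expresses the same result as a reduction in run time:

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
beta3_mean = 51.052
pr_mean = 42.846

speedup = beta3_mean / pr_mean         # how many times faster the PR build ran
time_saved = 1 - pr_mean / beta3_mean  # fraction of the run time removed

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")
    print(f"time saved: {time_saved:.1%}")
```

The two numbers describe the same measurement: a speedup of about 1.19 corresponds to the run taking about 16% less time.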

@akien-mga akien-mga merged commit cc6a609 into godotengine:master Aug 4, 2023
@akien-mga akien-mga deleted the libpng-moar-intrinsics branch August 4, 2023 15:04
@akien-mga
Member Author

Had a quick look at a potential 3.x cherrypick; the lack of a unified env["arch"] in the 3.x branch makes this pretty difficult to do reliably, so I'll skip it for now.

@akien-mga akien-mga removed the cherrypick:3.x label Aug 29, 2023
@YuriSizov YuriSizov removed the cherrypick:4.1 and needs testing labels Aug 31, 2023
@YuriSizov
Contributor

Cherry-picked for 4.1.2.

Development

Successfully merging this pull request may close these issues.

Cannot build for ppc64le
6 participants