-
-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libpng: Enable intrinsics on x86/SSE2, ppc64/VSX, and all arm/NEON #78325
Conversation
iOS fails building because of this pretty fishy part of the setup, which tries to use gcc: compiler_path = "$IOS_TOOLCHAIN_PATH/usr/bin/${ios_triple}"
s_compiler_path = "$IOS_TOOLCHAIN_PATH/Developer/usr/bin/"
ccache_path = os.environ.get("CCACHE")
if ccache_path is None:
env["CC"] = compiler_path + "clang"
env["CXX"] = compiler_path + "clang++"
env["S_compiler"] = s_compiler_path + "gcc"
else:
# there aren't any ccache wrappers available for iOS,
# to enable caching we need to prepend the path to the ccache binary
env["CC"] = ccache_path + " " + compiler_path + "clang"
env["CXX"] = ccache_path + " " + compiler_path + "clang++"
env["S_compiler"] = ccache_path + " " + s_compiler_path + "gcc" |
Probably should be: compiler_path = "$IOS_TOOLCHAIN_PATH/usr/bin/${ios_triple}"
ccache_path = os.environ.get("CCACHE")
if ccache_path is None:
env["CC"] = compiler_path + "clang"
env["CXX"] = compiler_path + "clang++"
env["S_compiler"] = compiler_path + "clang"
else:
# there aren't any ccache wrappers available for iOS,
# to enable caching we need to prepend the path to the ccache binary
env["CC"] = ccache_path + " " + compiler_path + "clang"
env["CXX"] = ccache_path + " " + compiler_path + "clang++"
env["S_compiler"] = ccache_path + " " + compiler_path + "clang" |
7a88120
to
cf08958
Compare
To help validate this PR, I've made a test build with the official buildsystem: https://downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325/ Should be compared with 4.1 beta 2 or the upcoming 4.1 beta 3 (it's from a commit in the middle of the two). |
I can't test this out because the source code download isn't available, only binaries and none of the binaries are ppc64le binaries. |
There's a source tarball on the folder I linked, and the source code is also just this PR which you can check out locally. |
I checked it out and tried building it and I get the same error I always have
|
Can you confirm that both files in Does it work without |
I'm not sure about the thirdparty files, here is the output of me ls'ing my
I still get a failure to link with the following error:
|
You're missing the Try with either this tarball: https://downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325/godot-4.1-beta.tar.xz Or the snapshot from my branch: https://github.com/akien-mga/godot/archive/libpng-moar-intrinsics/godot-libpng-moar-intrinsics.zip |
This is a normal error when you run a game template binary without a pck. |
Yeah it's as fire says. So it's working! Thanks for testing. You can build |
Yes this pull request seems to have solved the issue; it worked out of the box. |
I'm not sure if you already know this, but I can only run a project with the Compatibility rendering:
|
Is there a way you could also backport this change for Godot 3.5.2? |
@bkeys Patch releases are completely frozen, the best we could possibly do is put it in the As for your Vulkan errors, it looks like the entire stack trace is inside of the system Vulkan library, not Godot. |
@aaronfranke, well as long as it made it's way into 3 that would be great. |
Yeah I'll probably cherry-pick for 3.6 after confirming that this actually makes things faster, or at least does not regress. The Vulkan crash is unrelated, but you're using Lavapipe (software renderer) which isn't official supported by Godot and has bugs. It might have more bugs on ppc64 as I doubt it's a setup Mesa developers test Vulkan apps on regularly. This should be reported to Mesa upstream. |
cf08958
to
2c9b7fc
Compare
I did a quick test with that build and 4.1-beta3 on Linux x86_64. I'm not clear on how the newly optimized functions are used, but it sounds like it's part of the I didn't see a difference in that crude test, so at least it's not worse :)
|
I did some rudimentary testing on Windows, and I do notice some speed up for one of two images that I've used.
As you can see, there is a lot of deviation between and within the runs. However, the average for the bigger image is consistently lower with this PR. TIWAGOS. Edit: Okay, deviation is largely due to the reported error. Moving the images to
|
Benchmark on Yuri's project after moving the images to
So yeah this PR is 19% faster on this test. |
Had a quick look at a potential |
Cherry-picked for 4.1.2. |
We've been enabling NEON optimizations conditionally for Android arm32 only, but recent libpng versions now default to enabling SSE2 for x86 and VSX for ppc64, so we should do the same. The NEON optimizations should also be relevant for arm64 and other non-Android arm platforms, so I'm compiling the files unconditionally.
This will likely not work out of the box and needs heavy testing on all platforms which support either
x86*
,arm*
andppc64
, notably around the weird hack to compile the.S
file for arm, which apparently didn't work on Windows in the past.Here's the relevant section of
pngpriv.h
(only compiled with.c
files, so we don't need those defines in the main env) which handles checking what's enabled at compile time:https://github.com/godotengine/godot/blob/master/thirdparty/libpng/pngpriv.h#L91-L277
Only
PNG_INTEL_SSE
needs to be passed manually to enable the checks for x86, for other platforms the checks are performed automatically.This should fix PPC64 build as reported in #56448 (CC @bkeys, could you test?).
Finally, if anyone is interested in checking what this actually optimizes, and do some before/after tests to compare performance, that'd be great!
To do checks, you can use this test build made with the official buildsystem: downloads.tuxfamily.org/godotengine/testing/4.1-beta%2bpr78325
Should be compared with 4.1 beta 2 or the upcoming 4.1 beta 3 (it's from a commit in the middle of the two).
Fixes #56448.