Linux shared object builds erroneously export 3rd party locally used symbols #114
Comments
I agree that only the Python symbols should be exported from libpython by default. We can probably fix this by adding Since the Python extension modules are statically linked into libpython, they should bind to the OpenSSL, etc. symbols appropriately. If a foreign shared library referencing e.g. OpenSSL symbols is loaded, it won't pick up the OpenSSL symbols in libpython. This will likely fix some issues. However, it can also cause problems since processes may have multiple versions of libraries like OpenSSL running simultaneously. This can introduce hard-to-debug failures - including run-time crashes - especially if the different library versions somehow interact with each other. I hope this scenario will be rare. But it is bound to occur.
Yes, that's correct. Sometimes some stupid build systems actually enforce
I think Python can be compiled safely with
Exactly.
I think this move will fix vastly more issues than it can possibly create. Even the scenario you describe should not concern you: if things work only because some code erroneously binds to your symbols, that is just luck.
By the way, I almost got everything compiled correctly with |
For many of the packages, we still build shared libraries. The mechanism by which Python statically links things is we simply delete the I know you can tell the linker to prefer a static library over a shared one. But it was easier (and I'm lazy) to just use default build system settings in most places and rip out the shared libraries at the last minute. Having the shared libraries also keeps the door open for more traditional dynamic Python distributions. I always wanted to extend support for the "extension module variants" feature to allow providing both static and shared extension module variants as well as extension module variants not statically linking their library dependencies. So having the dynamic libraries could provide future value. But since none of this extension module variant stuff has materialized yet, I think it is fine to just change the build scripts for libX11 (and anything else) to not produce a shared library if that makes this feature easier to implement. |
I also wonder if making symbols hidden might magically fix #95... |
I updated my initial logs to use the output of |
As a proof of concept, I was able to produce a stripped standalone visibility-hidden-patch-proof-of-concept.txt This build produces the following output with For some reason, some T (text section) symbols from OpenSSL are still present. These symbols can be further stripped by appending As said I can confirm stripping unwanted symbols from the standalone
Also:
I just asked on the availability of |
Thank you for your continued work on this! Everything you say seems reasonable to me. OpenSSL assembly symbol visibility is wonky. We should ship the assembly implementations because those can have performance implications. So I guess that means we must use The only other thing I can think of that may be a factor here is whether libpython still exports the full, non-limited set of symbols (see PEP 384 and PEP 652). I'm pretty sure CPython has macros for defining stable vs limited APIs and these use symbol visibility builtins. We want to be sure that libpython is still exporting non-limited symbols since downstream consumers (like PyOxidizer) rely on these non-limited APIs. |
You're welcome. In the meantime, an answer on the darwin-dev mailing list points out that
I'm hacking on this a bit today. Expect to see some commits referencing this issue appear in the issue timeline soon. After disabling a binary in the I've only implemented this on Linux so far. But I'm optimistic we can get it working for Apple targets easily enough. |
This partially addresses #114. Previously, we were exporting symbols from dependencies like OpenSSL and X11 from our dynamic binaries. This can cause problems when other libraries are loaded into the same address space and attempt to bind to the symbols being exported by Python. This commit changes things such that symbols for Python dependencies are now hidden and can no longer be bound to. Python's own symbols are still exported of course, as Python extensions (and the `python` executable) need to call into them. This change is only implemented on Linux for the moment. Apple targets require a bit more effort to make work. As part of this, we update validation to confirm visibility of certain symbols in ELF binaries. The new validation logic correctly rejects distributions built before this change.
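The validation described in this commit message can be pictured with a small sketch. The project's actual validation code is not shown in this thread, so the function name, the example symbol names, and the prefix list below are all illustrative assumptions rather than the real implementation.

```python
def validate_exports(exported, required, forbidden_prefixes):
    """Return a list of problems; an empty list means the check passed.

    exported: set of symbol names found in the binary's dynamic symbol table.
    required: set of symbol names that must stay exported (Python API).
    forbidden_prefixes: name prefixes of dependency symbols that must be hidden.
    All three inputs are hypothetical and chosen by the caller.
    """
    problems = []
    # Python's own symbols must remain visible to extensions.
    for name in sorted(required - exported):
        problems.append(f"required symbol not exported: {name}")
    # Dependency symbols (OpenSSL, X11, ...) must not leak out.
    prefixes = tuple(forbidden_prefixes)
    for name in sorted(exported):
        if name.startswith(prefixes):
            problems.append(f"dependency symbol exported: {name}")
    return problems


# Example: a distribution built before the change would fail like this.
old_build = {"Py_Initialize", "SSL_new"}
print(validate_exports(old_build, {"Py_Initialize"}, ["SSL_"]))
```

A check of this shape rejects older distributions for the same reason the commit message gives: they still export dependency symbols.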
Getting this to work on macOS is going to entail a bit more effort. I think it is doable by using It's really too bad open source Clang/lld doesn't seem to have an easy-to-use option for influencing visibility on macOS. (Although All that being said, I don't believe symbol visibility matters on modern Apple platforms that much. I'm far from an expert here so please call me out if I'm wrong, but I believe modern Mach-O always uses two-level namespaces and two-level namespaces require that every undefined symbol be associated with a lookup hint. Practically speaking, every undefined symbol is annotated with the library that provides it. More info at https://github.com/aidansteele/osx-abi-macho-file-format-reference#twolevel_hints_command. So, if you have two Mach-O libraries loaded into the same process, they can both define overlapping symbols but due to the library annotations the loader will resolve symbols to the appropriate library. I believe the only scenario where the presence of unwanted global symbols in our distributions can cause problems is when our libpython (or its object files) are linked (not loaded) against other object files / libraries. In this scenario, our symbols may get used. There are pros and cons to this. Anyway, I think I talked myself into not doing anything with the Apple distributions for the moment. I remain receptive to changing my mind on this if someone can convince me the visible symbols on macOS are actually a problem. |
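To make the flat versus two-level namespace argument concrete, here is a toy model in Python. It is only a simplified illustration of the lookup rules described in the comment, not a description of how any real loader is implemented; the library names and symbol sets are made up.

```python
def resolve_flat(load_order, libraries, symbol):
    """Flat namespace: the first loaded library that defines the symbol wins."""
    for lib in load_order:
        if symbol in libraries[lib]:
            return lib
    return None


def resolve_two_level(libraries, symbol, recorded_library):
    """Two-level namespace: only the library recorded at link time is searched."""
    if symbol in libraries.get(recorded_library, ()):
        return recorded_library
    return None


# Hypothetical process image: libpython leaks SSL_new, libssl defines it too.
libraries = {
    "libpython": {"Py_Initialize", "SSL_new"},
    "libssl": {"SSL_new"},
}
load_order = ["libpython", "libssl"]

# Flat lookup binds to whichever definition was loaded first.
print(resolve_flat(load_order, libraries, "SSL_new"))
# Two-level lookup honors the library the caller was linked against.
print(resolve_two_level(libraries, "SSL_new", "libssl"))
```

In this model the leaked `SSL_new` in libpython only matters for the flat lookup, which matches the reasoning that exported dependency symbols are less of a problem at load time under two-level namespaces.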
See inline comment for rationale. Related to #114.
I don't know exactly which linker you're using for Mac; I would be curious to know. I think lld, which should be the official LLVM linker, supports
We use the
So I guess we could adopt |
This commit follows up on recent work we did for Linux/ELF around not exporting dependency symbols and applies it to Apple/Mach-O. We change compiler flags so symbols are hidden by default. We also teach the linker to make symbols in library dependencies hidden by default to prevent them from being exported / dynamic in the resulting Mach-O binary. Mach-O linker support for controlling symbol visibility is different from ELF. We had to utilize an `ld64` specific argument (`-hidden-l<lib>`) to force symbol visibility on a per library basis. The Rust validation code for Mach-O has been updated to verify we aren't exporting dependency symbols and are properly exporting Python symbols, just like we did for ELF. Distributions built before this fail the new validation. This should finish the implementation of #114. Windows is still not addressed. We can file a separate issue to track Windows.
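As a rough illustration of the per-library approach, the sketch below rewrites `-l<lib>` linker arguments into `-hidden-l<lib>` for a chosen set of dependencies. The helper name and the example library names are hypothetical; the thread does not show how the project's build scripts actually apply the flag.

```python
def hide_dependency_libs(linker_args, dependency_libs):
    """Rewrite -l<lib> to -hidden-l<lib> for the given dependency libraries.

    linker_args: list of arguments destined for ld64.
    dependency_libs: library names (without the -l prefix) whose symbols
    should be treated as hidden. Other arguments pass through unchanged.
    """
    hidden = set(dependency_libs)
    result = []
    for arg in linker_args:
        if arg.startswith("-l") and arg[2:] in hidden:
            result.append("-hidden-l" + arg[2:])
        else:
            result.append(arg)
    return result


# Example with made-up arguments: only the listed dependency is rewritten.
print(hide_dependency_libs(["-lssl", "-lm", "-o", "libpython.dylib"], ["ssl"]))
```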
OK. I think I got Apple/Mach-O symbols hidden now as well. Commit is up on CI. If that passes, I'll likely push to main. I added the same validation logic for Mach-O as I implemented for ELF. So I'm reasonably certain that behavior is consistent between the platforms. Thank you for all your help on this! You definitely saved me a bit of work! |
And pushed commits to |
You're welcome and thank you so much for the great work with this project which was exactly what I was looking for in a solution that uses python embedding! |
I noticed that the Linux .so builds export tons of third-party symbols, such as those from OpenSSL, Tk, and others. This can be verified by downloading an official build, such as [1], and running the following command:
This produces the following output: standalone-libpython3.9-nm.txt. Here is an excerpt:
As you can see, legitimate Python API symbols are exported together with OpenSSL symbols. Compare this with the output of the same command on a `libpython3.8.so` as packaged on an Ubuntu system: ubuntu-libpython3.8-nm.txt. There, mostly official Python API symbols are exported, with no trace of the OpenSSL/Tk ones.

The behavior shown by the python standalone builds is problematic because software that dynamically loads symbols may erroneously bind to the ones exported by the standalone libpython, leading to ABI incompatibilities and crashes, which makes your builds unusable for me. To play fair, the standalone `libpython3.9.so` should export the minimal set of needed symbols, such as the Python API ones, and not export any of the third-party symbols that it uses only internally. It's strange that this doesn't already happen in your builds, since at some point you probably rely on the official Python build infrastructure, which I believe should already take care of it. Nevertheless, this is usually achieved in one of two ways:

- Passing `-fvisibility=hidden` in the compiler options, then selectively exporting the public symbols by decorating them with `__attribute__ ((visibility ("default")))`. The infrastructure to selectively export only public API symbols seems to be already set up in the official Python source; see the expansion of the `PyAPI_FUNC` and `Py_EXPORTED_SYMBOL` macros.
- Passing `-Wl,--exclude-libs,ALL` [2]. This linker option is not available in the Apple linker, so I don't recommend this strategy.

I would appreciate your help in finding where I could experiment with enforcing the `-fvisibility=hidden` compilation flag when the final `libpython3.9.so` is built, which I believe could be just the fix for the problem I am observing. Once this is verified to produce a working binary that exports only the required symbols, the same change could be applied to your tree.

[1] https://github.com/indygreg/python-build-standalone/releases/download/20211017/cpython-3.9.7-x86_64-unknown-linux-gnu-lto-20211017T1616.tar.zst
[2] https://stackoverflow.com/a/14863432/213871
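For anyone who wants to repeat this kind of check, here is a minimal Python sketch that lists exported symbols which do not look like Python API symbols. It assumes GNU `nm` is on the PATH and uses `nm -D --defined-only`, which is not necessarily the exact command used in this report; the `Py`/`_Py` prefix filter is a rough heuristic of mine, not an official definition of the Python API.

```python
import subprocess


def parse_nm_output(text):
    """Parse nm lines of the form '<address> <type> <name>' into (type, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _address, sym_type, name = parts
            symbols.append((sym_type, name))
    return symbols


def non_python_exports(symbols, python_prefixes=("Py", "_Py")):
    """Return names that do not start with an assumed Python API prefix."""
    return [name for _type, name in symbols if not name.startswith(python_prefixes)]


def exported_symbols(path):
    """Run GNU nm on a shared object and parse its defined dynamic symbols."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_nm_output(out)


# Example on canned nm-style text (addresses are made up).
sample = (
    "0000000000001000 T Py_Initialize\n"
    "0000000000002000 T SSL_new\n"
    "0000000000003000 D _Py_NoneStruct\n"
)
print(non_python_exports(parse_nm_output(sample)))
```

Calling `non_python_exports(exported_symbols(path))` on a real `libpython3.9.so` would give a list of candidate third-party symbols to inspect by hand.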