Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance detection of MUSL-based Linux systems (xgboost4j) #7915

Closed
snoopcheri opened this issue May 17, 2022 · 7 comments · Fixed by #7921
Closed

Enhance detection of MUSL-based Linux systems (xgboost4j) #7915

snoopcheri opened this issue May 17, 2022 · 7 comments · Fixed by #7921

Comments

@snoopcheri
Copy link
Contributor

Context

With the release 1.6 of xgboost, a feature was added that would automatically detect under Linux, whether it is a regular or a MUSL-based system. Depending on this, the proper Native shared library would be loaded (MUSL-based systems cannot load shared libraries using glibc).

The check whether a Linux system is MUSL-based is done by inspecting memory-mapped files in /proc/self/map_files. This is the same logic as the sqllite-jdbc project is doing here.

Problem

While this typically works well, there are cases, where this directory is not accessible. There are at least two cases:

  • since /proc/self/map_files was added in Linux kernel 3.3, for older kernels this directory simply does not exist
  • some versions of the kernel do not allow access to this directory for non-root users

This problem has also been discussed on the sqlite-jdbc forums. See here for a good summary of the problem.

The current implementation in xgboost assumes, that if this directory cannot be read for some reason, the Linux system is not MUSL-based.

Unfortunately, this assumption is not always correct.

Proposal

We need another way to determine whether a Linux system is MUSL-based or not. It seems, there is no simple auto-detection, that works 100% for all cases and environments. Therefore, I propose that the user can give a hint to xgboost, whether the Linux system is MUSL-based or not.

(1)
In the class NativeLibraryLoader we check for the presence of a custom environment variable like IS_MUSL. If this env variable is present and its value is truthy (true or 1), xgboost assumes the system is MUSL-based, no matter whether memory-mapped files in /proc/self/map_files can be read or not.

(2)
If this custom environment variable is not present, then detection falls back to the current solution.

This could be simplified by defining that you always have to configure the IS_MUSL environment variable, if you're running on a MUSL-based system. But since it seems to work most of the time, it still seems convenient for most users to keep the current detection mechanism.

How does this work for the end user?

  • If you're running xgboost on a regular (glibc-based) Linux system, there's nothing to consider.
  • If you're running xgboost on a MUSL-based Linux system and auto-detection detects this correctly, there's nothing to consider.
  • If you're running xgboost on a MUSL-based Linux system and auto-detection fails to detect this, you'll get an ugly failure when xgboost is trying to load the shared library. The error message should then include a clue, that you may want to use the IS_MUSL environment variable to override this.

Please let me know what you think! I can create a PR with the suggested change.

@trivialfis
Copy link
Member

Thank you for the proposal. @Craigacp would you like to join the discussion when you are available?

@Craigacp
Copy link
Contributor

I think my preference would be to remove the MUSL section from NativeLibraryLoader and supply a single JVM system property override for users which allows them to supply the path to the XGBoost native library on startup, rather than have a two phase approach where the user can override MUSL detection but has a leaky fallback. That's what we do in ONNX Runtime (https://github.com/microsoft/onnxruntime/blob/master/java/src/main/java/ai/onnxruntime/OnnxRuntime.java#L273) to support other use cases as well as this one (e.g. frozen containers which don't have a writable /tmp), and it will allow the library loading logic to be simpler. Users who choose to use a MUSL libc are already compiling things from source so they should be capable of jumping through an additional hoop.

That said, I'll ask around with some Java people to see if there is a way to reliably detect it from within the JVM (e.g. I'm not sure how stable the output of ldd $JAVA_HOME/bin/java would be, but grepping that might work) as if we can remove the instability that way then that's fine.

@Craigacp
Copy link
Contributor

The consensus opinion from the Java people I talked to is that there are only bad ways to detect if you're running on top of musl directly from within the JVM and that those shouldn't be in production code.

@trivialfis
Copy link
Member

Thank you for the comments. Is there a way to fail gracefully?

@snoopcheri
Copy link
Contributor Author

@Craigacp
Thanks for your input! I can see that getting rid of the MUSL auto detection makes sense, as it doesn't work in all cases anyway and simplifies the code.

Concerning your suggestion that the user could specify the path of the shared library to load via an system property:

  • On the positive side, this could be used for other scenarios, we're not yet aware of, as you've mentioned. We would simply use this value, if it is present, no matter what OS or architecture it runs on.
  • On the negative side, this might be suboptimal with respect to security. I cannot come up with a specific scenario, but surely, less things can go wrong, if you only provide a way via a boolean value (like IS_MUSL) as opposed to a full path.

WDYT?

@trivialfis
I'm not sure what you mean with "Is there a way to fail gracefully?" Which scenario do you mean?

@Craigacp
Copy link
Contributor

@snoopcheri WRT to security issues, while I'm not a security person if a bad actor has control over the command line used to run the JVM then there are bigger problems for the system because they could induce any kind of behaviour in the system by altering the classpath.

@snoopcheri
Copy link
Contributor Author

I've created a PR which replaced the auto-detection for MUSL-based Linux systems with system properties.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants