Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl_global_init footgun #333

Closed
sagebind opened this issue May 23, 2020 · 9 comments
Closed

curl_global_init footgun #333

sagebind opened this issue May 23, 2020 · 9 comments

Comments

@sagebind
Copy link
Collaborator

I've decided to open this issue here for discussion.

An Isahc user brought to my attention this week a double-free error occurring in a relatively simple program (issue sagebind/isahc#189), which I managed to simplify even further to a program like this:

fn main() {
    let threads = 8;
    let mut handles = Vec::new();

    for _ in 0..threads {
        handles.push(std::thread::spawn(move || {
            let mut easy = curl::easy::Easy::new();
            easy.url("http://localhost:8080").unwrap();
            easy.perform().unwrap();
        }));
    }

    for handle in handles  {
        handle.join().unwrap();
    }
}

Depending on which versions of curl is used and which SSL engine, the above program can randomly crash with a SIGABRT on a double-free error; in particular, this is with libcurl 7.54.0 and LibreSSL 2.6.5, both system defaults on macOS 10.14 (and likely other versions as well), so the chance of using these versions is pretty high on that platform.

Obtaining a backtrace when such a crash occurs reveals that the bad call of free() is coming from LibreSSL from inside a call to curl_easy_cleanup(), but it is not necessarily a bug. In our example program, curl_global_init() is getting called by curl::init(), which is lazily called once by Easy::new(). This is violating the rules of this function according to the libcurl documentation (snipped the most relevant bits):

The basic rule for constructing a program that uses libcurl is this: Call curl_global_init, with a CURL_GLOBAL_ALL argument, immediately after the program starts, while it is still only one thread and before it uses libcurl at all. Call curl_global_cleanup immediately before the program exits, when the program is again only one thread and after its last use of libcurl.

[...]

It isn't actually required that the functions be called at the beginning and end of the program -- that's just usually the easiest way to do it. It is required that the functions be called when no other thread in the program is running.

These global constant functions are not thread safe, so you must not call them when any other thread in the program is running. It isn't good enough that no other thread is using libcurl at the time, because these functions internally call similar functions of other libraries, and those functions are similarly thread-unsafe. You can't generally know what these libraries are, or whether other threads are using them.

The thread-unsafety of these functions is more than just "you must protect it with a mutex", but an even stronger requirement that no other threads must be running, which for all intents and purposes means the main thread. So because the first time we use anything from the curl crate is inside a thread other than main, all bets are off and LibreSSL is "free" (bad pun) to have this bug, because we violated the rules of calling curl_global_init(), and in turn violated the rules of SSL_library_init() or whatever init routines LibreSSL has.

Now a simple fix to this program is to add curl::init() to the first line of our main function, but its certainly not obvious that this is required and does not encourage users towards the pit of success. In addition, the above program works just fine with the in-tree libcurl and a new OpenSSL, so users might not ever think of this problem if 99% of the time everything works just fine without explicitly calling init() from the main thread.

This is especially a concern for libraries that depend on curl, which also need to help the user call init() properly even if curl is not directly exposed. I don't have any perfect answers to this problem, but here are some ideas:

  • Make the requirements around init() more clear in the documentation, and encourage any downstream libraries that depend on curl to re-export or provide their own init() functions as well. This pretty much amounts to what is suggested in the libcurl docs.
  • Have init() unconditionally panic if called on any thread besides the main thread. This would make it obvious during development if this requirement is being violated, but I have no idea if we can check if the current thread is the main one or not.
  • Leverage platform-specific constructors (like the ctor crate) that call init() for the user so that on such platforms we do the right thing automatically.

Note that I am tempted to experiment with the third option downstream even if it is not considered for the curl crate proper.

Some prior discussion occurred in #157, but the objective was a bit different.

sagebind added a commit that referenced this issue May 24, 2020
As mentioned in #333, this is a potentially helpful addition that ensures that curl is initialized on the main thread by using constructor functions that get called by the OS before the current program's `main` is called.

This has the advantage that, assuming you are on one of the supported platforms, `init()` will be called safely, correctly, and automatically without concerning the user about the gotchas.

This does have some disadvantages:

- Constructor functions are always invoked, which means that simply including curl can slow down the startup of a user's program even if curl is only conditionally used, or used later in program execution.
- On platforms without a constructor implementation, the user still needs to initialize curl on the main thread. All the common platforms are covered though, so maybe this is a niche scenario.

There's no additional runtime cost to this implementation except on platforms without a constructor, where we do an extra atomic swap that isn't necessary. This is probably fixable with additional conditional compilation.
@alexcrichton
Copy link
Owner

Thanks for the detailed report here! I definitely didn't read the docs closely enough and wow is this a basically impossible requirement to place on consumers of the library. What in the world are we supposed to do in the face of thread unsafety like this?!

In any case I think that your idea in #334 is a pretty plausible way to solve this. I think it's fine to have almost all platforms covered by default there and we can still have the lazy-init if truly needed for other platforms.

One thing I'd like to do first though is to reproduce this. I ran this as cargo run from this repository on Linux and macOS but wasn't able to get a segfault or any error. Are you able to capture a full stack trace of all running threads when the crash happens? Or is there perhaps a more constrained environment you can provide to reproduce? (e.g. does the server matter?)

@sagebind
Copy link
Collaborator Author

wow is this a basically impossible requirement to place on consumers of the library.

To be fair, the author of curl has mentioned that this requirement were mostly put there by the TLS implementations and not curl itself. It also seems that this scenario is maybe lessening according to a recent blog post: https://daniel.haxx.se/blog/2020/03/01/imagining-a-thread-safe-curl_global_init/. Sadly we still have to support older curl versions, even if this requirement were lifted. For example, with this combo I'm definitely putting the blame on LibreSSL. 😞

One thing I'd like to do first though is to reproduce this. I ran this as cargo run from this repository on Linux and macOS but wasn't able to get a segfault or any error.

I was only able to reproduce on macOS when using the system libcurl version 7.54.0 and the system LibreSSL 2.6.5. Even then since the "bug" is probably a race condition in LibreSSL that happens when not init'd properly, I could only reproduce with this combination around 1/100 runs. Though @blinsay was able to reproduce on his machine much more frequently.

Are you able to capture a full stack trace of all running threads when the crash happens?

@blinsay has a backtrace of just the offending thread here: https://gist.github.com/blinsay/3d5174aaa03f5546cd492054889d8789. Unfortunately it seems like the system libs lack debug symbols or something as I can't get a meaningful backtrace inside libcurl or LibreSSL. I will attempt to get a backtrace of all threads today.

@alexcrichton
Copy link
Owner

Ah yeah I don't mean to really blame anyone per se, it's just that this isn't the only C library that's had these sorts of restrictions. OpenSSL was indeed quite a pain in the past but at least it's fixed now! In any case that looks like a promising blog post!

The backtrace there I think is just of the faulting thread and would need bt all to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...

@sagebind
Copy link
Collaborator Author

Perusing the source code for LibreSSL 2.6.5, I see a lot of suspicious thread-local hashing magic going on: https://github.com/libressl-portable/openbsd/tree/libressl-v2.6.5. These are the sort of things that OpenSSL eventually fixed (and LibreSSL 3 maybe?). If I had to guess, LibreSSL's thread hashing tables are not init'd correctly unless called from the main thread, causing weird bugs later.

@alexcrichton
Copy link
Owner

Hm so it's not necessarily a concurrency issue, it's a main thread issue? TBH I don't even know how you detect the "main thread" on Unix...

@sagebind
Copy link
Collaborator Author

Yeah basically. I know GUI FFI has this problem from time to time as well. Like the macOS native GUI libraries must be called on the main thread. IIRC no one could figure out how to safely enforce that. On iOS there's a system function that knows if the current thread is the main one I think.

@blinsay
Copy link

blinsay commented May 29, 2020

The backtrace there I think is just of the faulting thread and would need bt all to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...

Here's bt all with the same code as was in the previous gist.

https://gist.github.com/blinsay/3c6f1dd8dcf0f829faccee2fff0ccdec

Let me know if anything else would be useful.

@alexcrichton
Copy link
Owner

Bah unfortunately not as illuminating as I thought it would be. Thanks though!

alexcrichton pushed a commit that referenced this issue Jun 1, 2020
* Experimental use of constructor functions to call init

As mentioned in #333, this is a potentially helpful addition that ensures that curl is initialized on the main thread by using constructor functions that get called by the OS before the current program's `main` is called.

This has the advantage that, assuming you are on one of the supported platforms, `init()` will be called safely, correctly, and automatically without concerning the user about the gotchas.

This does have some disadvantages:

- Constructor functions are always invoked, which means that simply including curl can slow down the startup of a user's program even if curl is only conditionally used, or used later in program execution.
- On platforms without a constructor implementation, the user still needs to initialize curl on the main thread. All the common platforms are covered though, so maybe this is a niche scenario.

There's no additional runtime cost to this implementation except on platforms without a constructor, where we do an extra atomic swap that isn't necessary. This is probably fixable with additional conditional compilation.

* Micro-optimization

Ensure subsequent init calls are compiled down to a single atomic compare-and-swap.

* Consolidate ELF attribute

* Reducing to just a Once is acceptable

Since it is not possible for the `Once` to have contention in a constructor, it is safe to use it. (Under contention, `Once` requires `std::thread` machinery to work, which is not yet available before `main`.)

Also update documentation on initialization.
@sagebind
Copy link
Collaborator Author

sagebind commented Jun 1, 2020

I can't think of anything else we could possibly do to handle this better beyond #334, so we can probably close this out. Doing some brief research it doesn't seem feasible to detect whether init is called on a thread other than main (besides checking the thread name).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants