
Unexplained race condition in v0.16 causing "runtime dropped the dispatch task" #387

Closed
Nikita240 opened this issue Mar 18, 2024 · 8 comments · Fixed by #390

Labels
bug Something isn't working

Comments

@Nikita240

After upgrading from bollard v0.15 to v0.16 I started encountering a race condition in my unit tests. I believe this is likely related to the upgrade to hyper v1.1, but I can't quite pin down what's happening.

Here is the test setup to replicate:

//! [dependencies]
//! bollard = "0.16.0"
//! tokio = { version = "1.24.2", features = ["rt-multi-thread", "macros", "fs"] }
//! once_cell = "1.19.0"
use bollard::{image::ListImagesOptions, Docker};
use once_cell::sync::OnceCell;

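// One Docker client shared by every test, and therefore shared across the
// separate tokio runtime that each #[tokio::test] creates.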
static DOCKER: OnceCell<Docker> = OnceCell::new();
fn get_docker() -> Result<&'static Docker, bollard::errors::Error> {
    DOCKER.get_or_try_init(Docker::connect_with_socket_defaults)
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime() {
    run_test(10).await;
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime_2() {
    run_test(10).await;
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime_3() {
    run_test(100).await;
}

async fn run_test(count: usize) {
    let docker = get_docker().unwrap();
    for _ in 0..count {
        let _ = &docker
            .list_images(Some(ListImagesOptions::<String> {
                all: true,
                ..Default::default()
            }))
            .await
            .unwrap();
    }
}
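
(To reproduce: save this as tests/bollard.rs — the //! header lists the Cargo.toml [dependencies] — and run cargo test. Since the failures are intermittent, several runs may be needed.)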

Here is what the error looks like:

running 3 tests
test test_runtime ... ok
test test_runtime_3 ... FAILED
test test_runtime_2 ... ok

failures:

---- test_runtime_3 stdout ----
thread 'test_runtime_3' panicked at tests/bollard.rs:33:14:
called `Result::unwrap()` on an `Err` value: HyperLegacyError { err: Error { kind: SendRequest, source: Some(hyper::Error(User(DispatchGone), "runtime dropped the dispatch task")) } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_runtime_3

The test failures are random and inconsistent.

rustc 1.76.0 (07dca489a 2024-02-04)

Do you have any ideas how to root-cause this?

@fussybeaver
Owner

Thanks for the report... I see there's a v1.2.0 release of hyper out and a v0.1.3 of hyper-util; let's see if we can reproduce this on those versions.

@fussybeaver
Owner

I actually can't reproduce this problem. Can you give more detail on your system, and maybe any dockerd logs you find? You can turn on debug logging in the daemon with the following configuration in /etc/docker/daemon.json:

{
	"debug": true,
	"raw-logs": true
}
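
(On a systemd-based host, you would then restart the daemon, e.g. sudo systemctl restart docker, and follow its logs, e.g. journalctl -u docker.service, to capture the debug output.)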

@Nikita240
Author

That's very strange. I'm able to replicate this on two different machines running different docker versions.

@Nikita240
Author

I think the issue here is caused by the statically stored Docker instance, static DOCKER: OnceCell<Docker>.

When running tokio tests with multi_thread, the tests run concurrently, but each #[tokio::test] spawns its own unique runtime.

As of bollard@0.16, somehow, the Docker instance "absorbs" the first tokio runtime it sees, and if that runtime is dropped while someone else is making a request, you get the error "runtime dropped the dispatch task".
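
To make the mechanism concrete, here is a minimal sketch using only tokio (no bollard or hyper APIs involved; the mpsc channel stands in for hyper's connection/dispatch machinery). A "dispatch" task is spawned on the first runtime, and a request issued from a second runtime fails once that first runtime is dropped:

use tokio::sync::{mpsc, oneshot};

fn main() {
    // Runtime A plays the role of the first runtime the shared client "absorbs".
    let rt_a = tokio::runtime::Runtime::new().unwrap();
    let (tx, mut rx) = mpsc::channel::<oneshot::Sender<&'static str>>(1);

    // The "dispatch task" lives on runtime A, like hyper's connection task.
    rt_a.spawn(async move {
        while let Some(reply) = rx.recv().await {
            let _ = reply.send("response");
        }
    });

    // Dropping runtime A shuts it down and aborts the dispatch task...
    drop(rt_a);

    // ...so a request made from runtime B finds the dispatcher gone, analogous
    // to hyper's DispatchGone / "runtime dropped the dispatch task".
    let rt_b = tokio::runtime::Runtime::new().unwrap();
    rt_b.block_on(async {
        let (reply_tx, reply_rx) = oneshot::channel();
        if tx.send(reply_tx).await.is_err() || reply_rx.await.is_err() {
            eprintln!("dispatch task is gone");
        }
    });
}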

@fussybeaver
Owner

Ah yes, I see it now if you run them all together...

@fussybeaver added the bug label on Mar 19, 2024
@fussybeaver
Owner

I put this test scenario into bollard's CI system, and it seems to fail on all connectors (http / ssl / named pipe / unix socket), which rules out an issue with any individual connector. I also checked locally against the latest master branch of hyper, and it still fails (albeit less often).

@fussybeaver
Owner

fussybeaver commented Mar 21, 2024

I did find a fix. If you have the time, I'd appreciate it if you could check whether it works for you: #390

Related: hyperium/hyper#2312
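
For anyone hitting this in the meantime, a workaround consistent with the diagnosis above (a sketch only, not necessarily the approach taken in #390; the test name is illustrative, and it assumes constructing a client per test is acceptable) is to create the Docker client inside each test, so it is tied to that test's own runtime rather than to whichever runtime initialized the static:

use bollard::{image::ListImagesOptions, Docker};

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime_local_client() {
    // Constructed inside this test's runtime, so the client's background
    // dispatch task lives and dies with the runtime making the requests.
    let docker = Docker::connect_with_socket_defaults().unwrap();
    for _ in 0..10 {
        let _ = docker
            .list_images(Some(ListImagesOptions::<String> {
                all: true,
                ..Default::default()
            }))
            .await
            .unwrap();
    }
}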

@Nikita240
Author

I just got around to testing your fix. I can confirm it works!

Thank you so much for your support on this ❤️
