RUST-1712 Latency Sees Unbounded Growth When Doing Any Amount Of Concurrent Writes #913

Closed
LuisOsta opened this issue Jul 19, 2023 · 27 comments
Labels: tracked-in-jira (Ticket filed in Mongo's Jira system)

LuisOsta commented Jul 19, 2023

Versions/Environment

  1. What version of Rust are you using? 1.66.0
  2. What operating system are you using? MacOS
  3. What versions of the driver and its dependencies are you using? (Run
    cargo pkgid mongodb & cargo pkgid bson) https://github.com/rust-lang/crates.io-index#mongodb@2.6.0 and https://github.com/rust-lang/crates.io-index#bson@2.6.1
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version()) 6.0.8
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)? Replica Set

Describe the bug

When writing concurrently to a MongoDB collection, the time from issuing an insert until execution continues grows linearly with the number of concurrent writes. With 10 concurrent writes, the first one to complete finishes in ~60ms but the slowest takes about 480ms. The slowdown grows linearly: with 100 concurrent writes the slowest one takes more than 1 second.

This is despite the tracing logs seemingly saying that the command continues to succeed in the appropriate amount of time:

2023-07-19T19:10:08.449191Z DEBUG mongodb::command: Command succeeded topologyId="64b8350f0d64165cf22c9b29" reply="{\"n\":1,\"electionId\":{\"$oid\":\"7fffffff000000000000001d\"},\"opTime\":{\"ts\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}},\"t\":29},\"ok\":1.0,\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"nLRyMv2XttX7Rkr34LTRkAiMgEg=\",\"subType\":\"00\"}},\"keyId\":7210123958148923394}},\"operationTime\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}}}" commandName="insert" requestId=34 driverConnectionId=4 serverConnectionId=1179 serverHost="ac-qwvoave-shard-00-01.prroxke.mongodb.net" serverPort=27017 durationMS=46

What I would expect is that running 1 write operation and running 2 concurrently would not be meaningfully different. While I'd expect latency to get worse with a significant number of concurrent writes, I'd expect that to happen more gradually and under much heavier load, and for the time each write takes to be about the same as what the command-succeeded tracing log reports.

We've had this issue for a while but only recently took the time to isolate it. I've posted a Rust program that reproduces the issue, with the relevant details, below.

Things I've tested out:

  • Upgrading the cluster to have more compute, specifically migrating from an M10 to an M30 on Atlas. No change at all.
  • This cluster is used for other things, so maybe it's some caching issue caused by other usage. I ran the test against an isolated M30 cluster on Atlas and the issue persists.
  • I've tried changing the driver's minimum and maximum pool sizes. Reducing the maximum slows things down further, but increasing the minimum or the maximum has minimal/no effect.
  • Various BSON-related changes: no effect at all.

The command log implies that the driver is getting responses back in the time I'd expect, but that's possibly a red herring. Apart from some lock-related issue, I'm not sure what could be causing this. Do y'all have any idea what the issue could be?

To Reproduce
Use the following program to reproduce the issue and run it with the following command:

RUST_LOG='mongodb::command=debug' cargo run

main.rs

use serde;

#[derive(serde::Serialize, serde::Deserialize, Clone, Debug, PartialEq)]
struct EventDocument {
    #[serde(
        rename = "_id",
        with = "mongodb::bson::serde_helpers::uuid_1_as_binary"
    )] // rename for it to be picked up by the _id index
    pub id: uuid::Uuid, // UUID
    pub owner_id: String,
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    let db_name = "slow_test";
    let uri = "<mongo_uri>";
    let db = {
        let client_options = mongodb::options::ClientOptions::parse(uri).await.unwrap();
        let client = mongodb::Client::with_options(client_options).unwrap();
        client.database(db_name)
    };

    let collection = db.collection::<EventDocument>("events");
    collection.list_index_names().await.unwrap();

    let total_count = 2;
    let total_instant = tokio::time::Instant::now();
    let mut handles = vec![];
    for i in 0..total_count {
        let collection = collection.clone();
        let handle: tokio::task::JoinHandle<()> = tokio::spawn(async move {
            let instant = tokio::time::Instant::now();
            let event = EventDocument {
                id: uuid::Uuid::new_v4(),
                owner_id: format!("event-{}", i),
            };

            collection.insert_one(event, None).await.unwrap();
            println!("inserted {} in {:?}", i, instant.elapsed());
        });
        handles.push(handle);
    }
    futures::future::join_all(handles).await;
    println!("total time {:?}", total_instant.elapsed());
}

Cargo.toml:

[package]
name = "slow-mongo-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
mongodb = { version = "2.6.0", features = ["tracing-unstable", "bson-chrono-0_4", "bson-serde_with", "bson-uuid-1"]}
tokio = { version = "1.29.1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1.0.136", features = ["derive"] }
uuid = "1.4.1"
futures = "0.3.28"
rand = "0.8.5"
tracing = "0.1.37"
tracing-subscriber = "0.3.17"
LuisOsta commented Jul 19, 2023

Some more interesting information: when I update the logic so that each thread establishes its own client, the individual write performance is much faster and we don't see the unbounded growth in latency. Of course the overall performance is worse since it creates a whole client for each thread, but if this were simply an internal database/collection lock, then whether it's one client or 100 shouldn't matter. Do y'all have any idea what this could mean?

Here's the new thread snippet where the client is created inside the thread

        let handle: tokio::task::JoinHandle<()> = tokio::spawn(async move {
            let db = {
                let client_options = mongodb::options::ClientOptions::parse(uri).await.unwrap();
                let client = mongodb::Client::with_options(client_options).unwrap();
                client.database(db_name)
            };
            let collection: mongodb::Collection<EventDocument> =
                db.collection::<EventDocument>("events");
            collection.list_index_names().await.unwrap();

            let instant = tokio::time::Instant::now();
            let event = EventDocument {
                id: uuid::Uuid::new_v4(),
                owner_id: format!("event-{}", i),
            };

            collection.insert_one(event, None).await.unwrap();
            println!("inserted {} in {:?}", i, instant.elapsed());
        });

It's important to note that adding a random sleep in the thread doesn't have the same effect of improving the individual write performance:

            let random_sleep = rand::random::<u64>() % 1000;
            tokio::time::sleep(tokio::time::Duration::from_millis(random_sleep)).await;
            collection.insert_one(event, None).await.unwrap();

The above code results in the same unbounded growth of latency

@LuisOsta

@abr-egn Any insights?

abr-egn commented Jul 20, 2023

I haven't had time to look into this in any detail, sorry - hoping to get to it next week :) At first glance, I wonder if the use of join_all is causing misleading timing - have you tried using FuturesUnordered?

@LuisOsta

@abr-egn I can try that out! Why would join_all affect the latencies measured within the threads? I'll give FuturesUnordered a try.

LuisOsta commented Jul 21, 2023

I tested it with FuturesUnordered and got the same result:

use futures::StreamExt;
use serde;

#[derive(serde::Serialize, serde::Deserialize, Clone, Debug, PartialEq)]
struct EventDocument {
    #[serde(
        rename = "_id",
        with = "mongodb::bson::serde_helpers::uuid_1_as_binary"
    )] // rename for it to be picked up by the _id index
    pub id: uuid::Uuid, // UUID
    pub owner_id: String,
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    let db_name = "slow_test";
    let uri = "<uri>";
    let db = {
        let client_options = mongodb::options::ClientOptions::parse(uri).await.unwrap();
        let client = mongodb::Client::with_options(client_options).unwrap();
        client.database(db_name)
    };

    let collection = db.collection::<EventDocument>("events");
    collection.list_index_names().await.unwrap();

    let total_count = 100;
    let total_instant = tokio::time::Instant::now();

    let futures = futures::stream::FuturesUnordered::new();

    for i in 0..total_count {
        let collection = collection.clone();
        let handle: tokio::task::JoinHandle<()> = tokio::spawn(async move {
            let instant = tokio::time::Instant::now();
            let event = EventDocument {
                id: uuid::Uuid::new_v4(),
                owner_id: format!("event-{}", i),
            };

            collection.insert_one(event, None).await.unwrap();
            println!("inserted {} in {:?}", i, instant.elapsed());
        });
        futures.push(handle);
    }

    let _results: Vec<_> = futures.collect().await;

    println!("total time {:?}", total_instant.elapsed());
}

With the following results:

inserted 1 in 51.434916ms
inserted 4 in 99.737416ms
inserted 5 in 148.287708ms
inserted 9 in 196.408875ms
inserted 13 in 245.087916ms
inserted 3 in 265.080875ms
inserted 0 in 265.972416ms
inserted 2 in 293.027708ms
inserted 27 in 311.660083ms
inserted 10 in 312.524125ms
inserted 14 in 340.332583ms
inserted 6 in 356.717ms
inserted 19 in 358.504083ms
inserted 28 in 387.030166ms
inserted 23 in 402.82025ms
inserted 33 in 403.434625ms
inserted 15 in 434.92ms
inserted 11 in 447.897083ms
inserted 7 in 447.914541ms
inserted 20 in 480.956416ms
inserted 22 in 488.094416ms
inserted 18 in 488.308708ms
inserted 16 in 492.882708ms
inserted 8 in 494.470208ms
inserted 12 in 530.187625ms
inserted 34 in 534.573291ms
inserted 30 in 534.735416ms
inserted 21 in 536.443916ms
inserted 17 in 538.307708ms
inserted 25 in 575.714458ms
inserted 35 in 580.833916ms
inserted 44 in 582.528ms
inserted 31 in 582.691ms
inserted 50 in 582.90925ms
inserted 56 in 621.6885ms
inserted 45 in 624.344625ms
inserted 61 in 629.336291ms
inserted 26 in 630.582833ms
inserted 65 in 630.771583ms
inserted 51 in 668.91375ms
inserted 32 in 670.547958ms
inserted 36 in 673.516375ms
inserted 69 in 677.680958ms
inserted 46 in 677.931083ms
inserted 29 in 702.98975ms
inserted 24 in 703.204583ms
inserted 62 in 717.018416ms
inserted 66 in 717.258916ms
inserted 73 in 718.588291ms
inserted 37 in 724.797875ms
inserted 39 in 725.467583ms
inserted 58 in 748.687416ms
inserted 70 in 748.948291ms
inserted 47 in 762.45125ms
inserted 53 in 762.748166ms
inserted 63 in 763.391958ms
inserted 74 in 770.768333ms
inserted 67 in 771.356583ms
inserted 38 in 793.60175ms
inserted 71 in 793.921958ms
inserted 48 in 808.835833ms
inserted 59 in 809.287958ms
inserted 54 in 810.342875ms
inserted 64 in 816.658166ms
inserted 68 in 818.27875ms
inserted 72 in 838.865916ms
inserted 75 in 838.863458ms
inserted 79 in 858.748125ms
inserted 60 in 858.981625ms
inserted 49 in 859.091708ms
inserted 40 in 862.583083ms
inserted 82 in 865.419333ms
inserted 84 in 883.711041ms
inserted 55 in 884.123833ms
inserted 80 in 904.809416ms
inserted 76 in 904.97975ms
inserted 86 in 905.654958ms
inserted 88 in 909.347166ms
inserted 90 in 911.403875ms
inserted 57 in 917.864375ms
inserted 52 in 919.6095ms
inserted 92 in 928.73625ms
inserted 85 in 928.942125ms
inserted 81 in 950.685875ms
inserted 87 in 950.894875ms
inserted 89 in 952.33825ms
inserted 91 in 957.081958ms
inserted 94 in 958.548333ms
inserted 93 in 961.317875ms
inserted 95 in 963.26075ms
inserted 96 in 974.654208ms
inserted 97 in 974.879166ms
inserted 98 in 995.672375ms
inserted 41 in 995.964291ms
inserted 99 in 998.408541ms
inserted 42 in 1.003324958s
inserted 43 in 1.004153541s
inserted 77 in 1.008009666s
inserted 78 in 1.008227708s
inserted 83 in 1.131228666s
total time 1.132587041s

abr-egn commented Jul 21, 2023

Thank you for testing! I didn't have a concrete cause to suspect, but there have been timing surprises with join_all before and it was an easy variable to eliminate. I'll definitely be looking into this next week.

Also, I'd like to say that I really appreciate the thorough bug report with the minimized repro case!

@LuisOsta

I'm happy to have tested it out; there are always small things that can trip you up. And I'm always happy to provide as much info as possible! I know the driver and the database are both quite complex, so y'all need as much info as possible to figure out any issues.

abr-egn added the tracked-in-jira (Ticket filed in Mongo's Jira system) label on Jul 24, 2023
abr-egn commented Jul 24, 2023

JIRA ticket: https://jira.mongodb.org/browse/RUST-1712

abr-egn commented Jul 24, 2023

FYI, I've been able to reproduce this on a local test instance, so it's nothing about your particular server config. I'm actively investigating the root cause; I suspect accidental serialization in the driver's connection pool handling.

@LuisOsta

@abr-egn Thanks! Let me know if there's anything you need from me or anything I can do to help.

abr-egn commented Jul 26, 2023

Okay, this is a complicated situation :)

  • The driver specification requires that, in concurrent situations, connections are used for pending operations in a strictly FIFO order, which does introduce some inherent serial behavior. In the Rust driver, the overhead of the queue seems to be in the single-digit nanosecond range on my test setup, so not particularly significant.
  • Your test code is starting from a cold setup, where no connections are in the connection pool. This by itself wouldn't be an issue for parallel requests - the pool spawns a new task for each new connection and then moves on to the next queue item - however...
  • The number of pending new connections is capped at 2 in the Rust driver.

The behavior this adds up to is that all of the operations get added to the queue at once; the first two get new connections spun up, while the rest have to wait for the connection process to finish. Then the next two get spun up, and so on. This causes the linearly increasing latency that you've observed.
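
As a rough model of that queue (ignoring the fact that already-established connections get re-used for later queue entries): if establishing one connection takes about t_establish and at most 2 establishments run at once, then the k-th operation in a cold queue waits roughly

wait(k) ≈ ceil(k / 2) × t_establish

before it even has a connection, which is the linear ramp visible in your timings.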

Mitigations for this:

  • Start with a warm connection pool. This is trickier than it sounds; the Rust driver is aggressive about re-using connections where possible, and the rate at which connections are added to the pool in the background is low, so it's easy to be below min_pool_size even with a client in active, heavy use. I'm considering adding a method to the client to actively spin up connections to fill up to min_pool_size to help here.
  • Increase the number of allowed pending connections. Right now in Rust that's just a constant, but the specification allows it to be a user-set max_connecting option, and it should be straightforward to make that change.

With those two in place, you should be able to set min_pool_size, max_pool_size, and max_connecting appropriately for your expected concurrent load and see minimal overhead from running concurrent operations. This will, of course, come with higher memory and network usage :) Does this seem like it'll address the issues you're seeing?
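
For illustration, a minimal sketch of what that configuration could look like once max_connecting is user-settable (min_pool_size and max_pool_size already exist on ClientOptions; max_connecting is the proposed option, and the numbers are placeholders to tune for your workload):

// Sketch only: assumes a driver version where max_connecting has become a
// user-configurable ClientOptions field (today it's a hard-coded constant).
async fn make_client(uri: &str) -> mongodb::error::Result<mongodb::Client> {
    let mut options = mongodb::options::ClientOptions::parse(uri).await?;
    options.min_pool_size = Some(50);   // keep enough warm connections for the expected concurrency
    options.max_pool_size = Some(100);  // upper bound on pooled connections
    options.max_connecting = Some(10);  // proposed: allow more connections to establish in parallel
    mongodb::Client::with_options(options)
}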

LuisOsta commented Jul 27, 2023

@abr-egn Thanks for getting back to me! I had also experienced this issue in non-cold-start environments, which is why I was concerned. I'm glad there's a solution so we can deal with write spikes more smoothly. It's fine if it takes more network and memory resources.

So before doing the writes I call list_index_names to warm up the connections; does that not have that effect? I can try min_pool_size and max_pool_size again, though in my initial attempts they didn't seem to make much of a difference. When you did your testing, was changing those variables able to reduce the latency growth?

I can make a PR so that max_connecting is user-configurable. Just to confirm: max_connecting sets the maximum number of operations that can be trying to establish a new connection at the same time?

abr-egn commented Jul 27, 2023

So before doing the writes I had the command list_index_names to warm up the connections, do those not have that effect?

I tried something very similar - the problem here is that it's basically racing the network latency against the command time, and it's very possible for the command to finish quickly enough that the pool never gets fully warmed up. This is why I'd like to add an explicit method for that :)

I can try to use min_pool_size and max_pool_size again, though in my initial attempts they didn't seem to have much of a difference. When you did your testing, was changing those variables able to reduce the latency growth?

min_pool_size definitely makes a big difference once the pool has been warmed up - in my tests with a warm pool larger than the concurrency needs, connection acquisition overhead was completely negligible and overall command latency was almost entirely determined by actual execution time.

I'm not surprised you weren't able to see clear results from tuning that parameter, though. If concurrency exceeds the actual number of connections available, it causes a bubble in the connection pipeline where every command in the queue has to wait for the connection spin-up, and you're back to linear latency increases. Seeing real improvements for your specific case needs all of a fully-warmed pool, a high min_pool_size, and increased max_connecting.

I can make a PR so that max_connecting is something that's user-configured, just to confirm so max_connecting sets the maximum number of operations that can be trying to establish a new connection?

It'd be excellent if you could put together that PR! And yes, your understanding is correct - the limiting machinery is already in place; this would be replacing the constant limit it uses with a user-configured one.

LuisOsta commented Jul 27, 2023

@abr-egn I've put up #923; it's a rough PR but hopefully it achieves what we need for max_connecting. Thank you for the clarity around the interplay between min_pool_size and max_connecting! I think that makes sense to me now.

Also, this may be a bit in the weeds, but I am curious about the pool warming (making sure we actually have min_pool_size connections). You mentioned wanting to add a new method that ensures we have warm connections up to min_pool_size; is that not what ensure_min_connections is meant to achieve? While working on the above PR I noticed that function in the worker.rs file that deals with max_connecting.

@LuisOsta

Also, @abr-egn, if your bandwidth is tight right now I can try to implement the second part (actively spinning up connections) as well. I'm unsure of the complexity involved, though; let me know.

abr-egn commented Jul 28, 2023

I hope to have that put together today or Monday :) I suspect that for someone less familiar with the driver internals it would be a much slower project. Thank you, though!

LuisOsta commented Aug 2, 2023

@abr-egn Any updates on the pool warming work you mentioned?

abr-egn commented Aug 3, 2023

I have a PR in progress; it's blocked behind #920 (that does some foundational work needed for this). I was OOO for the last couple of days, but I'm hoping to get that one merged this week.

LuisOsta commented Aug 7, 2023

Thanks! And I hope you enjoyed your time off!

abr-egn commented Aug 14, 2023

Sorry this has taken substantially longer than I first estimated!

I've got good news and bad news. The good news is that I have an imminent PR that adds the prewarming method, and it almost entirely eliminates the linear latency effect from the driver. An instrumented run (very similar to yours, with some added internal tracking) without the pre-warm:

[0]             get connection: 151.791µs
[0]             execute: 464.875µs
[0]     inserted in 666.083µs
[3]             get connection: 686.208µs
[3]             execute: 601µs
[3]     inserted in 1.34775ms
[4]             get connection: 1.345709ms
[4]             execute: 339.333µs
[4]     inserted in 1.728833ms
[9]             get connection: 1.749375ms
[9]             execute: 342.042µs
[9]     inserted in 2.126458ms
[7]             get connection: 2.14975ms
[7]             execute: 321.125µs
[7]     inserted in 2.507209ms
[10]            get connection: 2.526833ms
[10]            execute: 346.709µs
[10]    inserted in 2.936167ms
[8]             get connection: 2.953625ms
[8]             execute: 322.667µs
[8]     inserted in 3.310958ms
[11]            get connection: 3.346084ms
[11]            execute: 327.875µs
[11]    inserted in 3.728042ms
[13]            get connection: 3.716084ms
[13]            execute: 304.625µs
[13]    inserted in 4.057792ms
[15]            get connection: 4.107875ms
[2]             establish conn: 4.462125ms
[2]             get connection: 4.581375ms
[15]            execute: 487.958µs
[15]    inserted in 4.644291ms
[18]            get connection: 4.615292ms
[1]             establish conn: 4.994708ms
[1]             get connection: 5.202542ms
[18]            execute: 378µs
[18]    inserted in 5.082209ms
[12]            get connection: 5.127833ms
[2]             execute: 939.583µs
[2]     inserted in 5.562ms
[5]             get connection: 5.44425ms
[12]            execute: 374.917µs
[12]    inserted in 5.579917ms
[17]            get connection: 5.468ms
[5]             execute: 371.667µs
[5]     inserted in 5.903833ms
[6]             get connection: 5.772292ms
[1]             execute: 866.708µs
[17]            execute: 359.5µs
[1]     inserted in 6.11425ms
[17]    inserted in 5.877167ms
[14]            get connection: 5.79775ms
[6]             execute: 259.166µs
[6]     inserted in 6.059917ms
[14]            execute: 274.208µs
[14]    inserted in 6.256917ms
[19]            establish conn: 4.337916ms
[19]            get connection: 9.269375ms
[16]            establish conn: 5.118417ms
[16]            get connection: 9.64175ms
[19]            execute: 884.375µs
[19]    inserted in 10.239458ms
[16]            execute: 938µs
[16]    inserted in 10.610958ms

The first set of inserts (up through 13) is essentially re-using the same connection; these are interleaved with tasks being spawned for new connection spin-up, but some of the inserts finish fast enough that more can be executed on the same connection before any new connections finish establishing. After that, it's a mixture of new and re-used connections, with the overall result being the linear execution time.

Now, with the pre-warming added:

==> prewarm: 28.101416ms
[1]             get connection: 32.125µs
[5]             get connection: 29.583µs
[2]             get connection: 49.208µs
[7]             get connection: 39.875µs
[11]            get connection: 39.041µs
[12]            get connection: 44.5µs
[0]             get connection: 186.291µs
[3]             get connection: 139.25µs
[4]             get connection: 32.167µs
[8]             get connection: 120.375µs
[13]            get connection: 124.458µs
[9]             get connection: 179.167µs
[14]            get connection: 116.666µs
[16]            get connection: 109.959µs
[15]            get connection: 120.917µs
[18]            get connection: 104.708µs
[10]            get connection: 121.208µs
[17]            get connection: 153.25µs
[19]            get connection: 80.041µs
[6]             get connection: 181.083µs
[19]            execute: 412.584µs
[19]    inserted in 533.834µs
[5]             execute: 3.915042ms
[5]     inserted in 3.992583ms
[14]            execute: 5.927709ms
[14]    inserted in 6.096708ms
[7]             execute: 6.511542ms
[7]     inserted in 6.624959ms
[10]            execute: 6.395792ms
[10]    inserted in 6.546792ms
[8]             execute: 6.624458ms
[8]     inserted in 6.773417ms
[11]            execute: 7.006042ms
[11]    inserted in 7.09525ms
[0]             execute: 7.066792ms
[0]     inserted in 7.280917ms
[2]             execute: 7.293792ms
[2]     inserted in 7.373417ms
[1]             execute: 7.432875ms
[1]     inserted in 7.492417ms
[18]            execute: 7.387625ms
[18]    inserted in 7.523125ms
[9]             execute: 7.59225ms
[9]     inserted in 7.813708ms
[13]            execute: 7.826917ms
[13]    inserted in 7.975625ms
[4]             execute: 8.132167ms
[4]     inserted in 8.19475ms
[15]            execute: 8.156ms
[15]    inserted in 8.321042ms
[12]            execute: 8.428708ms
[12]    inserted in 8.511958ms
[3]             execute: 8.568583ms
[3]     inserted in 8.734208ms
[17]            execute: 8.495417ms
[17]    inserted in 8.682167ms
[16]            execute: 8.762916ms
[16]    inserted in 8.895041ms
[6]             execute: 8.810708ms
[6]     inserted in 9.017791ms
==> total time 37.436291ms

Connections being retrieved from the connection pool take effectively no time, and overall latency is determined entirely by time to execute (I've validated that the time there is spent waiting for a reply from the server, not doing anything in the driver). However:

  • Prewarming isn't cheap, relatively speaking - it costs more time than is saved overall.
  • Execution time now itself shows a linear progression.

On the latter point, I strongly suspect this is hitting a pessimal case for mongodb's locking. For the specific case of writing several documents into the same collection concurrently, it's likely you'd see improvements by buffering them locally and calling insert_many; beyond that, I'm afraid I don't have the knowledge of server behavior to give informed advice.
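
For reference, a minimal sketch of that buffering approach, reusing the EventDocument type from your repro (batch size and batching strategy are placeholders to adapt to the application):

// Collect the documents locally instead of issuing one insert_one per task...
let events: Vec<EventDocument> = (0..100)
    .map(|i| EventDocument {
        id: uuid::Uuid::new_v4(),
        owner_id: format!("event-{}", i),
    })
    .collect();

// ...and write them in a single batch: one operation on one connection
// instead of 100 concurrent insert_one calls.
collection.insert_many(events, None).await?;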

@LuisOsta

@abr-egn Thank you for your work on this! To address the points made in your comment:

  • Completely happy to take a bit of a hit in "overall" time with prewarming, as what matters is keeping the latency overhead for these writes very small.
  • The latency changes with the prewarming look quite promising! Very excited to see that, and yeah, I imagined there was also going to be some amount of linear progression due to write locks under heavy concurrent load.
  • If performance issues due to heavily concurrent writes become a problem in the future, I'll investigate the buffering solution you mentioned or chat with someone at MongoDB who can help. But for now I don't think that'll be necessary.
  • When you say "prewarming isn't cheap", just to clarify: the "cost" comes during the warming period only, right? Would that be a cost that's primarily borne at service startup? Or otherwise, how is that logic implemented?

abr-egn commented Aug 15, 2023

When you say "prewarming isn't cheap", just to clarify the "cost" comes during the "warming" period only right? Would that be a cost that's primarily borne at service startup? Or otherwise how is that logic implemented

Yeah, just the time cost of running the warm-up loop. There's no ongoing overhead to having run the prewarm; it's implemented as a task that requests fresh connections in a loop until the pool hits min_pool_size (#932 has the gory details if you're curious).

If you want to make sure the connection pool always stays warmed up for a long-running service, you'll need to run this both at service startup and whenever topology changes cause pools to be cleared. The warmup task doesn't block normal operations, so after a pool clear you'll see a temporary bump in latency as connections get re-established or operations wait for in-progress connections to finish establishing. Turning up max_connecting will shrink that window, or you could delay/deny other operations until the warmup task finishes, if that makes sense for your particular application.
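
To make that concrete, a rough sketch of the startup path described above, assuming the prewarm ships roughly as the method added in #932 (I'm using warm_connection_pool() as the name here; the exact name and signature may change before release):

let mut options = mongodb::options::ClientOptions::parse(uri).await?;
options.min_pool_size = Some(50);
let client = mongodb::Client::with_options(options)?;

// At service startup: fill the pool up to min_pool_size before taking traffic.
client.warm_connection_pool().await;

// Later, if a topology change clears the pool, the same call can be used to
// re-warm it; normal operations aren't blocked while it runs.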

LuisOsta changed the title from "Latency Sees Unbounded Growth When Doing Any Amount Of Concurrent Writes" to "RUST-1712 Latency Sees Unbounded Growth When Doing Any Amount Of Concurrent Writes" on Aug 18, 2023
@LuisOsta

@abr-egn Do you have any updates on when I should expect the beta release (and the full version) to be cut? Thanks!

abr-egn commented Aug 25, 2023

We'll be doing the beta release next week; the timing of the full release after that depends on how it goes, of course, but we generally give at least a few weeks for feedback.

@LuisOsta

Thanks!

abr-egn commented Aug 30, 2023

We just put out release 2.7.0-beta, which includes these changes.

abr-egn closed this as completed on Aug 30, 2023
@LuisOsta

@abr-egn Thanks!
