Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup bigtable block upload by factor of 8-10x #24534

Merged
merged 5 commits into from
May 17, 2022

Conversation

buffalu
Copy link
Contributor

@buffalu buffalu commented Apr 20, 2022

Problem

We were seeing our bigtable block upload running behind tip by several thousand blocks. We noticed it was uploading a block every ~500ms.

Summary of Changes

Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.

We're seeing around 50-60ms per block upload now, so we should be able to easily catch up and maintain bigtable state on the tip.

I think there's more to squeeze out, but curious what you guys think about this before I go down the next optimization rabbit hole.

@mergify mergify bot added the community Community contribution label Apr 20, 2022
@mergify mergify bot requested a review from a team April 20, 2022 19:07
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks pretty good; thanks for digging into this! A couple small questions, and I'd like to play around with it a bit...

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/Cargo.toml Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
@buffalu
Copy link
Contributor Author

buffalu commented Apr 20, 2022

this line is sus: https://github.com/solana-labs/solana/blob/master/ledger/src/bigtable_upload.rs#L173

using blocking receiver and passing it to async thingy? cc @mvines

@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #24534 (5e2042f) into master (52db2e1) will decrease coverage by 0.0%.
The diff coverage is 0.0%.

❗ Current head 5e2042f differs from pull request most recent head ca236b9. Consider uploading reports for the commit ca236b9 to get more accurate results

@@            Coverage Diff            @@
##           master   #24534     +/-   ##
=========================================
- Coverage    82.1%    82.0%   -0.1%     
=========================================
  Files         612      610      -2     
  Lines      168534   168373    -161     
=========================================
- Hits       138396   138152    -244     
- Misses      30138    30221     +83     

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of upload speeds are you seeing with this changeset?
Why did you change your mind about the blockstore threads?

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
storage-bigtable/Cargo.toml Outdated Show resolved Hide resolved
@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

What kind of upload speeds are you seeing with this changeset?

i think i was seeing ~200mbit/s, but can't remember specifics. i can test it again by disabling bigtable uploads on our node, run for ~1-2 hours, stop it, then use the ledger-tool upload command. lmk if that's something you're interested in!

we're running on 25gbps server so we have plenty of bandwidth left over. if we could have a threadpool that's constantly popping items off a queue and running i think we could get that number way up (as opposed to running NUM_PARALLEL uploads and waiting for them all to complete before popping more off.

Why did you change your mind about the blockstore threads?

@t-nelson mentioned something about number of CPUs. I feel like that's probably better, but tbh don't have any super strong feelings about it. more is probably better tho, esp. if you're trying to catchup! during normal operation im seeing it keep up just fine and upload 2-3 blocks at a time with tip

@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

just wrote a script to check our bigtable highest slot against yours:

[2022-04-22T01:02:42Z INFO  jito_bigtable] destination bigtable 12496 slots ahead of source (source=130764747 dest=130777243)

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
storage-bigtable/src/lib.rs Outdated Show resolved Hide resolved
@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

cargo b --release --bin solana-ledger-tool && ./target/release/solana-ledger-tool --ledger /mt/ledger/validator-ledger bigtable upload

git commit: 00c5ec9

num_cpus: 48
uses 12 threads (num_cpus / 4) for blockstore + upload
~100ms per block in the upload

[2022-04-22T18:27:42.137475474Z INFO  solana_ledger::bigtable_upload] Upload took 1.5s for 12 blocks
[2022-04-22T18:27:43.210726324Z INFO  solana_ledger::bigtable_upload] Upload took 1.1s for 12 blocks
[2022-04-22T18:27:44.252376864Z INFO  solana_ledger::bigtable_upload] Upload took 1.0s for 9 blocks

Screen Shot 2022-04-22 at 1 54 58 PM

num_cpus: 48
uses 24 threads (num_cpus / 2) for blockstore + upload
~70ms per block in the upload

[2022-04-22T18:22:19.896359960Z INFO  solana_ledger::bigtable_upload] Upload took 1.6s for 22 blocks
[2022-04-22T18:22:21.364399516Z INFO  solana_ledger::bigtable_upload] Upload took 1.5s for 21 blocks
[2022-04-22T18:22:45.426086757Z INFO  solana_ledger::bigtable_upload] Upload took 1.6s for 23 blocks

Screen Shot 2022-04-22 at 1 57 01 PM

@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

git commit 62a40f1
(no tokio::spawn)
~200ms per block

[2022-04-22T18:51:03.401708893Z INFO  solana_ledger::bigtable_upload] Upload took 2.9s for 12 blocks
[2022-04-22T18:51:05.920438615Z INFO  solana_ledger::bigtable_upload] Upload took 2.5s for 12 blocks
[2022-04-22T18:51:08.139508376Z INFO  solana_ledger::bigtable_upload] Upload took 2.2s for 11 blocks
[2022-04-22T18:51:09.913070966Z INFO  solana_ledger::bigtable_upload] Upload took 1.8s for 10 blocks
[2022-04-22T18:51:11.336636226Z INFO  solana_ledger::bigtable_upload] Upload took 1.4s for 8 blocks

Screen Shot 2022-04-22 at 1 50 57 PM

@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

git commit 0d797e2 (master)

[2022-04-22T20:01:47.160736902Z INFO  solana_ledger::bigtable_upload] Upload took 6.8s for 2 blocks
[2022-04-22T20:02:09.115707196Z INFO  solana_ledger::bigtable_upload] Upload took 22.0s for 32 blocks
[2022-04-22T20:02:26.023151730Z INFO  solana_ledger::bigtable_upload] Upload took 16.9s for 32 blocks
[2022-04-22T20:03:06.173026343Z INFO  solana_ledger::bigtable_upload] Upload took 19.9s for 32 blocks

Screen Shot 2022-04-22 at 3 01 54 PM

@buffalu
Copy link
Contributor Author

buffalu commented Apr 22, 2022

in summary, seems like uploading blocks is network + cpu bound.

haven't profiled, but guessing attempting to compressing each item that gets uploaded to bigtable 3 different ways is expensive 😆

@buffalu buffalu force-pushed the lb/bigtable_speedup branch from 72df590 to 92ce23b Compare April 26, 2022 18:32
@buffalu
Copy link
Contributor Author

buffalu commented Apr 26, 2022

@CriesofCarrots are you running this on your warehouse nodes now? looks like they're caught up :)

@CriesofCarrots
Copy link
Contributor

are you running this on your warehouse nodes now? looks like they're caught up :)

No, I'm not sure why we're caught up suddenly 🤷‍♀️ (although Joe might know)
Btw, I'm probably going to sneak in another change in this area that's going to force some rebasing here. Apologies in advance!

@buffalu
Copy link
Contributor Author

buffalu commented Apr 26, 2022

are you running this on your warehouse nodes now? looks like they're caught up :)

No, I'm not sure why we're caught up suddenly 🤷‍♀️ (although Joe might know) Btw, I'm probably going to sneak in another change in this area that's going to force some rebasing here. Apologies in advance!

all good! feel free to ping me here or discord if you need anything! just rebased!

@CriesofCarrots
Copy link
Contributor

Hey @buffalu , sorry for the delay here. I'm hoping to wrap up my changes in this area today/tomorrow. I played around with some of your changes on top, and it seems like the new spawns do make a huge difference. But reading the blocks from Blockstore was never a limiting factor on our warehouse nodes, so I think I'd like to first try a smaller changeset here without the multiple Blockstore threads. What do you think?
No need to take any action here yet; I'll ping you when my stuff is merged. Just wanted to start chatting about slimming this a little.

@buffalu
Copy link
Contributor Author

buffalu commented May 3, 2022

Hey @buffalu , sorry for the delay here. I'm hoping to wrap up my changes in this area today/tomorrow. I played around with some of your changes on top, and it seems like the new spawns do make a huge difference. But reading the blocks from Blockstore was never a limiting factor on our warehouse nodes, so I think I'd like to first try a smaller changeset here without the multiple Blockstore threads. What do you think? No need to take any action here yet; I'll ping you when my stuff is merged. Just wanted to start chatting about slimming this a little.

from what i remember i do think i was seeing blockstore take ~100ms per block so it might be worth checking to make sure it's not going to be a limiting factor if someone wants to crank it up more (ie using ledger-tool/custom number of cpus in the config).

@CriesofCarrots
Copy link
Contributor

@buffalu , I'm finally done mucking around in the bigtable_upload files. Are you still willing to rebase this?
I see your point about the Blockstore threads. It's never the bottleneck on our nodes, but just talked to a partner blocked by it.

@buffalu
Copy link
Contributor Author

buffalu commented May 13, 2022

@buffalu , I'm finally done mucking around in the bigtable_upload files. Are you still willing to rebase this? I see your point about the Blockstore threads. It's never the bottleneck on our nodes, but just talked to a partner blocked by it.

yeah! gimme a few hours, will get to later after meetings 😢

@buffalu
Copy link
Contributor Author

buffalu commented May 13, 2022

@CriesofCarrots can you elaborate what you mean by "It's never the bottleneck on our nodes, but just talked to a partner blocked by it."

are they running with the new tokio::spawn but single blockstore thread? or just in general bigtable upload being slow

@CriesofCarrots
Copy link
Contributor

are they running with the new tokio::spawn but single blockstore thread? or just in general bigtable upload being slow

They are running with vanilla v1.9 (don't recall which patch), and it was clear from the timings that the upload thread was sitting idle waiting for blockstore reads

@buffalu buffalu force-pushed the lb/bigtable_speedup branch from 92ce23b to eb95861 Compare May 13, 2022 16:10
@buffalu
Copy link
Contributor Author

buffalu commented May 13, 2022

ok should be good. caught a potential unwrap error + fixed that. lmk if anything else sticks out

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing this, but looks like we lost your changes to use num_cpus as basis for num_blocks_to_upload_in_parallel. Can you restore that? 🙏

storage-bigtable/Cargo.toml Outdated Show resolved Hide resolved
storage-bigtable/Cargo.toml Outdated Show resolved Hide resolved
@buffalu
Copy link
Contributor Author

buffalu commented May 13, 2022

I started reviewing this, but looks like we lost your changes to use num_cpus as basis for num_blocks_to_upload_in_parallel. Can you restore that? 🙏

yeah, assumed we wanted to use your config already. do you think it makes sense for num blockstore threads = num sending threads = num_blocks_to_upload_in_parallel?

@CriesofCarrots
Copy link
Contributor

yeah, assumed we wanted to use your config already.

Oh gotcha. Sorry if that was confusing. I just set up relative values.
I still think it makes sense to base the defaults on num_cpus; probably half num_cpus for num_blocks_to_upload_in_parallel...? Was that what you had before?

do you think it makes sense for num blockstore threads = num sending threads = num_blocks_to_upload_in_parallel?

Yes, that makes sense to me

@buffalu buffalu force-pushed the lb/bigtable_speedup branch from f1bbebe to a29f94e Compare May 13, 2022 18:44
@buffalu
Copy link
Contributor Author

buffalu commented May 13, 2022

this prob doesn't make sense for this PR, but we also noticed using separate LedgerStorage instances can massively speed things up too. we can get ~500-1k blocks per second reading with this change across 64 threads and 500 slot request sizes.

cloning causes them to use the same channel and TCP connection, if you create a new instance for each thread they'd each have their own connection + channel

@CriesofCarrots
Copy link
Contributor

we also noticed using separate LedgerStorage instances can massively speed things up too

Yeah, let's look at that separately

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
}

let results = futures::future::join_all(tasks).await;
let results: Vec<_> = results.into_iter().map(|r| r.unwrap()).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's map the tokio Error to something, just to be safe. Can you add an Error variant?
Also, can we rewrite this whole block to only iterate once?
Something like:

let mut bytes_written = 0;
let mut maybe_first_err: Option<Error> = None;
for result in results {
    match result {
        Err(err) => if maybe_first_err.is_none() {
            maybe_first_err = Some(Error::TokioError(err));
        }
        Ok(Err(err)) => if maybe_first_err.is_none() {
            maybe_first_err = Some(Error::BigTableError(err));
        }
        Ok(Ok(bytes)) => {
            bytes_written += bytes;
        }
    }
}
if let Some(err) = maybe_first_err {
    return Err(err);
}

@CriesofCarrots
Copy link
Contributor

One additional request: when you pin tokio, can you also please rollback all the Cargo.lock version bumps? Would prefer to consider those separately.
I think the only Cargo.lock changes needed should be adding tokio and futures to the solana-bigtable crate.

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this running on two testnet warehouse nodes, and it's working great! I noticed a couple logging things (see comments).
Otherwise, just the Cargo.lock reverts, and I think that's it from me!

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
buffalu added 2 commits May 15, 2022 14:01
Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.
@buffalu buffalu force-pushed the lb/bigtable_speedup branch from 5e2042f to 6a299d2 Compare May 15, 2022 19:12
t-nelson
t-nelson previously approved these changes May 16, 2022
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just some nits

ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
ledger/src/bigtable_upload.rs Outdated Show resolved Hide resolved
storage-bigtable/src/lib.rs Outdated Show resolved Hide resolved
@mergify mergify bot dismissed t-nelson’s stale review May 16, 2022 21:46

Pull request has been modified.

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the iterations on this @buffalu !

@CriesofCarrots
Copy link
Contributor

Merging on red; downstream-anchor-projects has been removed on master (saving CI the load of rebasing this)

@CriesofCarrots CriesofCarrots merged commit 6bcadc7 into solana-labs:master May 17, 2022
mergify bot pushed a commit that referenced this pull request May 17, 2022
Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.

(cherry picked from commit 6bcadc7)

# Conflicts:
#	Cargo.lock
#	programs/bpf/Cargo.lock
#	storage-bigtable/Cargo.toml
mergify bot added a commit that referenced this pull request May 17, 2022
…25278)

* Speedup bigtable block upload by factor of 8-10x (#24534)

Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.

(cherry picked from commit 6bcadc7)

# Conflicts:
#	Cargo.lock
#	programs/bpf/Cargo.lock
#	storage-bigtable/Cargo.toml

* Fix conflicts

Co-authored-by: buffalu <85544055+buffalu@users.noreply.github.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Jun 29, 2022
Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Jun 29, 2022
Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants