-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initialize windows with last up-to-WINDOW_SIZE blobs #524
Initialize windows with last up-to-WINDOW_SIZE blobs #524
Conversation
hoping to get feedback as I go |
Just a thought, how about passing the uninitialized window right into |
I did re-use the existing windowing code (via copy-paste)... the only thing I didn't copy is the "whack the previous blobs" code |
the only place we populate windows is in broadcast(), apparently |
now it's actually compiling |
src/bin/fullnode.rs
Outdated
eprintln!("processed {} ledger...", entry_height); | ||
|
||
let window_entries = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a split_at()
before process_ledger()
so that you don't need to clone()
it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please ignore the .collect()... deemed infeasible in Discord#development discussion. therefore, split_at() non-option
src/streamer.rs
Outdated
blobs: VecDeque<SharedBlob>, | ||
entry_height: u64, | ||
) -> Window { | ||
let window = Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_window()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
src/server.rs
Outdated
@@ -47,6 +50,7 @@ impl Server { | |||
pub fn new_leader<W: Write + Send + 'static>( | |||
bank: Bank, | |||
entry_height: u64, | |||
window_entries: Option<Vec<Entry>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about passing in a Window
? We could always wrap these functions with things like new_default_leader
to make it easier on tests (and the drone).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do, but still need tail -WINDOW_SIZE from the entries iterator...
also, means passing a blob_recycler down to new_leader, because that's the one broadcaster uses...
src/server.rs
Outdated
let window = streamer::default_window(); | ||
|
||
let blob_recycler = BlobRecycler::default(); | ||
let window = match window_entries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste in one PR!? 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will collapse once I grok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whacked. only leader needs a populated window for this PR.
8f71272
to
2acf401
Compare
next up is initializing validators' windows from the ledger, should be a short trip from here. comments on my implementation of "tail" in process_ledger are greatly appreciated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned the system will work (even long-term) without this PR. If you disagree, can you update the PR description with a note that helps me understand?
src/bank.rs
Outdated
let bank = Bank::default(); | ||
bank.process_ledger(ledger).unwrap(); | ||
let (ledger_height, tail) = bank.process_ledger(ledger).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a second process_ledger
test for when the ledger is longer than the tail? It should probably test that the last Entry.id
in the window matches bank.last_id()
.
src/fullnode.rs
Outdated
Some(ledger_tail) => { | ||
let mut blobs = VecDeque::new(); | ||
ledger_tail.to_blobs(&blob_recycler, &mut blobs); | ||
streamer::initialized_window(&crdt, blobs, entry_height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a test for how the leader behaves differently on this branch.
src/streamer.rs
Outdated
assert!(blobs.len() <= win.len()); | ||
|
||
// flatten deque to vec | ||
let mut blobs: Vec<_> = blobs.into_iter().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can push this on the caller. Maybe the author of the caller will conclude its vector should be changed to Vec
.
@@ -458,6 +458,45 @@ pub fn default_window() -> Window { | |||
Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize])) | |||
} | |||
|
|||
/// Initialize a rebroadcast window with most recent Entry blobs |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
src/streamer.rs
Outdated
); | ||
// Index the blobs | ||
let mut received = entry_height - blobs.len() as u64; | ||
Crdt::index_blobs(crdt, &blobs, &mut received).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error will be panicked on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replicated data lock, or all kinds of blob operations... I can change to an expect()...
24cdff1
to
884d4a4
Compare
884d4a4
to
209db08
Compare
209db08
to
2577467
Compare
fixes issue #299 |
realizing needs a test (a validator that starts with an old ledger) |
@rob-solana, fyi, https://help.github.com/articles/closing-issues-using-keywords/. There's a special syntax to get GitHub to auto-close issues, and it needs to be in the PR description of in any of the PR's commit messages. Won't work from a PR comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. I see it as a restartable leader feature though, not a restartable validator feature.
fn restart_leader( | ||
exit: Option<Arc<AtomicBool>>, | ||
leader_fullnode: Option<FullNode>, | ||
ledger_path: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can cut down on a bunch of clone()
calls by making this a &str
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I have to to_str() for InFile::Path() and OutFile::Path()?
tests/multinode.rs
Outdated
|
||
let mut client = mk_client(&validator_data); | ||
let getbal = retry_get_balance(&mut client, &bob_pubkey, Some(leader_balance)); | ||
assert!(getbal == Some(leader_balance)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq
will offer a better error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
|
||
// create a "stale" ledger by copying current ledger | ||
let mut stale_ledger_path = ledger_path.clone(); | ||
stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using with_file_name() or with_extension() sends me down a rabbit hole of conversion from a PathBuf back to a String (which doesn't always work). Lots more code for this really simple test case unless I stay in String land as long as possible.
tests/multinode.rs
Outdated
let mut stale_ledger_path = ledger_path.clone(); | ||
stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_"); | ||
|
||
std::fs::copy(ledger_path.clone(), stale_ledger_path.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither clone()
should be needed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
333 | std::fs::copy(ledger_path, stale_ledger_path)
| ----------- value moved here
334 | .expect(format!("copy {} to {}", &ledger_path, &stale_ledger_path,).as_str());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By changing from ledger_path.clone()
to &ledger_path
} | ||
|
||
#[test] | ||
fn test_leader_restart_validator_start_from_old_ledger() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here describing the edge case? My understanding, "Test the case where both a leader and validator are starting up at roughly the same time, but the leader has a more recent copy of the ledger. This test ensures the leader makes its most recent entries available to the validator."
cc #310 |
0b91367
to
9aa7530
Compare
what's "cc" do? |
9aa7530
to
1d597c7
Compare
That's a convention @mvines started using here. It's just to notify subscribers of that issue of this PRs existence, much like you'd CC folks in email. If you go to that issue, you'll see the cross-link. |
) Bumps [@solana/web3.js](https://github.com/solana-labs/solana-web3.js) from 0.76.0 to 0.77.0. - [Release notes](https://github.com/solana-labs/solana-web3.js/releases) - [Commits](solana-labs/solana-web3.js@v0.76.0...v0.77.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add scan_index for improving index generation * pr feedback * rework some stuff from pr feedback * get rid of redundant if * deal with rent correctly
the goal of this PR is to have full windows on all nodes at all times, which should reduce "failed RequestWindowIndex" in replication during test_multi_node_dynamic_network().