Initialize windows with last up-to-WINDOW_SIZE blobs #524

rob-solana · 2018-07-02T17:09:13Z

the goal of this PR is to have full windows on all nodes at all times, which should reduce "failed RequestWindowIndex" in replication during test_multi_node_dynamic_network().

rob-solana · 2018-07-02T17:12:13Z

hoping to get feedback as I go

garious · 2018-07-02T17:31:18Z

Just a thought, how about passing the uninitialized window right into Bank::process_ledger? Let it fill the window as it goes. 99% of the entries would get pushed right back out, but maybe that's no problem. I'd think that it'd allow us to reuse the existing windowing code.

rob-solana · 2018-07-02T17:40:03Z

I did re-use the existing windowing code (via copy-paste)... the only thing I didn't copy is the "whack the previous blobs" code

rob-solana · 2018-07-02T17:41:14Z

the only place we populate windows is in broadcast(), apparently

rob-solana · 2018-07-02T18:12:13Z

now it's actually compiling

garious · 2018-07-02T22:43:48Z

src/bin/fullnode.rs

    eprintln!("processed {} ledger...", entry_height);

+    let window_entries = {


How about a split_at() before process_ledger() so that you don't need to clone() it?

please ignore the .collect()... deemed infeasible in Discord#development discussion. therefore, split_at() non-option

garious · 2018-07-02T22:45:29Z

src/streamer.rs

+    blobs: VecDeque<SharedBlob>,
+    entry_height: u64,
+) -> Window {
+    let window = Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize]));


default_window()?

garious · 2018-07-02T22:47:41Z

src/server.rs

@@ -47,6 +50,7 @@ impl Server {
    pub fn new_leader<W: Write + Send + 'static>(
        bank: Bank,
        entry_height: u64,
+        window_entries: Option<Vec<Entry>>,


How about passing in a Window? We could always wrap these functions with things like new_default_leader to make it easier on tests (and the drone).

can do, but still need tail -WINDOW_SIZE from the entries iterator...

also, means passing a blob_recycler down to new_leader, because that's the one broadcaster uses...

garious · 2018-07-02T22:48:41Z

src/server.rs

-        let window = streamer::default_window();
+
+        let blob_recycler = BlobRecycler::default();
+        let window = match window_entries {


Copy-paste in one PR!? 👎

at this point, yes.

will collapse once I grok

whacked. only leader needs a populated window for this PR.

rob-solana · 2018-07-04T00:36:01Z

next up is initializing validators' windows from the ledger, should be a short trip from here.

comments on my implementation of "tail" in process_ledger are greatly appreciated
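For readers following the "tail" discussion, here is a minimal sketch of one way to keep only the last N items of an iterator in a single pass, without cloning the whole ledger. The helper name `count_and_tail` and its signature are hypothetical and are not taken from this PR; the PR's actual `process_ledger` works on entries and may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical helper (not the PR's code): consume `items` once and return
/// how many items were seen plus the last `n` of them, oldest first.
fn count_and_tail<T, I: IntoIterator<Item = T>>(items: I, n: usize) -> (u64, Vec<T>) {
    let mut tail = VecDeque::with_capacity(n);
    let mut count = 0u64;
    for item in items {
        count += 1;
        if n == 0 {
            continue; // nothing to retain
        }
        if tail.len() == n {
            tail.pop_front(); // drop the oldest retained item
        }
        tail.push_back(item);
    }
    (count, tail.into_iter().collect())
}

fn main() {
    // Stand-in for a ledger of 10 entries with a window of 4.
    let (height, tail) = count_and_tail(0..10u32, 4);
    println!("height={} tail={:?}", height, tail);
}
```

The deque never holds more than `n` items, so memory stays bounded by the window size regardless of ledger length.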

garious

I'm still concerned the system will work (even long-term) without this PR. If you disagree, can you update the PR description with a note that helps me understand?

garious · 2018-07-05T16:22:41Z

src/bank.rs

        let bank = Bank::default();
-        bank.process_ledger(ledger).unwrap();
+        let (ledger_height, tail) = bank.process_ledger(ledger).unwrap();


Can you add a second process_ledger test for when the ledger is longer than the tail? It should probably test that the last Entry.id in the window matches bank.last_id().

garious · 2018-07-05T16:25:08Z

src/fullnode.rs

+            Some(ledger_tail) => {
+                let mut blobs = VecDeque::new();
+                ledger_tail.to_blobs(&blob_recycler, &mut blobs);
+                streamer::initialized_window(&crdt, blobs, entry_height)


Need a test for how the leader behaves differently on this branch.

garious · 2018-07-05T16:31:38Z

src/streamer.rs

+        assert!(blobs.len() <= win.len());
+
+        // flatten deque to vec
+        let mut blobs: Vec<_> = blobs.into_iter().collect();


You can push this on the caller. Maybe the author of the caller will conclude its vector should be changed to Vec.

src/streamer.rs

@@ -458,6 +458,45 @@ pub fn default_window() -> Window {
    Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize]))
 }

+/// Initialize a rebroadcast window with most recent Entry blobs


garious · 2018-07-05T16:33:58Z

src/streamer.rs

+        );
+        // Index the blobs
+        let mut received = entry_height - blobs.len() as u64;
+        Crdt::index_blobs(crdt, &blobs, &mut received).unwrap();


What error will be panicked on here?

replicated data lock, or all kinds of blob operations... I can change to an expect()...
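As an illustration of the indexing arithmetic and the `expect()` suggestion, the sketch below fills a fixed-size window from a tail of blobs. It is not the PR's `initialized_window`: strings stand in for `SharedBlob`, `WINDOW_SIZE` is an arbitrary small value, `fill_window` is a hypothetical name, and placing each blob at `index % WINDOW_SIZE` is an assumption about how the window is addressed.

```rust
// Illustrative size only; the real constant lives in the codebase.
const WINDOW_SIZE: usize = 8;

/// Hypothetical sketch: place the tail blobs into window slots.
/// The first blob's index is `entry_height - blobs.len()`, as in the diff above.
fn fill_window(blobs: Vec<String>, entry_height: u64) -> Vec<Option<String>> {
    assert!(blobs.len() <= WINDOW_SIZE);
    let mut window = vec![None; WINDOW_SIZE];
    // expect() names the failed invariant instead of a bare unwrap() panic.
    let first = entry_height
        .checked_sub(blobs.len() as u64)
        .expect("entry_height is smaller than the number of tail blobs");
    for (offset, blob) in blobs.into_iter().enumerate() {
        let index = first + offset as u64;
        // Assumption: slots are addressed by index modulo the window size.
        window[(index % WINDOW_SIZE as u64) as usize] = Some(blob);
    }
    window
}

fn main() {
    let blobs = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let window = fill_window(blobs, 10);
    println!("{:?}", window);
}
```

With an `expect()` message, a panic here points at the violated precondition rather than leaving the reader to guess which operation failed.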

rob-solana · 2018-07-09T15:37:13Z

fixes issue #299

rob-solana · 2018-07-09T15:37:46Z

realizing needs a test (a validator that starts with an old ledger)

garious · 2018-07-09T18:50:50Z

@rob-solana, fyi, https://help.github.com/articles/closing-issues-using-keywords/. There's a special syntax to get GitHub to auto-close issues, and it needs to be in the PR description or in any of the PR's commit messages. Won't work from a PR comment.

garious

This is looking good. I see it as a restartable leader feature though, not a restartable validator feature.

garious · 2018-07-09T18:55:25Z

tests/multinode.rs

+fn restart_leader(
+    exit: Option<Arc<AtomicBool>>,
+    leader_fullnode: Option<FullNode>,
+    ledger_path: String,


You can cut down on a bunch of clone() calls by making this a &str.

then I have to to_str() for InFile::Path() and OutFile::Path()?

garious · 2018-07-09T18:59:35Z

tests/multinode.rs

+
+    let mut client = mk_client(&validator_data);
+    let getbal = retry_get_balance(&mut client, &bob_pubkey, Some(leader_balance));
+    assert!(getbal == Some(leader_balance));


assert_eq will offer a better error message
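A small sketch of the suggested change. `get_balance_stub` is a hypothetical stand-in for the test's `retry_get_balance`, and the balance value is made up. When the comparison fails, `assert_eq!` reports both the left and right values, whereas `assert!(a == b)` only reports the expression text.

```rust
/// Hypothetical stand-in for the test's `retry_get_balance`.
fn get_balance_stub() -> Option<i64> {
    Some(500)
}

fn main() {
    let leader_balance = 500;
    let getbal = get_balance_stub();
    // On failure this prints both operands, which makes it clear
    // whether the balance was missing (None) or simply wrong.
    assert_eq!(getbal, Some(leader_balance));
}
```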

garious · 2018-07-09T19:05:56Z

tests/multinode.rs

+
+    // create a "stale" ledger by copying current ledger
+    let mut stale_ledger_path = ledger_path.clone();
+    stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_");


How about: https://doc.rust-lang.org/std/path/struct.Path.html#method.with_file_name

using with_file_name() or with_extension() sends me down a rabbit hole of conversion from a PathBuf back to a String (which doesn't always work). Lots more code for this really simple test case unless I stay in String land as long as possible.
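For comparison, a minimal sketch of the `with_file_name()` route, including the conversion back to `String` that the reply above calls a rabbit hole. The helper `stale_path` is hypothetical; it returns `None` when the path has no file name or is not valid UTF-8, which is the "doesn't always work" case.

```rust
use std::path::Path;

/// Hypothetical helper: prefix the final path component with "stale_".
/// Returns None if there is no file name or the path is not valid UTF-8.
fn stale_path(ledger_path: &str) -> Option<String> {
    let path = Path::new(ledger_path);
    let name = path.file_name()?.to_str()?;
    let stale = path.with_file_name(format!("stale_{}", name));
    stale.to_str().map(String::from)
}

fn main() {
    // The path below is made up for illustration.
    println!("{:?}", stale_path("/tmp/farf/ledger"));
}
```

Unlike `rfind("/").unwrap()`, this does not panic on a bare file name with no directory component, at the cost of handling an `Option`.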

garious · 2018-07-09T19:06:41Z

tests/multinode.rs

+    let mut stale_ledger_path = ledger_path.clone();
+    stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_");
+
+    std::fs::copy(ledger_path.clone(), stale_ledger_path.clone())


Neither clone() should be needed there.

    333 |     std::fs::copy(ledger_path, stale_ledger_path)
        |                   ----------- value moved here
    334 |         .expect(format!("copy {} to {}", &ledger_path, &stale_ledger_path,).as_str());

By changing from ledger_path.clone() to &ledger_path
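A minimal sketch of the borrow-instead-of-clone fix. `std::fs::copy` accepts any `AsRef<Path>`, so passing references leaves both strings usable afterwards, for example in the error message. The helper `copy_or_panic` and the file names are hypothetical and used only for illustration.

```rust
use std::fs;

/// Hypothetical helper: copy a file, borrowing both paths so the caller
/// keeps ownership and the paths can still appear in the panic message.
fn copy_or_panic(from: &str, to: &str) -> u64 {
    fs::copy(from, to).unwrap_or_else(|e| panic!("copy {} to {}: {}", from, to, e))
}

fn main() {
    // Made-up file names in the system temp directory.
    let dir = std::env::temp_dir();
    let ledger_path = dir.join("example_ledger").to_str().unwrap().to_string();
    let stale_ledger_path = dir.join("stale_example_ledger").to_str().unwrap().to_string();

    fs::write(&ledger_path, b"entries").expect("write example ledger");
    // &String coerces to &str; neither String is moved.
    let bytes = copy_or_panic(&ledger_path, &stale_ledger_path);
    println!("copied {} bytes from {} to {}", bytes, ledger_path, stale_ledger_path);
}
```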

garious · 2018-07-09T19:20:26Z

tests/multinode.rs

+}
+
+#[test]
+fn test_leader_restart_validator_start_from_old_ledger() {


Can you add a comment here describing the edge case? My understanding, "Test the case where both a leader and validator are starting up at roughly the same time, but the leader has a more recent copy of the ledger. This test ensures the leader makes its most recent entries available to the validator."

garious · 2018-07-09T19:26:44Z

cc #310

rob-solana · 2018-07-09T21:15:24Z

what's "cc" do?

garious · 2018-07-09T21:19:10Z

That's a convention @mvines started using here. It's just to notify subscribers of that issue of this PR's existence, much like you'd CC folks in email. If you go to that issue, you'll see the cross-link.

rob-solana added the work in progress This isn't quite right yet label Jul 2, 2018

rob-solana closed this Jul 2, 2018

rob-solana reopened this Jul 2, 2018

garious reviewed Jul 2, 2018

mvines added noCI Suppress CI on this Pull Request and removed noCI Suppress CI on this Pull Request labels Jul 3, 2018

rob-solana removed the noCI Suppress CI on this Pull Request label Jul 3, 2018

rob-solana force-pushed the populate-initial-window branch 3 times, most recently from 8f71272 to 2acf401 Compare July 4, 2018 00:34

rob-solana requested a review from aeyakovenko July 4, 2018 00:34

rob-solana removed the work in progress This isn't quite right yet label Jul 5, 2018

garious reviewed Jul 5, 2018

rob-solana force-pushed the populate-initial-window branch 4 times, most recently from 24cdff1 to 884d4a4 Compare July 6, 2018 19:17

rob-solana added the work in progress This isn't quite right yet label Jul 6, 2018

rob-solana force-pushed the populate-initial-window branch from 884d4a4 to 209db08 Compare July 6, 2018 21:37

support an initial window filled with last up-to-WINDOW_SIZE blobs

2577467

rob-solana force-pushed the populate-initial-window branch from 209db08 to 2577467 Compare July 9, 2018 15:29

add test for populated window

21fe2a1

rob-solana removed the work in progress This isn't quite right yet label Jul 9, 2018

garious reviewed Jul 9, 2018

garious changed the title ~~support an initial window filled with last up-to-WINDOW_SIZE blobs~~ Initialize windows with last up-to-WINDOW_SIZE blobs Jul 9, 2018

garious approved these changes Jul 9, 2018

rob-solana force-pushed the populate-initial-window branch from 0b91367 to 9aa7530 Compare July 9, 2018 21:14

fixes issue solana-labs#299

1d597c7

rob-solana force-pushed the populate-initial-window branch from 9aa7530 to 1d597c7 Compare July 9, 2018 21:19

rob-solana merged commit 90a4ab7 into solana-labs:master Jul 9, 2018

rob-solana mentioned this pull request Jul 9, 2018

Make validators restartable #299

Closed

3 tasks

rob-solana deleted the populate-initial-window branch July 12, 2018 16:29
