Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multiple forks to the bank #2689

Closed

Conversation

rob-solana
Copy link
Contributor

@rob-solana rob-solana commented Feb 7, 2019

Problem

Validators may receive different, valid blobs targeting the same slot height. Those blobs represent forks in the ledger and are stored as forks in the data structure Blocktree. Likewise, the slot leader may want to build entries on its own fork so that it can roll back if later it decides there's a better fork to vote on. In either case, the current Bank does not support forks in the ledger (aka, interpreting Blocktree). It doesn't offer a way to checkpoint its state or refer back to previous checkpoints to implement its methods.

Summary of Changes

  • Reduce Bank to shell that holds only Forks and pubsub
  • Update this list to reflect the current state of this PR:
  • bank split into BankCheckpoint and BankState that composes the checkpoints
  • live and root BankState is reachable from the bank
  • made the RpcSubscriptions explicit. Trait objects are really not great in rust. They require a dynamic allocation, the compiler implements them as dynamic dispatch, and they cannot really compose the traits up through the Box, so you end up passing a bunch of trait level descriptions through every api that uses them.
  • Checkpoints module that implements a DAG to keep track of the checkpoints
  • Forks that merges and purges the live chain as it moves along

What is not done (at the time of this writing):

  • refactoring the system to not use a global bank. I think in a lot places, the expected BankState instance should be passed in as a function parameter. Instead of using a bank instance to reach for the live or root fork. The problem that there are some things that should be computed from the root, and others from the live fork. last_id for a client should probably come from root, but for a vote needs to come from the live branch that is being voted on.
  • Replace process_ledger() with another startup function that populates the bank with the correct forks
  • Fix failing multinode tests. See the TODOs from Rob in that code.

Fixes #2555
Fixes #2210

@rob-solana
Copy link
Contributor Author

yoink of #2640

@rob-solana rob-solana force-pushed the checkpoints_4_squashed branch 5 times, most recently from a757bd2 to e822d6c Compare February 7, 2019 19:47
src/forks.rs Show resolved Hide resolved
src/forks.rs Outdated Show resolved Hide resolved
src/checkpoints.rs Outdated Show resolved Hide resolved
src/bank.rs Show resolved Hide resolved
src/bank.rs Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 7, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0f8ff07). Click here to learn what that means.
The diff coverage is 88.4%.

@@           Coverage Diff            @@
##             master   #2689   +/-   ##
========================================
  Coverage          ?   74.3%           
========================================
  Files             ?     117           
  Lines             ?   18791           
  Branches          ?       0           
========================================
  Hits              ?   13980           
  Misses            ?    4811           
  Partials          ?       0

@rob-solana rob-solana added work in progress This isn't quite right yet noCI Suppress CI on this Pull Request labels Feb 7, 2019
@garious
Copy link
Contributor

garious commented Feb 8, 2019

What's the point of Bank after this PR? Why not delete the existing Bank and call the new thing Bank?

And all these unit-tests that have been updated to bank.active_fork(), how about passing them the thing that's currently called BankFork?

@rob-solana
Copy link
Contributor Author

What's the point of Bank after this PR? Why not delete the existing Bank and call the new thing Bank?

And all these unit-tests that have been updated to bank.active_fork(), how about passing them the thing that's currently called BankFork?

I will explore that approach, thanks.

@rob-solana rob-solana force-pushed the checkpoints_4_squashed branch 3 times, most recently from 81e3ac2 to c825e7f Compare February 12, 2019 20:31
Copy link
Contributor

@garious garious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked this approved while it was still WIPed. Not sure how to remove that, so marking it as Request Changes.

@rob-solana rob-solana force-pushed the checkpoints_4_squashed branch from 11883e2 to 709cc0a Compare February 13, 2019 18:54
fmt

rebase forkup

fixing build all

leave the last block open

more tests

rebase fix

replay test fix

rollback depth parameter

cleanup

better replay stage

s/finalized/frozen for bank_checkpoints that are no longer able to write transactions or ticks

remove println

disconnect blob sender so that TVU isn't replyaing onto TPU's fork

finalize()->freeze(), code review comments

rebased

fix window_send_test

remove entries_to_blocks()

guard against too many ticks in a slot

limiting ticks, take #2

more info

fix test_replay_stage_poh_error_entry_receiver

fix leader_scheduler::test_update_height

s/bank_state/bank_fork/g

s/bank_checkpoint/bank_delta/g

fix up stragglers from rename

get it building again.  num_ticks_left_in_slot() should not assume slot from tick_height

fixups

ignore leader rotation tests until ledger is turned around

renames, compile fixes

simplify PohService, initialize and freeze TPU bank, skip TPU slot in TVU

compiling again

replay_stage cleanup

move banking_stage done back to banking_stage

use closure in test, too

cleanup poh_service, freeze() TPU's slot from banking_stage()
@rob-solana rob-solana force-pushed the checkpoints_4_squashed branch from 5ab02d8 to 775b3b4 Compare February 14, 2019 00:17
@rob-solana
Copy link
Contributor Author

the bank after this PR holds Forks and pubsub

still to do: replace process_ledger() with another startup function that populates the bank with the correct forks

@garious
Copy link
Contributor

garious commented Feb 14, 2019

Time to remove noCI?

@sagar-solana
Copy link
Contributor

@garious not yet, still some failing multinode tests that need the "to do" implemented from Rob's comment.

@garious garious removed the noCI Suppress CI on this Pull Request label Feb 14, 2019
@garious garious removed the request for review from mvines February 14, 2019 16:43
@garious garious changed the title Refactor bank and track root and live bank checkpoints. Add support for multiple forks to the bank Feb 14, 2019
@sakridge
Copy link
Member

Should we just remove process_ledger, can replay_stage just handle starting from 0 and run through the bank that way?

@carllin
Copy link
Contributor

carllin commented Feb 14, 2019

@sakridge, I believe @rob-solana proposed something similar, one of the chief issues being ignoring/not sending leader rotation signals from replay stage while processing the ledger.

@sakridge
Copy link
Member

@carllin Yea, there's the voting, sending leader rotation, also the whole bit about ignoring the ledger when you are the leader. Would have to figure out how to handle that stuff.

@garious
Copy link
Contributor

garious commented Feb 16, 2019

Passed back to #2782

@garious garious closed this Feb 16, 2019
@rob-solana rob-solana deleted the checkpoints_4_squashed branch February 28, 2019 02:12
jeffwashington added a commit to jeffwashington/solana that referenced this pull request Aug 29, 2024
* make shrink use in-mem idx info

* fix cli arg typo

* remove has_written_abnormal_entry_to_disk_flag.

* pr reviews: local counter

* remove the field that hold index entry in memory for shrinking

* clippy

* default test/bench config to use onlyabnormalwithverify scan option

* remove comments

* share shrink account populate code

---------

Co-authored-by: HaoranYi <haoran.yi@anza.xyz>
jeffwashington added a commit to jeffwashington/solana that referenced this pull request Aug 29, 2024
* make shrink use in-mem idx info

* fix cli arg typo

* remove has_written_abnormal_entry_to_disk_flag.

* pr reviews: local counter

* remove the field that hold index entry in memory for shrinking

* clippy

* default test/bench config to use onlyabnormalwithverify scan option

* remove comments

* share shrink account populate code

---------

Co-authored-by: HaoranYi <haoran.yi@anza.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
work in progress This isn't quite right yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants