
Chunks #1013

Merged: merged 89 commits into staging on Sep 4, 2019
Conversation

SkidanovAlex (Collaborator)

No description provided.

- Separating outgoing and incoming receipts, fixing the logic to collect outgoing receipts, and properly saving incoming receipts if a chunk was skipped.
- Previously, if a block was first orphaned and then had missing chunks, the chunks were not requested. Since orphans are processed in `chain`, but requests can only be made from `client`, this required some refactoring.
…erly work on validator rotation for chunks

- Changing validator rotation such that the current block is sufficient to know the epoch of the next block (see the sketch below)
- Fixing more issues with handling receipts
- Orphaning chunk one parts if the epoch is not known
- Re-requesting chunk one parts if they are missing when a block is received
- Fixing some issues with using an incorrect height/prev block when computing the number of total parts
- For now making header sync commit after every block; doing otherwise breaks the current interaction with the validator manager
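
A minimal sketch of the property the rotation change provides: the current block's header alone determines the next block's epoch, with no extra lookups. The field names (`epoch_id`, `next_epoch_id`, `is_last_block_of_epoch`) are hypothetical stand-ins, not this PR's actual types.

```rust
struct EpochId([u8; 32]);

struct BlockHeader {
    epoch_id: EpochId,            // epoch this block belongs to
    next_epoch_id: EpochId,       // epoch that starts after this one ends
    is_last_block_of_epoch: bool,
}

// Only the current header is needed to name the next block's epoch.
fn epoch_of_next_block(cur: &BlockHeader) -> &EpochId {
    if cur.is_last_block_of_epoch {
        &cur.next_epoch_id // crossing the epoch boundary
    } else {
        &cur.epoch_id
    }
}
```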
error!("MOO processing orphaned one parts after receiving {:?}", block_hash);
for shard_id in 0..self.runtime_adapter.num_shards() {
let shard_id = shard_id as ShardId;
for part_id in 0..self.runtime_adapter.num_total_parts(block_hash) {
Member

We should avoid calling num_total_parts multiple times.
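
A sketch of the suggested fix, reusing the snippet above (so `self`, `block_hash`, and `ShardId` are assumed from that context): hoist the call, which is loop-invariant here, out of the loops.

```rust
// `num_total_parts(block_hash)` does not change per iteration, so call it once.
let num_total_parts = self.runtime_adapter.num_total_parts(block_hash);
for shard_id in 0..self.runtime_adapter.num_shards() {
    let shard_id = shard_id as ShardId;
    for part_id in 0..num_total_parts {
        // ... process the orphaned one part ...
    }
}
```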

SkidanovAlex and others added 4 commits July 29, 2019 12:37
1. Initiating state download (aka Catchup) on each first block of each epoch for the *next* epoch;
2. Reusing the state sync machinery we already have (and fixing various bugs in it) to actually sync state for chunks;
3. Immediately applying the state transition for the next epoch if the state is downloaded, putting the block into a queue if not;
4. Processing the queue once the state is downloaded;
5. Orphaning the block if the state is still not downloaded when the next epoch starts, and unorphaning on (4); a sketch of this queueing logic follows the list.
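
A minimal sketch, with entirely hypothetical names, of the decision logic in steps (3) through (5): a block for the next epoch is either applied immediately, parked in a queue, or orphaned, and the queue is drained once the state download finishes.

```rust
type BlockHash = [u8; 32];

enum CatchupAction {
    ApplyNow, // state already downloaded: apply the transition (step 3)
    Queue,    // state still downloading: park the block (step 3)
    Orphan,   // next epoch started without the state: orphan (step 5)
}

fn on_next_epoch_block(state_downloaded: bool, next_epoch_started: bool) -> CatchupAction {
    match (state_downloaded, next_epoch_started) {
        (true, _) => CatchupAction::ApplyNow,
        (false, false) => CatchupAction::Queue,
        (false, true) => CatchupAction::Orphan,
    }
}

// Step (4): once the state is downloaded, drain the queue (which also
// unorphans the blocks parked in step 5).
fn on_state_downloaded(queued: Vec<BlockHash>, apply: impl Fn(&BlockHash)) {
    for block in &queued {
        apply(block);
    }
}
```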

Addresses #1046
In particular, merging all the work on validator rotation that was happening in master (fixing stake returns, kicking validators out, etc.) with the work on validator rotation for the shard chains.
Some aspects of the merged code still appear a bit flaky; more testing and fixes are coming.
@ilblackdragon ilblackdragon mentioned this pull request Jul 30, 2019
…onal issue where verify_chunk_signatures is called and may result in an error is fixed.
…tor_proposals. Removed post state root from ShardMsg
* Adding gas usage and limit to the header to track relevant information for rewards.

* Remove usage of ..Default::default in proto conversions, which is hiding issues

* Account deletion (#1107)

* Account deletion implementation

* Stakers can't be deleted, but must have more rent on their account (4 * epoch_length). Also added a check_rent method to call after transaction execution to make sure there is still enough rent on the account

* Actually delete data when deleting an account and test that

* Add support for default values in runtime config (expected that we will put reasonable defaults in the binary), and fixed test_deserialize test / testnet.json

* Moved helper functions for transactions into test-utils User trait

* Reuse check_stake function from runtime and system

* Address comments

* Step 1. Adding validator proposals is now done from process_block. An additional issue, where verify_chunk_signatures is called and may result in an error, is fixed.

* Merge master into staging (#1116)

* bump version to 0.2.6 (#1091)

* Fix stake double return bug (#1084)

* fix stake double return bug and update tests

* use cache properly

* properly implement stake returning

* refactor test utils

* fix genesis config

* Fix block producer assignment (#1088)

* fix block producer assignment

* trying with test lock

* tweak test parameters

* bump version to 0.2.6

* Update docker to have git version and use cache (#1092)

* neartest.com has been retired (#1098)

* Change how docker port mapping works for macos (#1086)

* Apply Peter's fix to docker image start up code

* fix port mapping in nodelib

* fix #1042: Ban peer doesn't save to storage (#1093)

* Add registers to Wasm bindings

* Remove some unreachables (#1108)

* Revert back to cranelift backend for now (#1109)

Singlepass backend is crashing

* Fix stake test (#1095)

* Fix stake test

* Check expected validators.

* Specify expected validator order.

* No need to move anymore.

* Step 2. Move post_state_root into ChunkExtra with gas_used and validator_proposals. Removed post state root from ShardMsg

* Step 3. Gas limit/usage & validator proposals propagate

* Cleaned up tests in validator manager. First steps to make sure routing tables are populated.

* Fix chain tests

* Change 100ms to 250ms block production for chunk_manager tests
@ilblackdragon ilblackdragon changed the base branch from master to staging July 31, 2019 21:49
/// Ban given peer.
BanPeer { peer_id: PeerId, ban_reason: ReasonForBan },
/// Announce account
-    AnnounceAccount(AnnounceAccount),
+    AnnounceAccount(AnnounceAccount, bool),
Member

I already handled this case in routing. The only situation where we want to force the announcement is when we are the originator. Also, remove myself from the routing table.

State sync was downloading the block at `sync_hash`, but actually needs to download the previous block.
Generally nothing was breaking with downloading at `sync_hash`, except that if the next block for some reason could not be applied or was not known yet, `body_head` would for a while point at a non-existent block (specifically the prev of `sync_hash`).

Also added logic to reset the state sync if the header head is two epochs ahead, but this logic doesn't trigger currently because we do not move the header head during state sync. Tracking in #1174
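
A minimal sketch, with hypothetical types and helper names, of the fix's core idea: the state being synced is the post-state of the block before `sync_hash`, so that predecessor is the block to download.

```rust
type CryptoHash = [u8; 32];

struct BlockHeader {
    prev_hash: CryptoHash,
}

// Given a way to look up headers, return the hash of the block state
// sync should actually download: the predecessor of `sync_hash` (the
// first block of the new epoch).
fn block_to_download(
    get_header: impl Fn(&CryptoHash) -> Option<BlockHeader>,
    sync_hash: &CryptoHash,
) -> Option<CryptoHash> {
    get_header(sync_hash).map(|header| header.prev_hash)
}
```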
Err(err) => NetworkClientResponses::InvalidTx(err),
}
} else {
NetworkClientResponses::NoResponse
Member

Is it expected that the tx gets dropped in this case?

Member

We should probably route it to a validator that is in the shard for this tx.
But if we are already doing that at the networking level, here we can just drop it, since we assume it doesn't arrive here unless it's some edge case (like we just switched the shards we care about).

SkidanovAlex and others added 16 commits August 16, 2019 14:59
Also fixing a bug we discovered earlier: if the last block of an epoch doesn't have a chunk for a particular shard, we do not do the updates necessary for the epoch switch. Fixing it by forcefully executing `apply_transactions` with no txs and receipts for the last block. Conveniently, with the fix for #1183 this is now possible (before, we would not be able to store two different post-state roots for the same chunk).
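
A sketch of that forced execution, with stand-in types and a stand-in trait (the real `apply_transactions` signature in this PR differs): run the transition with empty inputs so the epoch-switch bookkeeping still happens.

```rust
type StateRoot = [u8; 32];

struct Transaction;
struct Receipt;

// Stand-in for the PR's runtime adapter; the real trait differs.
trait RuntimeAdapter {
    fn apply_transactions(
        &self,
        shard_id: u64,
        state_root: &StateRoot,
        height: u64,
        transactions: &[Transaction],
        receipts: &[Receipt],
    ) -> StateRoot;
}

// Even with no chunk for `shard_id` in the epoch's last block, run the
// transition with empty inputs; storing the resulting post-state root
// for the unchanged chunk is what the fix for #1183 enables.
fn apply_empty_for_epoch_switch(
    runtime: &dyn RuntimeAdapter,
    shard_id: u64,
    prev_state_root: &StateRoot,
    height: u64,
) -> StateRoot {
    runtime.apply_transactions(shard_id, prev_state_root, height, &[], &[])
}
```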
* Re-enable collect block approvals

* disable produce_blocks_with_tx

* test collect block approval

* fix test compilation
The state root in KeyValueRuntime was computed solely from the account balances, which is not enough. Specifically, if two different sets of transactions led to the same balances, they would map to the same state root, while corresponding to different sets of transaction/receipt nonces used.
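
A sketch of the fix's idea for a KeyValueRuntime-style test runtime: fold the used nonces into the state root alongside the balances, so two transaction histories that happen to produce equal balances no longer collide. `DefaultHasher` is a test-only stand-in for a real cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn state_root(
    balances: &BTreeMap<String, u128>,
    used_nonces: &BTreeMap<String, u64>,
) -> u64 {
    // BTreeMap iterates in key order, so the result is deterministic.
    let mut hasher = DefaultHasher::new();
    for (account, balance) in balances {
        account.hash(&mut hasher);
        balance.hash(&mut hasher);
    }
    // Including the nonces distinguishes states with equal balances.
    for (account, nonce) in used_nonces {
        account.hash(&mut hasher);
        nonce.hash(&mut hasher);
    }
    hasher.finish()
}
```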
* Implement kickout for chunk producers

* fix tests

* address comments

* Better error message
* Merkle proofs for receipts

The problem is to make sure that received receipts are trustworthy.

The idea is to deliver a pair of (receipts, receipts_proof), where receipts_proof is a Merkle proof that the receipts were indeed produced.

While a chunk is being produced, we calculate the Merkle root and put it into the chunk header (check ClientActor::receipts_proof for more details).

We then check the proofs in two cases:
1. Anyone who cares about a shard and applies its chunks in full must compare the Merkle root against the proof stored in the chunk header.
2. When sending a ChunkOnePart, we must calculate proofs for each subset of receipts related to some shard, as many as num_shards we have. When receiving a ChunkOnePart, we must check that all proofs are given and that each proof guarantees its subset is contained in the chunk's receipts.
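
A minimal sketch of the verification side, not nearcore's actual code: the leaf is the hash of one shard's receipt subset, the root comes from the chunk header, and the path authenticates the leaf. Uses the `sha2` crate; all names here are hypothetical.

```rust
use sha2::{Digest, Sha256};

type Hash32 = [u8; 32];

fn hash(data: &[u8]) -> Hash32 {
    Sha256::digest(data).into()
}

fn combine(left: &Hash32, right: &Hash32) -> Hash32 {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(left);
    buf[32..].copy_from_slice(right);
    hash(&buf)
}

/// One step of a Merkle path: the sibling hash and whether it sits to
/// the left of the node being authenticated.
struct MerklePathItem {
    sibling: Hash32,
    sibling_on_left: bool,
}

/// Check that `leaf` (the hash of one shard's serialized receipt
/// subset) is included under `root` (taken from the chunk header).
fn verify_path(root: &Hash32, leaf: Hash32, path: &[MerklePathItem]) -> bool {
    let mut current = leaf;
    for item in path {
        current = if item.sibling_on_left {
            combine(&item.sibling, &current)
        } else {
            combine(&current, &item.sibling)
        };
    }
    &current == root
}
```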

TEST PLAN
===
- sanity tests in chain and client folders
- cargo test --all-features --package near-client --test cross_shard_tx tests::test_cross_shard_tx -- --exact --nocapture
(10 iterations)
- cargo test --all-features --package near --test run_nodes run_nodes_2 -- --exact --nocapture
- cargo test --package near --test stake_nodes test_validator_join -- --exact --nocapture

LOGIC REVIEWER
===
Alex Skidanov
* add validator info to query

* fix issues with testnet.json and rpc

* add current proposals in validator info
Expensive test to make sure that all OneParts have the necessary receipts.
- Removing all the "cow mine" tracing
- Replacing "cow mine" asserts with a new macro `byzantine_assert` that triggers only if the `byzantine_asserts` feature is enabled (a possible shape is sketched below)
- Disabling nearlib tests for now in CI
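
A plausible shape for such a macro, not necessarily the PR's exact definition: it compiles to a no-op unless the `byzantine_asserts` feature is on.

```rust
#[macro_export]
macro_rules! byzantine_assert {
    ($cond:expr) => {
        // With the feature off, `cfg!` is `false`, so the branch is
        // eliminated and the condition is never evaluated at runtime.
        if cfg!(feature = "byzantine_asserts") {
            assert!($cond, "byzantine behavior detected");
        }
    };
}
```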
ilblackdragon and others added 5 commits August 31, 2019 23:54
* Rework block production

* Fix unit tests (facepalm)

* Fix timing issue. Add cache for latest_known in the store

* Keep assert around processing just produced block

* Move updating latest_known into the store update
* Rework block production

* Fix unit tests (facepalm)

* Fix timing issue. Add cache for latest_known in the store

* State records are linearized and split into shards at genesis. Add protocol for testnet config

* Making near testnet use seed for network id to test easier

* Fixes after merge

* Add reporting in gas from blocks instead of tps

* Fix testnet json
@SkidanovAlex SkidanovAlex marked this pull request as ready for review September 4, 2019 17:20
@ilblackdragon ilblackdragon merged commit 3762990 into staging Sep 4, 2019