
feat: stateless validation jobs in test mode #10248

Merged: 17 commits from new-jobs into near:master on Dec 12, 2023

Conversation

@Longarithm (Member) commented Nov 24, 2023

This is the next step for #9982.
Here I introduce jobs which perform stateless validation of a newly received chunk by executing its transactions and receipts.
Later they should run against a state witness, but for now I just lay the foundation by running these jobs against state data in the DB. All passing tests verify that the old and new jobs generate the same result.
The final switch will happen when the stateful jobs are replaced with the stateless ones.

Details

This doesn't introduce any load on the stable version. On the nightly version there will be num_shards extra jobs which check that the stateless validation results are consistent with stateful execution, but since we use nightly only for testing, this shouldn't add much overhead.

I add more fields to the ShardContext structure to simplify the code. Some of them are needed to break early if there is a resharding, and that logic is the same for both kinds of jobs.

StorageDataSource::DbTrieOnly is introduced to read data only from the trie in stateless jobs. This is annoying but still needed if there are a lot of missing chunks and the flat storage head has moved above the block at which the previous chunk was created. Once the state witness is implemented, Recorded will be used instead.
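As a rough illustration, here is a minimal sketch of how this data-source distinction could be modelled; the variant names come from the description above, and the exact nearcore definition may differ.

```rust
/// Hypothetical sketch, not the actual nearcore definition.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StorageDataSource {
    /// Read state from the database, using flat storage where available
    /// (the behaviour of the existing stateful jobs).
    Db,
    /// Read state only from the trie, bypassing flat storage; used by the new
    /// stateless jobs when the flat storage head has moved above the block at
    /// which the previous chunk was created.
    DbTrieOnly,
}
```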

Testing

  • Failure to update current_chunk_extra along the way leads to >20 tests failing in process_blocks, with errors like assertion left == right failed: For stateless validation, chunk extras for block CMV88CBcnKoxa7eTnkG64psLoJzpW9JeAhFrZBVv6zDc and shard s3.v2 do not match...
  • If I update current_chunk_extra only once, tests::client::resharding::test_latest_protocol_missing_chunks_high_missing_prob fails, which was specifically introduced to catch that. This actually helped me realize that validate_chunk_with_chunk_extra is still needed, but I will introduce it later.
  • Nayduck: https://nayduck.near.org/#/run/3293 - 10 extra nightly tests failing, will take a look: https://nayduck.near.org/#/run/3300

@pugachAG (Contributor) left a comment:


overall everything makes sense, added a couple of suggestions

})
.collect()
.collect::<Result<Vec<Vec<UpdateShardJob>>, Error>>()?
Contributor:

Using a functional-style iterator at this point only makes it more confusing; I suggest rewriting this with a plain for loop.
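A minimal, self-contained sketch of what that rewrite could look like; all names here (make_job, collect_jobs, the placeholder types) are assumptions for illustration, not the actual nearcore code.

```rust
// Placeholder types standing in for the real nearcore definitions.
struct UpdateShardJob;
struct Error;

// Stand-in for the real per-shard job construction.
fn make_job(block_idx: usize, shard_id: u64) -> Result<Option<UpdateShardJob>, Error> {
    let _ = (block_idx, shard_id);
    Ok(Some(UpdateShardJob))
}

// Build the nested job list with explicit loops instead of chained iterator
// adaptors that collect into Result<Vec<Vec<_>>, _>.
fn collect_jobs(num_blocks: usize, num_shards: u64) -> Result<Vec<Vec<UpdateShardJob>>, Error> {
    let mut all_jobs = Vec::with_capacity(num_blocks);
    for block_idx in 0..num_blocks {
        let mut block_jobs = Vec::new();
        for shard_id in 0..num_shards {
            // `?` propagates errors directly instead of collecting them.
            if let Some(job) = make_job(block_idx, shard_id)? {
                block_jobs.push(job);
            }
        }
        all_jobs.push(block_jobs);
    }
    Ok(all_jobs)
}
```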

chain/chain/src/update_shard.rs (conversation resolved)
@@ -232,8 +252,8 @@ fn apply_old_chunk(

let storage_config = RuntimeStorageConfig {
state_root: *prev_chunk_extra.state_root(),
use_flat_storage: true,
source: crate::types::StorageDataSource::Db,
use_flat_storage: shard_context.storage_data_source == StorageDataSource::Db,
Contributor:

I suggest keeping use_flat_storage: true here and moving this logic into the apply_transactions implementation:

StorageDataSource::DbTrieOnly => {
    // Read only from the trie and don't charge gas for trie node accesses.
    let mut trie = self.get_trie_for_shard(
        shard_id,
        prev_block_hash,
        storage_config.state_root,
        false,
    );
    trie.dont_charge_gas_for_trie_node_access();
    trie
}

just to avoid duplication and having to change it back once we have the state witness

@Longarithm (Member, Author):

I think I understood why some tests failed in Nayduck: I didn't comment the unsupported cases well enough. New attempt is here: https://nayduck.near.org/#/run/3298

@dsuggs-near (Contributor):

I've turned off automatic merging altogether until I understand this problem.

@dsuggs-near dsuggs-near closed this Dec 4, 2023
@Longarithm Longarithm reopened this Dec 8, 2023
@Longarithm Longarithm marked this pull request as ready for review December 8, 2023 03:45
@Longarithm Longarithm requested a review from a team as a code owner December 8, 2023 03:45
@Longarithm Longarithm changed the title draft: stateless validation jobs in test mode feat: stateless validation jobs in test mode Dec 8, 2023
@Longarithm (Member, Author):

The new logic also breaks the hidden assumption that the order of jobs must correspond to the range of shard ids, which is not desirable anyway, so I remove this assumption by returning the shard id together with each job explicitly.
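A hypothetical sketch of what that change looks like in terms of types; the exact nearcore definitions differ, and the commented-out "before" shape is an assumption.

```rust
// Placeholder types for illustration only.
type ShardId = u64;

#[derive(Debug)]
struct ShardUpdateResult;

#[derive(Debug)]
struct Error;

// Before (assumed): jobs were implicitly ordered by shard id.
// type UpdateShardJob = Box<dyn FnOnce() -> Result<ShardUpdateResult, Error> + Send>;

// After (assumed): each job explicitly carries the shard id it updates.
type UpdateShardJob = (ShardId, Box<dyn FnOnce() -> Result<ShardUpdateResult, Error> + Send>);
```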

@akhi3030 (Collaborator):

I am not planning on reviewing this PR, so I am removing my review request. Let me know if my input is actually needed.

@akhi3030 akhi3030 removed their request for review December 11, 2023 10:01
@pugachAG (Contributor) left a comment:

LGTM!


/// Result for a shard update for a single block.
#[derive(Debug)]
pub enum ShardBlockUpdateResult {
Contributor:

nit: maybe ShardChunkUpdateResult would be more precise

Member (Author):

I chose ShardBlockUpdateResult deliberately because a chunk is not necessarily present, especially if there is a state split. So I'll bet on that name for now.
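To illustrate the distinction being argued here, a hypothetical sketch of a per-block, per-shard result type where the "no new chunk" and resharding cases are first-class; the variant and payload names are placeholders, not the real nearcore ones.

```rust
// Placeholder payload types for illustration only.
#[derive(Debug)]
struct ApplyChunkResult;

#[derive(Debug)]
struct StateSplitResult;

/// Result for a shard update for a single block; there may be no chunk at all
/// for this block (only a state split during resharding, for example).
#[derive(Debug)]
pub enum ShardBlockUpdateResult {
    NewChunk(ApplyChunkResult),
    OldChunk(ApplyChunkResult),
    StateSplit(StateSplitResult),
}
```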

@@ -106,15 +131,25 @@ pub(crate) fn process_shard_update(
epoch_manager: &dyn EpochManagerAdapter,
shard_update_reason: ShardUpdateReason,
shard_context: ShardContext,
state_patch: SandboxStatePatch,
) -> Result<ApplyChunkResult, Error> {
storage_context: StorageContext,
Contributor:

nit: I would move storage_context to NewChunkData and OldChunkData since it is only needed there
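A hypothetical sketch of that suggestion: embed the storage context in the per-kind chunk data instead of passing it as a separate argument to process_shard_update. Field layout and type names are placeholders.

```rust
// Placeholder types for illustration only.
struct StorageContext;

struct NewChunkData {
    // ...existing fields elided...
    storage_context: StorageContext,
}

struct OldChunkData {
    // ...existing fields elided...
    storage_context: StorageContext,
}
```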

@@ -115,7 +115,7 @@ impl FlatStorageManager {
?new_flat_head,
?err,
?shard_uid,
block_hash = ?block.header().hash(),
// block_hash = ?block.header().hash(),
Contributor:

?

Member (Author):

Oops

let state_patch = state_patch.take();

// Insert job into list of all jobs to run, if job is valid.
let mut insert_job = |job: Result<Option<UpdateShardJob>, Error>| {
Contributor:

nit: this looks weird 🙃 can we just accumulate job_results: Vec<Result<Option<UpdateShardJob>, Error>> and then extract jobs and invalid_chunks after this loop?
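A hypothetical sketch of the suggested shape: accumulate the raw results in the loop, then split them into jobs and invalid chunks in a separate pass. Type names are placeholders, not the real nearcore definitions.

```rust
// Placeholder types for illustration only.
struct UpdateShardJob;
struct InvalidChunk;

// Split accumulated per-shard results into runnable jobs and invalid chunks.
fn split_results(
    job_results: Vec<Result<Option<UpdateShardJob>, InvalidChunk>>,
) -> (Vec<UpdateShardJob>, Vec<InvalidChunk>) {
    let mut jobs = Vec::new();
    let mut invalid_chunks = Vec::new();
    for result in job_results {
        match result {
            Ok(Some(job)) => jobs.push(job),
            Ok(None) => {}
            Err(invalid) => invalid_chunks.push(invalid),
        }
    }
    (jobs, invalid_chunks)
}
```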


codecov bot commented Dec 11, 2023

Codecov Report

Attention: 53 lines in your changes are missing coverage. Please review.

Comparison is base (5ce0d8a) 71.83% compared to head (c694b0e) 71.83%.

Files                               Patch %   Lines
chain/chain/src/chain.rs            85.57%    19 Missing and 27 partials ⚠️
core/primitives/src/challenge.rs     0.00%    3 Missing ⚠️
chain/chain/src/update_shard.rs     95.83%    1 Missing ⚠️
chain/client/src/client_actor.rs     0.00%    1 Missing ⚠️
core/store/src/trie/mod.rs          75.00%    0 Missing and 1 partial ⚠️
nearcore/src/runtime/mod.rs         87.50%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #10248    +/-   ##
========================================
  Coverage   71.83%   71.83%            
========================================
  Files         712      712            
  Lines      143171   143442   +271     
  Branches   143171   143442   +271     
========================================
+ Hits       102843   103043   +200     
- Misses      35627    35669    +42     
- Partials     4701     4730    +29     
Flag                     Coverage Δ
backward-compatibility   0.08% <0.00%> (-0.01%) ⬇️
db-migration             0.08% <0.00%> (-0.01%) ⬇️
genesis-check            1.26% <0.00%> (-0.01%) ⬇️
integration-tests        36.24% <85.39%> (+0.15%) ⬆️
linux                    71.59% <33.33%> (-0.12%) ⬇️
linux-nightly            71.40% <85.39%> (-0.01%) ⬇️
macos                    55.10% <31.95%> (-0.66%) ⬇️
pytests                  1.49% <0.00%> (-0.01%) ⬇️
sanity-checks            1.28% <0.00%> (-0.01%) ⬇️
unittests                68.21% <82.64%> (+<0.01%) ⬆️
upgradability            0.13% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@Longarithm Longarithm added this pull request to the merge queue Dec 12, 2023
Merged via the queue into near:master with commit 5595df6 Dec 12, 2023
19 checks passed
@Longarithm Longarithm deleted the new-jobs branch December 12, 2023 01:21