
feature: Implement compute usage aggregation and limiting #8805

Merged · 4 commits · Apr 4, 2023

Conversation


@aborg-dev aborg-dev commented Mar 27, 2023

feature: Limit compute usage of a chunk

The commit addresses #8265 by introducing aggregation of compute usage across all operations performed during chunk application and limiting this compute usage to 1s.

This should not change the behavior of nodes in the short run because compute costs match gas costs, which is validated by the `assert`; any discrepancy should be caught on canary nodes.

There are two follow-ups to this work:

- near#8859
- near#8860
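The aggregation-and-limit mechanism described above can be sketched roughly as follows. This is a minimal illustration, not nearcore's actual API: the type names, the budget constant, and the numbers are all made up for the example.

```rust
type Gas = u64;
type Compute = u64;

/// Illustrative chunk budget: "1s of compute", on the same scale as 1 Pgas.
const MAX_COMPUTE_PER_CHUNK: Compute = 1_000_000_000_000_000;

#[derive(Default)]
struct ChunkApplyState {
    gas_burnt: Gas,
    compute_usage: Compute,
}

impl ChunkApplyState {
    /// Record the outcome of one processed transaction or receipt.
    fn add_outcome(&mut self, gas: Gas, compute: Compute) {
        self.gas_burnt += gas;
        self.compute_usage += compute;
        // While compute costs mirror gas costs one-to-one, the totals must agree.
        debug_assert_eq!(
            self.gas_burnt, self.compute_usage,
            "Compute usage must match burnt gas"
        );
    }

    /// The chunk is full once the compute budget is exhausted.
    fn compute_limit_reached(&self) -> bool {
        self.compute_usage >= MAX_COMPUTE_PER_CHUNK
    }
}

fn main() {
    let mut state = ChunkApplyState::default();
    state.add_outcome(600_000_000_000_000, 600_000_000_000_000);
    assert!(!state.compute_limit_reached());
    state.add_outcome(600_000_000_000_000, 600_000_000_000_000);
    assert!(state.compute_limit_reached());
}
```

As long as every compute cost equals its gas cost, the compute limit degenerates to the existing gas limit, which is why the change is behavior-neutral in the short run.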

@aborg-dev aborg-dev requested a review from a team as a code owner March 27, 2023 09:51
@aborg-dev aborg-dev requested review from akhi3030 and jakmeier March 27, 2023 09:51
@aborg-dev aborg-dev added the A-contract-runtime Area: contract compilation and execution, virtual machines, etc label Mar 27, 2023
@aborg-dev aborg-dev self-assigned this Mar 27, 2023
@jakmeier (Contributor) left a comment:

The limit check looks good but I am afraid the change to ExecutionOutcome has unwanted consequences. Please check out my longer comment on it.

@Longarithm (Member):

I didn’t realise before that we replace gas limit checks with compute limit checks completely. It makes sense, but shouldn’t we add some runtime check that total compute cost is always > 0, or config check that each separate compute cost is not lower than gas cost? I’m thinking about the case when compute costs are accidentally set too low - as we don’t put restrictions on their values - which can result in very long execution. Also this changes intuition “1 chunk is 1 Pgas” - if we don’t tie compute costs to gas costs, now we have to say “1 chunk is 1 s compute cost”. Perhaps it’s even better.

Also, to clarify - am I right that we don’t count compute costs during wasm execution at all; instead we compute them afterwards to ensure that contracts are not broken?

@aborg-dev (Contributor, Author):

@Longarithm , thank you for the comment! See answer below:

I didn’t realise before that we replace gas limit checks with compute limit checks completely. It makes sense, but shouldn’t we add some runtime check that total compute cost is always > 0, or config check that each separate compute cost is not lower than gas cost? I’m thinking about the case when compute costs are accidentally set too low - as we don’t put restrictions on their values - which can result in very long execution.

The idea behind the compute costs is that they represent a pessimistic estimate of the time it takes to process a particular operation. As long as that is maintained, the time to execute the transactions that fit within the chunk limit will be bounded by the 1s target. In other words, we need to be careful to set compute costs high enough, just as we need to be careful when setting gas costs; there is no way around that :)

I don't think there is a strict need for compute costs to be higher than the gas cost as long as the property above is satisfied. For example, if some operation is overcharged, it should be safe to set a compute cost for it that is lower than the gas cost (though we probably would decrease the gas cost in that situation as it wouldn't break contracts).

We've actually discussed the case where a compute cost could be 0 in the NEP. This is quite a rare case, so I'm open to adding a runtime check for it for now, though I don't think it would add much safety on top of a thorough review of new compute costs, given that any adjustment to them will be a protocol change that requires a review.
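A config-level sanity check of the kind discussed here could look roughly like this. All names are hypothetical; nearcore does not necessarily implement such a check, and the sketch only rejects the "accidentally free" case where an operation has a nonzero gas cost but a zero compute cost.

```rust
type Gas = u64;
type Compute = u64;

struct ParameterCost {
    gas: Gas,
    compute: Compute,
}

/// Reject parameter tables where an operation with nonzero gas cost has a
/// zero compute cost, i.e. would not count against the 1s chunk budget at all.
fn validate_compute_costs(costs: &[(&str, ParameterCost)]) -> Result<(), String> {
    for (name, cost) in costs {
        if cost.gas > 0 && cost.compute == 0 {
            return Err(format!(
                "compute cost of `{name}` is zero while its gas cost is nonzero"
            ));
        }
    }
    Ok(())
}

fn main() {
    let ok = [("op_a", ParameterCost { gas: 1_000, compute: 1_000 })];
    assert!(validate_compute_costs(&ok).is_ok());
    let bad = [("op_b", ParameterCost { gas: 1_000, compute: 0 })];
    assert!(validate_compute_costs(&bad).is_err());
}
```

A stricter variant could require `compute >= gas`, but as the reply above argues, that is not fundamentally necessary as long as compute costs stay pessimistic estimates of wall-clock time.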

Also this changes intuition “1 chunk is 1 Pgas” - if we don’t tie compute costs to gas costs, now we have to say “1 chunk is 1 s compute cost”. Perhaps it’s even better.

Definitely, we would need some new vocabulary here, I like your proposal of "1s of compute cost"!

Also, to clarify - am I right that we don’t count compute costs during wasm execution at all; instead we compute them afterwards to ensure that contracts are not broken?

Yes, we don't limit the WASM contract execution based on compute costs and we compute them from profiles after the execution. This is to avoid breaking existing contracts. Here is the implementation #8783.
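The "derive compute usage from profiles after execution" step could be sketched like this. The function and parameter names are made up for illustration; the real code in #8783 maps nearcore's profile data through per-parameter compute costs.

```rust
use std::collections::HashMap;

type Gas = u64;
type Compute = u64;

/// Rescale each profiled gas cost by a per-parameter compute/gas ratio.
/// A missing ratio defaults to (1, 1), i.e. "compute cost equals gas cost",
/// which is the initial state where the two totals must match.
fn compute_usage_from_profile(
    profile: &HashMap<&str, Gas>,
    ratios: &HashMap<&str, (u64, u64)>,
) -> Compute {
    profile
        .iter()
        .map(|(param, gas)| {
            let (num, den) = ratios.get(param).copied().unwrap_or((1, 1));
            gas * num / den
        })
        .sum()
}

fn main() {
    let profile = HashMap::from([("wasm_ops", 1_000u64), ("storage_write", 2_000u64)]);
    // Pretend storage writes are undercharged 3x in gas relative to wall-clock time.
    let ratios = HashMap::from([("storage_write", (3u64, 1u64))]);
    assert_eq!(compute_usage_from_profile(&profile, &ratios), 1_000 + 6_000);
}
```

Because the rescaling happens after execution, gas remains the only thing metered inside wasm, and existing contracts see no behavior change.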

Comment on lines 419 to 422:

```rust
/// The amount of compute time spent by the given transaction or receipt.
// TODO(#8859): Treat this field in the same way as `gas_burnt`.
#[borsh_skip]
pub compute_usage: Compute,
```
@jakmeier (Contributor):

If we use borsh skip, that means when we read an outcome from the DB, it will always be zero, right? This might become problematic for replaying in the state viewer at some point.

Do you think it would be too cumbersome to make it an Option<Compute> to make it clear when the field is initialized and when not? We would probably unwrap it in places where it must be initialized, and ignore it in all other places, so it shouldn't change any actual behavior. But it would prevent future code from accidentally using this uninitialized value.

@aborg-dev (Contributor, Author):

That's a good idea. I was initially thinking of backfilling this value using borsh_init [1] and setting it to the burnt_gas, but that would eventually break when we set non-trivial compute costs. Option at least makes it clear that one can't rely on the value if it was read from the database.

[1] https://github.com/near/borsh-rs#features
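In simplified form, the `Option<Compute>` suggestion looks like this. The sketch omits everything else on the real `ExecutionOutcome` (which also derives borsh serialization and skips this field, which is why a value read back from the DB cannot be trusted):

```rust
type Gas = u64;
type Compute = u64;

struct ExecutionOutcome {
    gas_burnt: Gas,
    /// `Some` only when the outcome was just produced by execution; `None`
    /// when it was deserialized from storage, where the field is not stored.
    compute_usage: Option<Compute>,
}

impl ExecutionOutcome {
    fn from_execution(gas_burnt: Gas, compute_usage: Compute) -> Self {
        Self { gas_burnt, compute_usage: Some(compute_usage) }
    }
    fn from_db(gas_burnt: Gas) -> Self {
        Self { gas_burnt, compute_usage: None }
    }
}

fn main() {
    let fresh = ExecutionOutcome::from_execution(100, 100);
    // Where the value must be initialized, unwrap loudly...
    assert_eq!(fresh.compute_usage.expect("set during execution"), 100);
    // ...while a replayed outcome makes the missing value explicit instead of
    // silently reading as zero.
    let replayed = ExecutionOutcome::from_db(100);
    assert!(replayed.compute_usage.is_none());
}
```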

@aborg-dev aborg-dev force-pushed the compute_costs_impl branch 3 times, most recently from c762a61 to 2b79110 Compare April 3, 2023 09:57
@aborg-dev (Contributor, Author):

@jakmeier , I've added a unit test for this (though one can probably argue that it's an integration test as well), PTAL

@jakmeier (Contributor) left a comment:
LGTM, just some nitpicking for additional comments and extra checks in the test :)

(Instead of multiple inline comments for the test, you could also add a single function level comment. Anything that explains how the test works really.)

posvyatokum and others added 4 commits April 4, 2023 10:10
This should be safe to use with already open db.
Not sure about easy, though. home_dir, config, and the archive flag are not available everywhere right now without some hassle.
(My bad, didn't realize that the `S-automerge` tag would merge both directions, so this is a separate PR.)

Also a bit of clean-up w.r.t. magic numbers.

Implicit account test no longer has to worry about the `LIQUID_BALANCE_FOR_STORAGE` subaccount.
@near-bulldozer near-bulldozer bot merged commit 700ec29 into near:master Apr 4, 2023
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Apr 5, 2023
jakmeier added a commit to jakmeier/nearcore that referenced this pull request Apr 11, 2023
Revert "feature: Implement compute usage aggregation and limiting (near#8805)"

This reverts commit 700ec29.

Canary nodes found an issue with this change:
```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2519443987198`,
 right: `2429311750949`: Compute usage must match burnt gas',
```
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 11, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 14, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 17, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 18, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 18, 2023
near-bulldozer bot pushed a commit that referenced this pull request Apr 18, 2023
This PR re-introduces changes from #8805 together with #8892 after fixing #8908:

The PR addresses #8265 by introducing aggregation of compute usage across all operations performed during chunk application and limiting this compute usage to 1s.

This should not change the behavior of nodes in the short run because compute costs match gas costs which is validated by the assert, so any discrepancy should be caught on canary nodes.
nikurt pushed a commit that referenced this pull request Apr 18, 2023
nikurt pushed a commit that referenced this pull request Apr 18, 2023
nikurt pushed a commit that referenced this pull request Apr 25, 2023
nikurt pushed a commit that referenced this pull request Apr 25, 2023
nikurt pushed a commit that referenced this pull request Apr 28, 2023
nikurt pushed a commit that referenced this pull request Apr 28, 2023
@aborg-dev aborg-dev deleted the compute_costs_impl branch June 12, 2023 09:58