
feature: Implement compute usage aggregation and limiting #8805

Merged · 4 commits · Apr 4, 2023

Conversation


@aborg-dev aborg-dev commented Mar 27, 2023

feature: Limit compute usage of a chunk

The commit addresses #8265 by introducing aggregation of compute usage across all operations performed during chunk application and limiting this compute usage to 1s.

This should not change the behavior of nodes in the short run because compute costs match gas costs, which is validated by the `assert`; any discrepancy should be caught on canary nodes.

There are two follow-ups to this work:

- near#8859
- near#8860
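The aggregation-and-limit mechanism described above can be sketched roughly as follows. This is a minimal illustration, not nearcore's actual API: the type names, the budget constant, and the numbers are all made up for the example.

```rust
type Gas = u64;
type Compute = u64;

/// Illustrative chunk budget: "1s of compute", on the same scale as 1 Pgas.
const MAX_COMPUTE_PER_CHUNK: Compute = 1_000_000_000_000_000;

#[derive(Default)]
struct ChunkApplyState {
    gas_burnt: Gas,
    compute_usage: Compute,
}

impl ChunkApplyState {
    /// Record the outcome of one processed transaction or receipt.
    fn add_outcome(&mut self, gas: Gas, compute: Compute) {
        self.gas_burnt += gas;
        self.compute_usage += compute;
        // While compute costs mirror gas costs one-to-one, the totals must agree.
        debug_assert_eq!(
            self.gas_burnt, self.compute_usage,
            "Compute usage must match burnt gas"
        );
    }

    /// The chunk is full once the compute budget is exhausted.
    fn compute_limit_reached(&self) -> bool {
        self.compute_usage >= MAX_COMPUTE_PER_CHUNK
    }
}

fn main() {
    let mut state = ChunkApplyState::default();
    state.add_outcome(600_000_000_000_000, 600_000_000_000_000);
    assert!(!state.compute_limit_reached());
    state.add_outcome(600_000_000_000_000, 600_000_000_000_000);
    assert!(state.compute_limit_reached());
}
```

As long as every compute cost equals its gas cost, the compute limit degenerates to the existing gas limit, which is why the change is behavior-neutral in the short run.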

@aborg-dev aborg-dev requested a review from a team as a code owner March 27, 2023 09:51
@aborg-dev aborg-dev requested review from akhi3030 and jakmeier March 27, 2023 09:51
@aborg-dev aborg-dev added the A-contract-runtime Area: contract compilation and execution, virtual machines, etc label Mar 27, 2023
@aborg-dev aborg-dev self-assigned this Mar 27, 2023
@jakmeier (Contributor) left a comment:

The limit check looks good but I am afraid the change to ExecutionOutcome has unwanted consequences. Please check out my longer comment on it.

@Longarithm (Member):

I didn’t realise before that we replace gas limit checks with compute limit checks completely. It makes sense, but shouldn’t we add some runtime check that total compute cost is always > 0, or config check that each separate compute cost is not lower than gas cost? I’m thinking about the case when compute costs are accidentally set too low - as we don’t put restrictions on their values - which can result in very long execution. Also this changes intuition “1 chunk is 1 Pgas” - if we don’t tie compute costs to gas costs, now we have to say “1 chunk is 1 s compute cost”. Perhaps it’s even better.

Also, to clarify - am I right that we don’t count compute costs during wasm execution at all; instead we compute them afterwards to ensure that contracts are not broken?

@aborg-dev (Contributor, Author):

@Longarithm , thank you for the comment! See answer below:

I didn’t realise before that we replace gas limit checks with compute limit checks completely. It makes sense, but shouldn’t we add some runtime check that total compute cost is always > 0, or config check that each separate compute cost is not lower than gas cost? I’m thinking about the case when compute costs are accidentally set too low - as we don’t put restrictions on their values - which can result in very long execution.

The idea behind the compute costs is that they represent a pessimistic estimate of the time it takes to process a particular operation. As long as that is maintained, the time to execute the transactions that fit within the chunk limit will be bounded by the 1s target. In other words, we need to be careful to set compute costs high enough, just as we need to be careful when setting gas costs; there is no way around that :)

I don't think there is a strict need for compute costs to be higher than the gas cost as long as the property above is satisfied. For example, if some operation is overcharged, it should be safe to set a compute cost for it that is lower than the gas cost (though we probably would decrease the gas cost in that situation as it wouldn't break contracts).

We've actually discussed the case where a compute cost could be 0 in the NEP. This is quite a rare case, so I'm open to adding a runtime check for it for now, though I don't think it would add much safety on top of a thorough review of new compute costs, given that any adjustment to them will be a protocol change that requires a review.
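A config-level sanity check of the kind discussed here could look roughly like this. All names are hypothetical; nearcore does not necessarily implement such a check, and the sketch only rejects the "accidentally free" case where an operation has a nonzero gas cost but a zero compute cost.

```rust
type Gas = u64;
type Compute = u64;

struct ParameterCost {
    gas: Gas,
    compute: Compute,
}

/// Reject parameter tables where an operation with nonzero gas cost has a
/// zero compute cost, i.e. would not count against the 1s chunk budget at all.
fn validate_compute_costs(costs: &[(&str, ParameterCost)]) -> Result<(), String> {
    for (name, cost) in costs {
        if cost.gas > 0 && cost.compute == 0 {
            return Err(format!(
                "compute cost of `{name}` is zero while its gas cost is nonzero"
            ));
        }
    }
    Ok(())
}

fn main() {
    let ok = [("op_a", ParameterCost { gas: 1_000, compute: 1_000 })];
    assert!(validate_compute_costs(&ok).is_ok());
    let bad = [("op_b", ParameterCost { gas: 1_000, compute: 0 })];
    assert!(validate_compute_costs(&bad).is_err());
}
```

A stricter variant could require `compute >= gas`, but as the reply above argues, that is not fundamentally necessary as long as compute costs stay pessimistic estimates of wall-clock time.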

Also this changes intuition “1 chunk is 1 Pgas” - if we don’t tie compute costs to gas costs, now we have to say “1 chunk is 1 s compute cost”. Perhaps it’s even better.

Definitely, we would need some new vocabulary here, I like your proposal of "1s of compute cost"!

Also, to clarify - am I right that we don’t count compute costs during wasm execution at all; instead we compute them afterwards to ensure that contracts are not broken?

Yes, we don't limit the WASM contract execution based on compute costs and we compute them from profiles after the execution. This is to avoid breaking existing contracts. Here is the implementation #8783.
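The "derive compute usage from profiles after execution" step could be sketched like this. The function and parameter names are made up for illustration; the real code in #8783 maps nearcore's profile data through per-parameter compute costs.

```rust
use std::collections::HashMap;

type Gas = u64;
type Compute = u64;

/// Rescale each profiled gas cost by a per-parameter compute/gas ratio.
/// A missing ratio defaults to (1, 1), i.e. "compute cost equals gas cost",
/// which is the initial state where the two totals must match.
fn compute_usage_from_profile(
    profile: &HashMap<&str, Gas>,
    ratios: &HashMap<&str, (u64, u64)>,
) -> Compute {
    profile
        .iter()
        .map(|(param, gas)| {
            let (num, den) = ratios.get(param).copied().unwrap_or((1, 1));
            gas * num / den
        })
        .sum()
}

fn main() {
    let profile = HashMap::from([("wasm_ops", 1_000u64), ("storage_write", 2_000u64)]);
    // Pretend storage writes are undercharged 3x in gas relative to wall-clock time.
    let ratios = HashMap::from([("storage_write", (3u64, 1u64))]);
    assert_eq!(compute_usage_from_profile(&profile, &ratios), 1_000 + 6_000);
}
```

Because the rescaling happens after execution, gas remains the only thing metered inside wasm, and existing contracts see no behavior change.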

Comment on lines 419 to 422:

```rust
/// The amount of compute time spent by the given transaction or receipt.
// TODO(#8859): Treat this field in the same way as `gas_burnt`.
#[borsh_skip]
pub compute_usage: Compute,
```
@jakmeier (Contributor):

If we use borsh skip, that means when we read an outcome from the DB, it will always be zero, right? This might become problematic for replaying in the state viewer at some point.

Do you think it would be too cumbersome to make it an Option<Compute> to make it clear when the field is initialized and when not? We would probably unwrap it in places where it must be initialized, and ignore it in all other places, so it shouldn't change any actual behavior. But it would prevent future code from accidentally using this uninitialized value.

@aborg-dev (Contributor, Author):

That's a good idea. I was initially thinking of backfilling this value using borsh_init [1] and setting it to the burnt_gas, but that would eventually break when we set non-trivial compute costs. Option at least makes it clear that one can't rely on the value if it was read from the database.

[1] https://github.com/near/borsh-rs#features
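In simplified form, the `Option<Compute>` suggestion looks like this. The sketch omits everything else on the real `ExecutionOutcome` (which also derives borsh serialization and skips this field, which is why a value read back from the DB cannot be trusted):

```rust
type Gas = u64;
type Compute = u64;

struct ExecutionOutcome {
    gas_burnt: Gas,
    /// `Some` only when the outcome was just produced by execution; `None`
    /// when it was deserialized from storage, where the field is not stored.
    compute_usage: Option<Compute>,
}

impl ExecutionOutcome {
    fn from_execution(gas_burnt: Gas, compute_usage: Compute) -> Self {
        Self { gas_burnt, compute_usage: Some(compute_usage) }
    }
    fn from_db(gas_burnt: Gas) -> Self {
        Self { gas_burnt, compute_usage: None }
    }
}

fn main() {
    let fresh = ExecutionOutcome::from_execution(100, 100);
    // Where the value must be initialized, unwrap loudly...
    assert_eq!(fresh.compute_usage.expect("set during execution"), 100);
    // ...while a replayed outcome makes the missing value explicit instead of
    // silently reading as zero.
    let replayed = ExecutionOutcome::from_db(100);
    assert!(replayed.compute_usage.is_none());
}
```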

@aborg-dev aborg-dev force-pushed the compute_costs_impl branch 3 times, most recently from c762a61 to 2b79110 Compare April 3, 2023 09:57
@aborg-dev (Contributor, Author):

@jakmeier , I've added a unit test for this (though one can probably argue that it's an integration test as well), PTAL

@jakmeier (Contributor) left a comment:
LGTM, just some nitpicking for additional comments and extra checks in the test :)

(Instead of multiple inline comments for the test, you could also add a single function level comment. Anything that explains how the test works really.)

posvyatokum and others added 4 commits April 4, 2023 10:10
This should be safe to use with already open db.
Not sure about easy, though. home_dir, config, and the archive flag are not available everywhere right now without some hassle.
(My bad, didn't realize that the `S-automerge` tag would merge both directions, so this is a separate PR.)

Also a bit of clean-up w.r.t. magic numbers.

Implicit account test no longer has to worry about the `LIQUID_BALANCE_FOR_STORAGE` subaccount.
@near-bulldozer near-bulldozer bot merged commit 700ec29 into near:master Apr 4, 2023
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Apr 5, 2023
jakmeier added a commit to jakmeier/nearcore that referenced this pull request Apr 11, 2023
Revert "feature: Implement compute usage aggregation and limiting (near#8805)"

This reverts commit 700ec29.

Canary nodes found an issue with this change:
```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2519443987198`,
 right: `2429311750949`: Compute usage must match burnt gas',
```
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 11, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 14, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 17, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 18, 2023
aborg-dev added a commit to aborg-dev/nearcore that referenced this pull request Apr 18, 2023
near-bulldozer bot pushed a commit that referenced this pull request Apr 18, 2023
This PR re-introduces changes from #8805 together with #8892 after fixing #8908:

The PR addresses #8265 by introducing aggregation of compute usage across all operations performed during chunk application and limiting this compute usage to 1s.

This should not change the behavior of nodes in the short run because compute costs match gas costs which is validated by the assert, so any discrepancy should be caught on canary nodes.
nikurt pushed a commit that referenced this pull request Apr 18, 2023
nikurt pushed a commit that referenced this pull request Apr 18, 2023
nikurt pushed a commit that referenced this pull request Apr 25, 2023
nikurt pushed a commit that referenced this pull request Apr 25, 2023
nikurt pushed a commit that referenced this pull request Apr 28, 2023
nikurt pushed a commit that referenced this pull request Apr 28, 2023
@aborg-dev aborg-dev deleted the compute_costs_impl branch June 12, 2023 09:58