Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(consensus): add L2 contracts VM storage reader (BFT-434) #2416

Closed
wants to merge 13 commits into from

Conversation

moshababo
Copy link
Contributor

@moshababo moshababo commented Jul 10, 2024

What ❔

Adding eth_call-based storage reader for consensus L2 contracts.

Why ❔

Enable consensus executor to read from L2 storage per specific block.

Notes

  • consensus::storage::VMReader added to incapsulate read-transactions dispatching against L2 storage.
  • consensus::storage::testonly::VMWriter added in order to simulate contracts deployments and data initialization in unit tests.
  • consensus::storage::tests::test_vm_reader() test added.
  • Multiple visibility modifiers were changed in order to facilitate the new codebase dependency graph.

TODOs

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.

Copy link
Member

@popzxc popzxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high-level comments. I understand that the PR is still in draft, but the comments are mostly about design.

Comment on lines 21 to 25
consensus_authority_contract: Contract,
validator_registry_address: Option<Address>,
validator_registry_contract: Contract,
attester_registry_address: Option<Address>,
attester_registry_contract: Contract,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need 3 dedicated contracts for consensus? Our contracts are already very sophisticated, so my fear is that it would complicate them even more.
I do not have concrete proposals, but I strongly suggest discussing the contracts design with @vladbochok and @StanislavBreadless

Not actionable in this PR I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my fault. I did the design for the contracts and thought that smaller contracts would be preferred. They can be merged into a single larger contract though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to contracts here: matter-labs/era-contracts@0c2498f

Comment on lines 36 to 38
let consensus_authority_contract = load_contract("contracts/l2-contracts/artifacts-zk/contracts/ConsensusAuthority.sol/ConsensusAuthority.json");
let validator_registry_contract = load_contract("contracts/l2-contracts/artifacts-zk/contracts/ValidatorRegistry.sol/ValidatorRegistry.json");
let attester_registry_contract = load_contract("contracts/l2-contracts/artifacts-zk/contracts/AttesterRegistry.sol/AttesterRegistry.json");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zksync_contracts crate currently serves (somewhat) as a schema for our FS dependencies, so probably it would be better to load the contracts there.
You can create a module for consensus_l2_contracts and load them via pub fn load_consensus_l2_contracts() -> Option<ConsensusContracts>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: 5a925c6

core/node/consensus/src/storage/vm_reader.rs Outdated Show resolved Hide resolved
Comment on lines 89 to 94
let num_committee_validators = self.read_num_committee_validators(ctx, block_id).await;
for i in 0..num_committee_validators {
let committee_validator = self.read_committee_validator(ctx, block_id, i).await;
let validator = self
.read_validator(ctx, block_id, committee_validator.0)
.await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to instantiate a new DB connection and VM instance for each read request, and it's a somewhat expensive operation. (Actually likely several ones, because tx_sender would also create connections internally).

If we want to actually run the VM, I'd expect us to do it efficiently, roughy:

let vm = self.instantiate_vm().await;
let num_committee_validators = self.read_num_committee_validators(&vm, block_id).await;
// ^ run on an already instantiated VM.
// ... same for the rest of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was just a PoC.

It was fixed in a8779b5, so that only one connection will be acquired per read session. It does not however (yet) re-use that connection within the execution sandbox. The connection is only used to load block args.

Comment on lines 174 to 192
) -> (usize, Vec<u8>, Vec<u8>, bool) {
let func = self
.validator_registry_contract
.function("validators")
.unwrap()
.clone();
let tx = self.gen_l2_call_tx(
self.validator_registry_address.unwrap(),
func.encode_input(&[Token::Address(node_owner)]).unwrap(),
);
let res = self.eth_call(ctx, block_id, tx).await;
let tokens = func.decode_output(&res).unwrap();
(
tokens[0].clone().into_uint().unwrap().as_usize(),
tokens[1].clone().into_bytes().unwrap(),
tokens[2].clone().into_bytes().unwrap(),
tokens[3].clone().into_bool().unwrap(),
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it would be better to implement zksync_types::web3::contract::Tokenize on all the required structures instead of mixing parsing with the business logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here: 477ea5e

Comment on lines 289 to 304
let block_args = zksync_node_api_server::execution_sandbox::BlockArgs::new(
&mut conn,
block_id,
&start_info,
)
.await
.unwrap();
let call_overrides = CallOverrides {
enforced_base_fee: None,
};

let res = self
.tx_sender
.eth_call(block_args, call_overrides, tx)
.await
.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to consider running VM to query contracts (as opposed to pre-calculating storage slots and reading storage directly), I think we need to do quite some preparation work:

  • Extract execution_vm into a separate, general-purpose crate.
  • Change the sandbox so that it can perform multiple requests on a single instance.
  • Decouple it from TxSender.

Right now, tx_sender and dependency on zksync_node_api_server feel like a pretty huge hack, and I'm not sure if it's a good idea to depend on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly see pros of using VM with eth_call, but I believe that implementing it properly would take a lot of engineering efforts. As you can see, the first try is implemented in a hacky way, and making a proper implementation (e.g. detaching sandbox from API) is a complex task even for seasoned platform developers. I would really like to not use hacky ways, as they may backfire, especially given that consensus will be a critical component for the sequencer.

On the other hand, reading storage directly will never look as "good", but it's relatively easy to implement and cover with unit tests (e.g. change value via contract method, then read from storage directly) to make sure that the storage layout doesn't change.

If we can afford to spend time on extracting the sandbox from API, I'm all for it. If not, I would rather prefer direct storage access than hacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the current solution is a workaround/hack as per the code dependency graph and the lack of execution sandbox instance re-use.

However, falling back to direct storage reads is worse IMHO.

The reads are not targeting U256 variables in which their storage slots’ keys can be easily pre-calculated, be fetched and decoded. It involves dynamic-sized arrays, structs, and dynamic-sized arrays within the structs. This implies packing/padding, which means to re-implement the VM codec.

Meanwhile, I couldn’t find any relevant documentation, and there’s very limited code examples (mostly in zksync_types::utils), for much simpler use-cases.

I went ahead to experiment with it, and couldn’t even retrieve a simple U256 variable slot, following canonical solc storage layout. I might have overlooked something, but couldn’t reverse-engineer the slots back to the pre-hashed keys. In the overall, I don’t think it’s a good idea to go forward with this given the complex data types involved.

Also, I don’t think such approach is testable, because the layout is supposedly an implementation detail, unless there’s a clear open available specs, backwards compatibility, and lack of potential optimizations.

Read-transactions are the standard way to read from the VM storage database, while incapsulating the layout and the codec. The consensus component functions as a standard web3 client in this regard, and having direct access to the database does not imply it should use it to bypass the standard interface.

I suggest to continue with the current approach, and to implement your suggestions for the execution sandbox decoupling, in a follow-up PR. (see BFT-493)

Copy link
Member

@brunoffranca brunoffranca Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. That was also the protocol teams's opinion, that we should avoid direct storage reads if at all possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind as long as refactoring will really be planned & executed.

@brunoffranca brunoffranca self-requested a review July 10, 2024 12:13
@brunoffranca brunoffranca changed the title [WIP] Add consensus L2 contracts VM storage reader [WIP] Add consensus L2 contracts VM storage reader (BFT-434) Jul 10, 2024
@moshababo moshababo changed the title [WIP] Add consensus L2 contracts VM storage reader (BFT-434) Add consensus L2 contracts VM storage reader (BFT-434) Jul 22, 2024
@moshababo moshababo marked this pull request as ready for review July 22, 2024 00:22
@moshababo moshababo changed the title Add consensus L2 contracts VM storage reader (BFT-434) feat(consensus): Add L2 contracts VM storage reader (BFT-434) Jul 22, 2024
@moshababo moshababo changed the title feat(consensus): Add L2 contracts VM storage reader (BFT-434) feat(consensus): add L2 contracts VM storage reader (BFT-434) Jul 22, 2024
Comment on lines +3 to +4
url = https://github.com/matter-labs/era-contracts
branch = consensus_contracts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI: I don't think the PR is mergeable until we can merge the contracts to main (e.g. we cannot live with a separate branch in submodules).

I think we should start a discussion with the protocol team on how to merge these changes to main without having to finish/audit them, and meanwhile minimize risks of them somehow affecting production environments.

I have a few ideas, let's discuss in Slack. cc @RomanBrodetski @brunoffranca

core/node/consensus/src/storage/testonly.rs Outdated Show resolved Hide resolved
core/node/consensus/src/storage/vm_reader.rs Outdated Show resolved Hide resolved
core/node/consensus/src/storage/vm_reader.rs Outdated Show resolved Hide resolved
Comment on lines +191 to +192
self.tx_sender
.eth_call(block_args, call_overrides, tx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the discussion from this comment.
tx_sender.eth_call will instantiate a connection, then VM, fetch a bunch of stuff from Postgres, and only then perform an eth_call. So e.g. in read_attester_committee method above you will create attester_committee_size + 1 connections sequentially, with quite some overhead.
I'm still concerned that the performance here may be suboptimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, connections are acquired per-call by the execution sandbox. This is indeed suboptimal, and should be part of the needed refactoring. (BFT-493)

/// A struct for reading data from consensus L2 contracts.
pub struct VMReader {
pool: ConnectionPool,
tx_sender: TxSender,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general observation, not sure if it's a concern. TxSender uses replica DB to read state, so there are a few problems here:

  1. Replication lag may slow the consensus down.
  2. Replica DBs are used by API, and the load there is much higher, so the latency may be less predictable than with main DB.

I guess this can be "quickly solved" by pointing replica DB url to main DB on the components where core runs, but it feels even more of a hack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once execution sandbox will be decoupled, we could use it with main DB connection pool, and no changes to TxSender would be required.

@@ -1,18 +1,24 @@
//! Storage test helpers.

use anyhow::Context as _;
use zksync_concurrency::{ctx, error::Wrap as _, time};
use zksync_concurrency::{ctx, ctx::Ctx, error::Wrap as _, time};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't import ctx::Ctx, use the qualified name.

&mut self,
ctx: &Ctx,
owner: Address,
nodes: &[&[Token]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the conversion be done inside the function, passing &[&[Token]] is very loosely typed.

.function("setAttesterCommittee")
.unwrap()
.short_signature()
.to_vec();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these methods wrapping raw contract API deserve a dedicated module.

Comment on lines +235 to +239
let tx = self.gen_tx_add(&registry_contract.contract, deploy_tx.address, node);
txs.push(tx);
}
txs.push(
self.gen_tx_set_validator_committee(deploy_tx.address, &registry_contract.contract),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any rason why the contract and the address parameters are in opposite order?

contract: &Contract,
) -> Transaction {
let calldata = contract
.function("setValidatorCommittee")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see this method in matter-labs/era-contracts#555 ; is it commitValidators ?

U256::from(rng.gen::<usize>()),
(0..256).map(|_| rng.gen()).collect::<Vec<u8>>(),
)
.into_tokens();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could use abigen! to generate bindings for all the Solidity structs and then you get the tokenization with static typing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't migrate to ethers just yet (and in fact it's already deprecated in favor of alloy), so using it is not recommended for now. We want the migration to be done consistently, otherwise, we'll end up with a mix of our vendored web3 remains and some of the newer stuff.

Comment on lines +118 to +131
let func = self
.registry_contract
.function("validatorCommitteeSize")
.unwrap()
.clone();

let tx = self.gen_l2_call_tx(self.registry_address, func.short_signature().to_vec());

let res = self.eth_call(block_args, tx).await;

let tokens = func.decode_output(&res).context("decode_output()")?;
U256::from_tokens(tokens)
.context("U256::from_tokens()")
.map(|t| t.as_usize())
Copy link
Contributor

@aakoshh aakoshh Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be feasible to do a bit more encapsulation? For example to create a DeployedContract or similar type which has an Address and a Contract, then wrap that into a RegistryContract which has methods for these that take whatever is needed to do the eth_call but otherwise know which function to call and how to parse the result? Maybe we can use the abigen! macro to generate the bindings if the ABI JSON is available at compile time?

The VMReader sounds more general than just having all these methods for this particular contract, even if it's the only one at the moment.

I did something similar here where a gateway contract implemented multiple facets, each with their own binding, each represented by a contract caller that took the equivalent of the TxSender as an input. I suppose that gateway is similar to the VMReader, but a lot of the boilerplate and the private methods here could be delegated to contract specific structs, leaving only the highest level stuff in the reader.

let tx = self.gen_l2_call_tx(
self.registry_address,
func.encode_input(&zksync_types::U256::from(idx).into_tokens())
.unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unwraps could be at least expect with the method name to give us some context if they fail, in case we can't use bindings.

let func = self
.registry_contract
.function("attesterCommitteeSize")
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A contract wrapper could provide a method to get the function by name and make sure if it's not found the error explains what it was looking for.

pub pop: Vec<u8>,
}

impl Detokenize for CommitteeValidator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an attribute in ethers that should work here even without generating the bindings. Here's an example.

@pompon0
Copy link
Contributor

pompon0 commented Aug 19, 2024

closing this pr, I'll continue this work in #2684

@pompon0 pompon0 closed this Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants