[WIP] feat(consensus): committees data extractor (BFT-443) #2518

moshababo · 2024-07-29T02:45:58Z

Offshoot of #2416

What ❔

Adding state extractor for committees data, reading from the consensus registry contract to be inserted into l1_batches_consensus DB table.

Why ❔

Reading directly from VM storage (with zksync_node_consensus::storage::registry_contract::VMReader) is inefficient, and is using a (slightly) different data format. A background-running denormalization process would ease ongoing per-batch reads.

Changes

Added zksync_node_consensus::storage::registry_contract::CommitteeExtractor
Added DB migrations for adding 1 new column to l1_batches_consensus table
Added batch_committee() & insert_batch_committee(), get_last_batch_committee_number in consensus_dal
Updated sqlx::query of insert_batch_certificate(), get_last_batch_certificate_number in consensus_dal
Added DB (de)serialization support
Refactored zksync_node_consensus::storage::testonly::VMWriter to support writes across multiple batches
Added test_committee_extractor

TODOs

zksync_dal/migrations: Update the CHECK constraint on certificate column to support nullable column (the constraint name is automatically generated)
Add documentation

Checklist

PR title corresponds to the body of PR (we generate changelog entries from PRs).
Tests for the changes have been added / updated.
Documentation comments have been added / updated.
Code has been formatted via zk fmt and zk lint.

core/lib/dal/migrations/20240725194916_l1_batches_consensus_committees.up.sql

core/lib/dal/src/consensus_dal.rs

pompon0 · 2024-07-29T14:00:45Z

core/lib/dal/src/consensus_dal.rs

+        number: BatchNumber,
+        validator_committee: ValidatorCommittee,
+        attester_committee: AttesterCommittee,
+    ) -> anyhow::Result<()> {


if we are going to query the last batch with committees/certificate, then keeping them in the same table makes little sense, because we will need extra indices for efficient lookups. Do we really want to go this way?

Querying the last batch with committees is a one-time read on server start, so I didn’t thought it’s a big deal. And I don’t see such an index for the query of the last batch with certificate (unless it’s applied internally via the toolchain).

In the overall, I’m not against separating the tables, and originally suggested that. IIRC @RomanBrodetski thought it’s an overkill, and that having all info in respect to a given batch in one place is useful.

there is no index, because the the table serves as index for itself when it has just 1 non null column. With 2 columns, it won't be the case.

because we will need extra indices for efficient lookups.

why exactly? select ... order by ID desc where certificate is not null will only scan the rows for which there is committee but not certificate. Given that it happens on startup, it shouldn't be a problem - am I missing smt?

there is no index, because the the table serves as index for itself when it has just 1 non null column. With 2 columns, it won't be the case.

Right, I overlooked that in that case there's no need for WHERE clause.

why exactly? select ... order by ID desc where certificate is not null will only scan the rows for which there is committee but not certificate. Given that it happens on startup, it shouldn't be a problem - am I missing smt?

If column X is NOT NULL, there's no need for WHERE X IS NOT NULL, and SELECT MAX(PK) alone is sufficient.

Since the NULL column is the certificate, and not the committee, this won't happen just on startup. However, I'm not sure about the frequency of this query and whether it justifies an index.

FWIW I would keep the committee in a separate table. As I understand we are doing this without concern for performance, but there are many ways to store the committee, and some don't involve repeating the entire committee in every batch, only when it changes. To keep our options open for future changes, I would keep them separate from the beginning, rather than start messing around with existing constraints to make room for this in the same table, having to revisit the (luckily few) existing queries where we now have to add filtering, turning inserts into updates.

core/node/consensus/src/storage/connection.rs

core/node/consensus/src/storage/denormalizer.rs

core/node/consensus/src/storage/mod.rs

…ficate_number`

aakoshh · 2024-08-14T00:31:44Z

core/lib/dal/src/consensus_dal.rs

+        Ok(Some(zksync_protobuf::serde::deserialize(certificate)?))
+    }
+
+    pub async fn batch_committee(


It would be great to state it in docstrings that this returns the committee which should attest to the batch_number, and not the one which is the result of executing the batch, ie. the ones attesting for batch_number+1.

aakoshh · 2024-08-14T00:35:51Z

core/lib/dal/src/consensus_dal.rs

+        .execute(self.storage)
+        .await?;
+        if res.rows_affected().is_zero() {
+            tracing::debug!(l1_batch_number = ?number, "duplicate batch certificate");


Suggested change

tracing::debug!(l1_batch_number = ?number, "duplicate batch certificate");

tracing::debug!(l1_batch_number = ?number, "duplicate batch committee");

aakoshh · 2024-08-14T00:39:00Z

core/lib/dal/src/consensus_dal.rs

@@ -491,6 +552,45 @@ impl ConsensusDal<'_, '_> {
        // and doesn't have prunning enabled.
        Ok(attester::BatchNumber(0))
    }
+
+    pub async fn get_last_batch_committee_number(


Can you add some docstrings to explain what this method is supposed to do? It sounds like "the committee number of the last batch" but there is no such thing. Is it the batch number of the last executed batch, for which a committee is known? Isn't that available somewhere else?

aakoshh · 2024-08-14T00:53:19Z

core/lib/dal/src/consensus_dal.rs

+                certificate = $2,
+                updated_at = NOW()
+            WHERE
+                l1_batch_number = $1


The insertion of committees happens asynchronously. In theory the certificate could arrive before the polling loop tries to extract the committee from the VM, in which case this update would not do anything, and the certificate would be lost.

Also the logging that happens as "duplicate certificate" when 0 rows are affected would be untrue, it would apply for a case described above.

On second thought this shouldn't happen: before the certificate for batch N can be verified the node needs the committee from the execution of batch N-1, so I assume there is plenty of time between when the committee is extracted from N-1 and inserted for N, especially because we need the commitment for batch N too.

Still it feels like having two tables would avoid any confusion.

aakoshh · 2024-08-14T01:00:05Z

core/node/consensus/src/storage/registry_contract/committee_extractor.rs

+            let mut next_batch = self.next_batch_to_extract_committee(ctx).await?;
+            loop {
+                self.wait_batch(ctx, next_batch).await?;
+                self.extract_batch_committee(ctx, next_batch).await?;


So wait_batch calls get_sealed_l1_batch_number which is the last batch in the table, but it's not necessarily executed yet, or is it? What would happen if the contract was called with a number that it doesn't have data for, would it fail, return an empty result, or return something based on results up to the previous block?

aakoshh · 2024-08-14T01:01:18Z

core/node/consensus/src/storage/registry_contract/committee_extractor.rs

+        tracing::info!("Running zksync_node_consensus::storage::CommitteeExtractor");
+        let res = scope::run!(ctx, |ctx, s| async {
+            let mut next_batch = self.next_batch_to_extract_committee(ctx).await?;
+            loop {


Would it be possible at all to hook into the batch execution itself, so instead of polling and eventually loading the data from here to there, the derived data would be inserted within the same transaction?

aakoshh · 2024-08-14T11:32:01Z

core/node/consensus/src/storage/registry_contract/committee_extractor.rs

+        self.pool
+            .connection(ctx)
+            .await?
+            .insert_batch_committee(ctx, batch_number, db_attester_committee)


Just checking my understanding of which batch number we mean here.

I thought the committee in l1_batches_consensus for batch number N means which is the committee that is expected to sign batch N. This is the result of executing batch N-1.

So we call extract_batch_committee with the next_batch that we haven't extracted yet, and then figure out what is the last L2 block included in this batch. We execute the eth_call there to get the committee after the execution of the last L2 block. Then we insert the committee with the same batch number? Shouldn't it be batch_number.inc()?

pompon0 · 2024-08-19T15:36:01Z

closing this pr, I'll continue this work in #2684

Add committees data denormalizer

4bb671a

moshababo changed the base branch from consensus_contracts_reader to main July 29, 2024 02:47

moshababo changed the base branch from main to consensus_contracts_reader July 29, 2024 02:48