Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat(consensus): committees data extractor (BFT-443) #2518

Closed

Conversation

moshababo
Copy link
Contributor

@moshababo moshababo commented Jul 29, 2024

Offshoot of #2416

What ❔

Adding state extractor for committees data, reading from the consensus registry contract to be inserted into l1_batches_consensus DB table.

Why ❔

Reading directly from VM storage (with zksync_node_consensus::storage::registry_contract::VMReader) is inefficient, and is using a (slightly) different data format. A background-running denormalization process would ease ongoing per-batch reads.

Changes

  • Added zksync_node_consensus::storage::registry_contract::CommitteeExtractor
  • Added DB migrations for adding 1 new column to l1_batches_consensus table
  • Added batch_committee() & insert_batch_committee(), get_last_batch_committee_number in consensus_dal
  • Updated sqlx::query of insert_batch_certificate(), get_last_batch_certificate_number in consensus_dal
  • Added DB (de)serialization support
  • Refactored zksync_node_consensus::storage::testonly::VMWriter to support writes across multiple batches
  • Added test_committee_extractor

TODOs

  • zksync_dal/migrations: Update the CHECK constraint on certificate column to support nullable column (the constraint name is automatically generated)
  • Add documentation

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.

@moshababo moshababo changed the base branch from consensus_contracts_reader to main July 29, 2024 02:47
@moshababo moshababo changed the base branch from main to consensus_contracts_reader July 29, 2024 02:48
number: BatchNumber,
validator_committee: ValidatorCommittee,
attester_committee: AttesterCommittee,
) -> anyhow::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to query the last batch with committees/certificate, then keeping them in the same table makes little sense, because we will need extra indices for efficient lookups. Do we really want to go this way?

Copy link
Contributor Author

@moshababo moshababo Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying the last batch with committees is a one-time read on server start, so I didn’t thought it’s a big deal. And I don’t see such an index for the query of the last batch with certificate (unless it’s applied internally via the toolchain).

In the overall, I’m not against separating the tables, and originally suggested that. IIRC @RomanBrodetski thought it’s an overkill, and that having all info in respect to a given batch in one place is useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no index, because the the table serves as index for itself when it has just 1 non null column. With 2 columns, it won't be the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we will need extra indices for efficient lookups.

why exactly? select ... order by ID desc where certificate is not null will only scan the rows for which there is committee but not certificate. Given that it happens on startup, it shouldn't be a problem - am I missing smt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no index, because the the table serves as index for itself when it has just 1 non null column. With 2 columns, it won't be the case.

Right, I overlooked that in that case there's no need for WHERE clause.

why exactly? select ... order by ID desc where certificate is not null will only scan the rows for which there is committee but not certificate. Given that it happens on startup, it shouldn't be a problem - am I missing smt?

If column X is NOT NULL, there's no need for WHERE X IS NOT NULL, and SELECT MAX(PK) alone is sufficient.

Since the NULL column is the certificate, and not the committee, this won't happen just on startup. However, I'm not sure about the frequency of this query and whether it justifies an index.

Copy link
Contributor

@aakoshh aakoshh Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I would keep the committee in a separate table. As I understand we are doing this without concern for performance, but there are many ways to store the committee, and some don't involve repeating the entire committee in every batch, only when it changes. To keep our options open for future changes, I would keep them separate from the beginning, rather than start messing around with existing constraints to make room for this in the same table, having to revisit the (luckily few) existing queries where we now have to add filtering, turning inserts into updates.

@moshababo moshababo changed the title [WIP] feat(consensus): committees data denormalizer (BFT-443) [WIP] feat(consensus): committees data extractor (BFT-443) Aug 5, 2024
Ok(Some(zksync_protobuf::serde::deserialize(certificate)?))
}

pub async fn batch_committee(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to state it in docstrings that this returns the committee which should attest to the batch_number, and not the one which is the result of executing the batch, ie. the ones attesting for batch_number+1.

.execute(self.storage)
.await?;
if res.rows_affected().is_zero() {
tracing::debug!(l1_batch_number = ?number, "duplicate batch certificate");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracing::debug!(l1_batch_number = ?number, "duplicate batch certificate");
tracing::debug!(l1_batch_number = ?number, "duplicate batch committee");

@@ -491,6 +552,45 @@ impl ConsensusDal<'_, '_> {
// and doesn't have prunning enabled.
Ok(attester::BatchNumber(0))
}

pub async fn get_last_batch_committee_number(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docstrings to explain what this method is supposed to do? It sounds like "the committee number of the last batch" but there is no such thing. Is it the batch number of the last executed batch, for which a committee is known? Isn't that available somewhere else?

certificate = $2,
updated_at = NOW()
WHERE
l1_batch_number = $1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The insertion of committees happens asynchronously. In theory the certificate could arrive before the polling loop tries to extract the committee from the VM, in which case this update would not do anything, and the certificate would be lost.

Also the logging that happens as "duplicate certificate" when 0 rows are affected would be untrue, it would apply for a case described above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought this shouldn't happen: before the certificate for batch N can be verified the node needs the committee from the execution of batch N-1, so I assume there is plenty of time between when the committee is extracted from N-1 and inserted for N, especially because we need the commitment for batch N too.

Still it feels like having two tables would avoid any confusion.

let mut next_batch = self.next_batch_to_extract_committee(ctx).await?;
loop {
self.wait_batch(ctx, next_batch).await?;
self.extract_batch_committee(ctx, next_batch).await?;
Copy link
Contributor

@aakoshh aakoshh Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So wait_batch calls get_sealed_l1_batch_number which is the last batch in the table, but it's not necessarily executed yet, or is it? What would happen if the contract was called with a number that it doesn't have data for, would it fail, return an empty result, or return something based on results up to the previous block?

tracing::info!("Running zksync_node_consensus::storage::CommitteeExtractor");
let res = scope::run!(ctx, |ctx, s| async {
let mut next_batch = self.next_batch_to_extract_committee(ctx).await?;
loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible at all to hook into the batch execution itself, so instead of polling and eventually loading the data from here to there, the derived data would be inserted within the same transaction?

self.pool
.connection(ctx)
.await?
.insert_batch_committee(ctx, batch_number, db_attester_committee)
Copy link
Contributor

@aakoshh aakoshh Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking my understanding of which batch number we mean here.

I thought the committee in l1_batches_consensus for batch number N means which is the committee that is expected to sign batch N. This is the result of executing batch N-1.

So we call extract_batch_committee with the next_batch that we haven't extracted yet, and then figure out what is the last L2 block included in this batch. We execute the eth_call there to get the committee after the execution of the last L2 block. Then we insert the committee with the same batch number? Shouldn't it be batch_number.inc()?

@pompon0
Copy link
Contributor

pompon0 commented Aug 19, 2024

closing this pr, I'll continue this work in #2684

@pompon0 pompon0 closed this Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants