Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Peerdas DO NOT MERGE] CI Tests before rebase #14656

Draft
wants to merge 84 commits into
base: develop
Choose a base branch
from

Conversation

nalepae
Copy link
Contributor

@nalepae nalepae commented Nov 22, 2024

No description provided.

@nalepae nalepae force-pushed the peerDAS-do-not-merge branch 2 times, most recently from 5555105 to e92cde5 Compare November 22, 2024 11:14
@nalepae nalepae changed the title [Peerdsa DO NOT MERGE] CI Tests before rebase [Peerdas DO NOT MERGE] CI Tests before rebase Nov 22, 2024
@nalepae nalepae force-pushed the peerDAS-do-not-merge branch 2 times, most recently from 74cf5e4 to 2dd4302 Compare November 25, 2024 13:39
nalepae and others added 24 commits November 27, 2024 10:11
* Add Support For Discovery Of Column Subnets

* Lint for SubnetsPerNode

* Manu's Review

* Change to a better name
* Add Data Column Subscriber

* Add Data Column Vaidator

* Wire all Handlers In

* Fix Build

* Fix Test

* Fix IP in Test

* Fix IP in Test
* Add RPC Handler

* Add Column Requests

* Update beacon-chain/db/filesystem/blob.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Update beacon-chain/p2p/rpc_topic_mappings.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Manu's Review

* Manu's Review

* Interface Fixes

* mock manager

---------

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Bump `c-kzg-4844` lib to the `das` branch.

* Implement `MerkleProofKZGCommitments`.

* Implement `das-core.md`.

* Use `peerdas.CustodyColumnSubnets` and `peerdas.CustodyColumns`.

* `CustodyColumnSubnets`: Include `i` in the for loop.

* Remove `computeSubscribedColumnSubnet`.

* Remove `peerdas.CustodyColumns` out of the for loop.
* Remove capital letter from error messages.

* `[4]byte` => `[fieldparams.VersionLength]byte`.

* Prometheus: Remove extra `committee`.

They are probably due to a bad copy/paste.

Note: The name of the probe itself is remaining,
to ensure backward compatibility.

* Implement Proposer RPC for data columns.

* Fix TestProposer_ProposeBlock_OK test.

* Remove default peerDAS activation.

* `validateDataColumn`: Workaround to return a `VerifiedRODataColumn`
* Add new DA check

* Exit early in the event no commitments exist.

* Gazelle

* Fix Mock Broadcaster

* Fix Test Setup

* Update beacon-chain/blockchain/process_block.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Manu's Review

* Fix Build

---------

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Update `consensus_spec_version` to `v1.5.0-alpha.1`.

* `CustodyColumns`: Fix and implement spec tests.

* Make deepsource happy.

* `^uint64(0)` => `math.MaxUint64`.

* Fix `TestLoadConfigFile` test.
…ob`. (#13957)

* `SendDataColumnSidecarByRoot`: Return `RODataColumn` instead of `ROBlob`.

* Make deepsource happier.
* Upgrade c-kzg-4844 package

* Upgrade bazel deps
* Enable E2E And Add Fixes

* Register Same Topic For Data Columns

* Initialize Capacity Of Slice

* Fix Initialization of Data Column Receiver

* Remove Mix In From Merkle Proof

* E2E: Subscribe to all subnets.

* Remove Index Check

* Remaining Bug Fixes to Get It Working

* Change Evaluator to Allow Test to Finish

* Fix Build

* Add Data Column Verification

* Fix LoopVar Bug

* Do Not Allocate Memory

* Update beacon-chain/blockchain/process_block.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Update beacon-chain/core/peerdas/helpers.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Update beacon-chain/core/peerdas/helpers.go

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>

* Gofmt

* Fix It Again

* Fix Test Setup

* Fix Build

* Fix Trusted Setup panic

* Fix Trusted Setup panic

* Use New Test

---------

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Add Data Structure for New Request Type

* Add Data Column By Range Handler

* Add Data Column Request Methods

* Add new validation for columns by range requests

* Fix Build

* Allow Prysm Node To Fetch Data Columns

* Allow Prysm Node To Fetch Data Columns And Sync

* Bug Fixes For Interop

* GoFmt

* Use different var

* Manu's Review
* PeerDAS: Implement sampling.

* `TestNewRateLimiter`: Fix with the new number of expected registered topics.
* Set Custody Count Correctly

* Fix Discovery Count
* Adding error wrapping

* Fix `CustodyColumnSubnets` tests.
* Support Data Columns For By Root Requests

* Revert Config Changes

* Fix Panic

* Fix Process Block

* Fix Flags

* Lint

* Support Checkpoint Sync

* Manu's Review

* Add Support For Columns in Remaining Methods

* Unmarshal Uncorrectly
* Hack E2E

* Fix it For Real

* Gofmt

* Remove
* Wrap errors, add logs.

* `missingColumnRequest`: Fix blobs <-> data columns mix.

* `ColumnIndices`: Return `map[uint64]bool` instead of `[fieldparams.NumberOfColumns]bool`.

* `DataColumnSidecars`: `interfaces.SignedBeaconBlock` ==> `interfaces.ReadOnlySignedBeaconBlock`.

We don't need any of the non read-only methods.

* Fix comments.

* `handleUnblidedBlock` ==> `handleUnblindedBlock`.

* `SaveDataColumn`: Move log from debug to trace.

If we attempt to save an already existing data column sidecar,
a debug log was printed.

This case could be quite common now with the data column reconstruction enabled.

* `sampling_data_columns.go` --> `data_columns_sampling.go`.

* Reconstruct data columns.
* Remove some `_` identifiers.

* Blob storage: Implement a notifier system for data columns.

* `dataColumnSidecarByRootRPCHandler`: Remove ugly `time.Sleep(100 * time.Millisecond)`.

* Address Nishant's comment.
* Introduce hidden flag `data-columns-withhold-count`.

* Address Nishant's comment.
nisdas and others added 20 commits November 27, 2024 10:24
* Add Data Column Metrics

* Shift it All To Peerdas Package
* `pingPeers`: Add log with new ENR when modified.

* `p2p Start`: Use idiomatic go error syntax.

* P2P `start`: Fix error message.

* Use not bootnodes at all if the `--chain-config-file` flag is used and no `--bootstrap-node` flag is used.

Before this commit, if the  `--chain-config-file` flag is used and no `--bootstrap-node` flag is used, then bootnodes are (incorrectly) defaulted on `mainnet` ones.

* `validPeersExist`: Centralize logs.

* `AddConnectionHandler`: Improve logging.

"Peer connected" does not really reflect the fact that a new peer is actually connected. --> "New peer connection" is more clear.

Also, instead of writing `0`, `1`or `2` for direction, now it's writted "Unknown", "Inbound", "Outbound".

* Logging: Add 2 decimals for timestamt in text and JSON logs.

* Improve "no valid peers" logging.

* Improve "Some columns have no peers responsible for custody" logging.

* `pubsubSubscriptionRequestLimit`: Increase to be consistent with data columns.

* `sendPingRequest`: Improve logging.

* `FindPeersWithSubnet`: Regularly recheck in our current set of peers if we have enough peers for this topic.

Before this commit, new peers HAD to be found, even if current peers are eventually acceptable.
For very small network, it used to lead to infinite search.

* `subscribeDynamicWithSyncSubnets`: Use exactly the same subscription function initially and every slot.

* Make deepsource happier.

* Nishant's commend: Change peer disconnected log.

* NIshant's comment: Change `Too many incoming subscription` log from error to debug.

* `FindPeersWithSubnet`: Address Nishant's comment.

* `batchSize`: Address Nishant's comment.

* `pingPeers` ==> `pingPeersAndLogEnr`.

* Update beacon-chain/sync/subscriber.go

Co-authored-by: Nishant Das <nishdas93@gmail.com>

---------

Co-authored-by: Nishant Das <nishdas93@gmail.com>
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
* `sendBatchRootRequest`: Refactor and add comments.

* `sendBatchRootRequest`: Do send requests to peers that custodies a superset of our columns.

Before this commit, we sent "data columns by root requests" for data columns peers do not custody.

* Data columns: Use subnet sampling only.

(Instead of peer sampling.)

aaa

* `areDataColumnsAvailable`: Improve logs.

* `GetBeaconBlock`: Improve logs.

Rationale: A `begin` log should always be followed by a `success` log or a `failure` log.
* `validateDataColumn`: Refactor logging.

* `dataColumnSidecarByRootRPCHandler`: Improve logging.

* `isDataAvailable`: Improve logging.

* Add hidden debug flag: `--data-columns-reject-slot-multiple`.

* Add more logs about peer disconnection.

* `validPeersExist` --> `enoughPeersAreConnected`

* `beaconBlocksByRangeRPCHandler`: Add remote Peer ID in logs.

* Stop calling twice `writeErrorResponseToStream` in case of rate limit.
* `scheduleReconstructedDataColumnsBroadcast`: Really minor refactor.

* `receivedDataColumnsFromRootLock` -> `dataColumnsFromRootLock`

* `reconstructDataColumns`: Stop looking into the DB to know if we have some columns.

Before this commit:
Each time we receive a column, we look into the filesystem for all columns we store.
==> For 128 columns, it looks for 1 + 2 + 3 + ... + 128 = 128(128+1)/2 = 8256 files look.

Also, as soon as a column is saved into the file system, then if, right after, we
look at the filesystem again, we assume the column will be available (strict consistency).
It happens not to be always true.

==> Sometimes, we can reconstruct and reseed columns more than once, because of this lack of filesystem strict consistency.

After this commit:
We use a (strictly consistent) cache to determine if we received a column or not.
==> No more consistency issue, and less stress for the filesystem.

* `dataColumnSidecarByRootRPCHandler`: Improve logging.

Before this commit, logged values assumed that all requested columns correspond to
the same block root, which is not always the case.

After this commit, we know which columns are requested for which root.

* Add a log when broadcasting a data column.

This is useful to debug "lost data columns" in devnet.

* Address Nishant's comment
* `columnErrBuilder`: Uses `Wrap` instead of `Join`.

Reason: `Join` makes a carriage return. The log is quite unreadable.

* `validateDataColumn`: Improve log.

* `areDataColumnsAvailable`: Improve log.

* `SendDataColumnSidecarByRoot` ==> `SendDataColumnSidecarsByRootRequest`.

* `handleDA`: Refactor error message.

* `sendRecentBeaconBlocksRequest` ==> `sendBeaconBlocksRequest`.

Reason: There is no notion at all of "recent" in the function.

If the caller decides to call this function only with "recent" blocks, that's fine.
However, the function itself will know nothing about the "recentness" of these blocks.

* `sendBatchRootRequest`: Improve comments.

* `sendBeaconBlocksRequest`: Avoid `else` usage and use map of bool instead of `struct{}`.

* `wrapAndReportValidation`: Remove `agent` from log.

Reason: This prevent the log to hold on one line, and it is not really useful to debug.

* `validateAggregateAndProof`: Add comments.

* `GetValidCustodyPeers`: Fix typo.

* `GetValidCustodyPeers` ==> `DataColumnsAdmissibleCustodyPeers`.

* `CustodyHandler` ==> `DataColumnsHandler`.

* `CustodyCountFromRemotePeer` ==> `DataColumnsCustodyCountFromRemotePeer`.

* Implement `DataColumnsAdmissibleSubnetSamplingPeers`.

* Use `SubnetSamplingSize` instead of `CustodySubnetCount` where needed.

* Revert "`wrapAndReportValidation`: Remove `agent` from log."

This reverts commit 55db351.
* `retrieveMissingDataColumnsFromPeers`: Improve logging.

* `dataColumnSidecarByRootRPCHandler`: Stop decreasing peer's score if asking for a column we do not custody.

* `dataColumnSidecarByRootRPCHandler`: If a data column is unavailable, stop waiting for it.

This behaviour was useful for peer sampling.
Now, just return the data column if we store it.
If we don't, skip.

* Dirty code comment.

* `retrieveMissingDataColumnsFromPeers`: Improve logs.

* `SendDataColumnsByRangeRequest`: Improve logs.

* `dataColumnSidecarsByRangeRPCHandler`: Improve logs.
* `BestFinalized`: Refactor (no functional change).

* `BestNonFinalized`: Refactor (no functional change).

* `beaconBlocksByRangeRPCHandler`: Remove useless log.

The same is already printed at the start of the function.

* `calculateHeadAndTargetEpochs`: Avoid `else`.

* `ConvertPeerIDToNodeID`: Improve error.

* Stop printing noisy "peer should be banned" logs.

* Initial sync: Request data columns from peers which:
- custody a superset of columns we need, and
- have a head slot >= our target slot.

* `requestDataColumnsFromPeers`: Shuffle peers before requesting.

Before this commit, we always requests peers in the same order,
until one responds something.
Without shuffling, we always requests data columns from the same
peer.

* `requestDataColumnsFromPeers`: If error from a peer, just log the error and skip the peer.

* Improve logging.

* Fix tests.
* Improve logging.

* `retrieveMissingDataColumnsFromPeers`: Limit to `512` items per request.

* `retrieveMissingDataColumnsFromPeers`: Allow `nil` peers.

Before this commit:
If, when this funcion is called, we are not yet connected to enough peers, then `peers` will be possibly not be satisfaying,
and, if new peers are connected, we will never see them.

After this commit:
If `peers` is `nil`, then we regularly check for all connected peers.
If `peers` is not `nil`, then we use them.
* `retrieveMissingDataColumnsFromPeers`: Search only for needed peers.

* Improve logging.
* Fix Commitments Check

* `highestFinalizedEpoch`: Refactor (no functional change).

* `retrieveMissingDataColumnsFromPeers`: Fix logs.

* `VerifyDataColumnSidecarKZGProofs`: Optimise with capacity.

* Save data columns when initial syncing.

* `dataColumnSidecarsByRangeRPCHandler`: Add logs when a request enters.

* Improve logging.

* Improve logging.

* `peersWithDataColumns: Do not filter any more on peer head slot.

* Fix Nishant's comment.

---------

Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
* `ColumnAlignsWithBlock`: Split lines.

* Data columns verifications: Batch

* Remove completely `DataColumnBatchVerifier`.

Only `DataColumnsVerifier` (with `s`) on columns remains.
It is the responsability of the function which receive the data column
(either by gossip, by range request or by root request) to verify the
data column wrt. corresponding checks.

* Fix Nishant's comment.
@nalepae nalepae force-pushed the peerDAS-do-not-merge branch from 2dd4302 to 3432ffa Compare November 27, 2024 10:13
@nalepae nalepae marked this pull request as ready for review November 27, 2024 10:13
@nalepae nalepae requested a review from a team as a code owner November 27, 2024 10:13
@nalepae nalepae marked this pull request as draft November 27, 2024 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants