
Add custody by root request to sync #6275

Closed

Conversation

@dapplion (Collaborator) commented Aug 16, 2024:

Issue Addressed

Part of

Build on / depends on

Proposed Changes

Adds a new higher-order request to fetch all custody columns of a given block root.

It's a stateful request that may trigger one or more data_columns_by_root requests internally, with the following features (see the sketch after this list):

  • per-column retry
  • peer load balancing / prioritization when selecting a peer
  • peers that return failures are not re-requested for some time
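
A minimal sketch of those three behaviours, assuming invented names throughout (PeerId, ColumnRequest, the constants and method signatures are illustrative stand-ins, not the actual Lighthouse API):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative stand-ins for the real Lighthouse types.
type PeerId = u64;
type ColumnIndex = u64;

const FAILED_PEERS_CACHE_EXPIRY: Duration = Duration::from_secs(5);
const MAX_ATTEMPTS_PER_COLUMN: usize = 3;

/// Per-column download state.
enum ColumnRequest {
    NotStarted { attempts: usize },
    Downloading { peer: PeerId, attempts: usize },
    Downloaded,
}

struct ActiveCustodyRequest {
    /// One entry per column index this node must custody.
    column_requests: HashMap<ColumnIndex, ColumnRequest>,
    /// Peers that recently returned a failure, with the time of the failure.
    failed_peers: HashMap<PeerId, Instant>,
}

impl ActiveCustodyRequest {
    /// Peer selection: skip peers still inside the failure cooldown, then
    /// prefer the candidate with the fewest in-flight columns (load balancing).
    fn select_peer(&self, candidates: &[PeerId]) -> Option<PeerId> {
        candidates
            .iter()
            .filter(|peer| {
                self.failed_peers
                    .get(*peer)
                    .map_or(true, |t| t.elapsed() > FAILED_PEERS_CACHE_EXPIRY)
            })
            .min_by_key(|peer| self.in_flight_to(**peer))
            .copied()
    }

    /// Count the columns currently being downloaded from `peer`.
    fn in_flight_to(&self, peer: PeerId) -> usize {
        self.column_requests
            .values()
            .filter(|r| matches!(r, ColumnRequest::Downloading { peer: p, .. } if *p == peer))
            .count()
    }

    /// Per-column retry: on failure, remember the peer for the cooldown and
    /// put the column back in the queue, up to a maximum attempt count.
    fn on_download_failure(&mut self, column: ColumnIndex, peer: PeerId) -> Result<(), &'static str> {
        self.failed_peers.insert(peer, Instant::now());
        let request = self
            .column_requests
            .get_mut(&column)
            .ok_or("unknown column index")?;
        if let ColumnRequest::Downloading { attempts, .. } = request {
            let next_attempt = *attempts + 1;
            if next_attempt <= MAX_ATTEMPTS_PER_COLUMN {
                *request = ColumnRequest::NotStarted { attempts: next_attempt };
                return Ok(());
            }
        }
        Err("column exhausted its retries")
    }
}
```

In the actual PR this state would be driven by the sync manager, which batches whichever columns are in the NotStarted state into data_columns_by_root requests.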

@dapplion added the ready-for-review, das, and das-unstable-merge labels on Aug 16, 2024
@mergify bot deleted the branch sigp:unstable on August 21, 2024 00:12
@mergify bot closed this on Aug 21, 2024
@jimmygchen (Member) commented:

This was automatically closed by mergify as the upstream branch was merged. I'll reopen this PR against unstable.

@jimmygchen reopened this on Aug 21, 2024
@jimmygchen changed the base branch from peerdas-request-data-columns-by-root to unstable on Aug 21, 2024 02:38
@jimmygchen changed the base branch from unstable to peerdas-request-data-columns-by-root on Aug 21, 2024 02:40
@jimmygchen changed the base branch from peerdas-request-data-columns-by-root to unstable on Aug 21, 2024 02:41
@jimmygchen (Member) commented:

The refactoring in these PRs is making merging back to das really difficult. Can we please avoid refactoring in these PRs and instead do it against the das branch?

Some of the new function signatures require reworking parts of the existing code on the das branch, so it would be easier if the refactoring happened directly on the das branch. Otherwise we'd have to do the rework as part of the merge back to das, which is really painful.

@jimmygchen added the do-not-merge label and removed the ready-for-review label on Aug 21, 2024
@dapplion added the work-in-progress label and removed the do-not-merge label on Aug 21, 2024
@dapplion added the ready-for-review label and removed the work-in-progress label on Aug 21, 2024
DataColumnSubnetId::from_column_index::<E>(column_index as usize, spec),
))
.cloned()
.collect::<Vec<_>>()
@jimmygchen (Member) commented Aug 21, 2024:

Why not take the latest change from the das branch and just use good_custody_subnet_peer?
This undoes most of the changes from #6218. I understand that selecting from the entire peer set is not the right solution, but this doesn't fix it either; it just reverts back to the previous version.

@jimmygchen (Member) commented Aug 21, 2024:

I'm also happy if you'd like to change how this works on the das branch, but I think it's better to target the rework at the das branch before we finish the merge to unstable.
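
For context, a sketch of the peer-selection idea under discussion: restrict candidates to peers known to custody the relevant subnet instead of sampling from all connected peers. The types and field names below are hypothetical; good_custody_subnet_peer is the das-branch function being referenced, not reproduced:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real types.
type PeerId = u64;
type SubnetId = u64;

struct PeerDb {
    /// Custody subnets each connected peer is known to serve.
    custody_subnets: HashMap<PeerId, HashSet<SubnetId>>,
}

impl PeerDb {
    /// Candidates for a column request: only peers that custody `subnet`.
    /// Selecting from the entire connected-peer set is what #6218 moved away from.
    fn good_custody_subnet_peers(&self, subnet: SubnetId) -> Vec<PeerId> {
        self.custody_subnets
            .iter()
            .filter(|(_, subnets)| subnets.contains(&subnet))
            .map(|(peer, _)| *peer)
            .collect()
    }
}
```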

peer.insert_subnet(Subnet::DataColumn(subnet.into()));
}

peer_id
@jimmygchen (Member) commented:

Is all of this change required? Why not just take the latest change from the das branch? The change in #6218 fixes the custody lookup tests.


const FAILED_PEERS_CACHE_EXPIRY_SECONDS: u64 = 5;

type DataColumnSidecarVec<E> = Vec<Arc<DataColumnSidecar<E>>>;
@jimmygchen (Member) commented:

For consistency:

Suggested change:
- type DataColumnSidecarVec<E> = Vec<Arc<DataColumnSidecar<E>>>;
+ type DataColumnSidecarList<E> = Vec<Arc<DataColumnSidecar<E>>>;

This was also updated in custody.rs on the das branch.


pub struct ActiveCustodyRequest<T: BeaconChainTypes> {
block_root: Hash256,
requester: SingleLookupReqId,
@jimmygchen (Member) commented:

What's the reason for changing this from CustodyId (on the das branch)?

indices: Vec<ColumnIndex>,
}

type CustodyRequestResult<E> = Result<Option<DataColumnSidecarVec<E>>, Error>;
@jimmygchen (Member) commented:

Are we dumping the peer group?

das branch:

type CustodyRequestResult<E> = Result<Option<(DataColumnSidecarList<E>, PeerGroup)>, Error>;
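
For illustration, the peer-group idea from the das branch could look roughly like this sketch (names invented here, not the actual das-branch type): record which peer served which columns, so that a later validation failure can be attributed to the right peer instead of whichever peer responded last.

```rust
use std::collections::HashMap;

// Invented names for illustration; the das branch defines the real PeerGroup.
type PeerId = u64;
type ColumnIndex = u64;

#[derive(Default)]
struct PeerGroup {
    /// Which column indices each peer contributed to the completed request.
    columns_by_peer: HashMap<PeerId, Vec<ColumnIndex>>,
}

impl PeerGroup {
    /// Record that `peer` served `column` as part of this request.
    fn record(&mut self, peer: PeerId, column: ColumnIndex) {
        self.columns_by_peer.entry(peer).or_default().push(column);
    }

    /// The peers to hold responsible if `column` later fails validation.
    fn peers_serving(&self, column: ColumnIndex) -> impl Iterator<Item = PeerId> + '_ {
        self.columns_by_peer
            .iter()
            .filter(move |(_, cols)| cols.contains(&column))
            .map(|(peer, _)| *peer)
    }
}
```

Returning the sidecars together with such a group keeps enough information to penalize the specific peer that served a bad column, which is what the TODO further down this diff is about.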

column_request.on_download_success(
req_id,
peer_id,
// Safe to cast, self.column_requests only contains indexes for columns we must custody
@jimmygchen (Member) commented:

This comment can be removed as we're no longer casting, although I'm not sure why we removed the CustodyDataColumn type.

.network_globals
.custody_peers_for_column(i, &self.harness.spec);
println!("peers column {} {:?}", i, peers);
}
@jimmygchen (Member) commented:

Is this just for debugging? It doesn't actually assert anything, it just prints.
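
One way to make the loop from the excerpt assert instead of print (a sketch: the iteration bound number_of_columns is a stand-in for however the test derives the column count; only custody_peers_for_column comes from the excerpt above):

```rust
// Assert every column has at least one custody peer rather than printing.
for i in 0..number_of_columns {
    let peers = self
        .network_globals
        .custody_peers_for_column(i, &self.harness.spec);
    assert!(
        !peers.is_empty(),
        "expected at least one custody peer for column {i}"
    );
}
```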

@@ -965,15 +966,31 @@ impl<T: BeaconChainTypes> SyncManager<T> {
     fn on_data_columns_by_root_response(
         &mut self,
         req_id: DataColumnsByRootRequestId,
-        _requester: SingleLookupReqId,
+        requester: SingleLookupReqId,
@jimmygchen (Member) commented Aug 21, 2024:

This parameter type isn't compatible with the das branch. If we want to change this type, I think we should change it there instead, as it would affect other parts of the code (e.g. sampling) and make the merge back, and the conga line of PRs, difficult.

// TODO(das): this is not the correct peer to attribute to a response. It's
// just the last peer of multiple potential requests. We should switch to a
// concept of "peer group" to attribute fault to the correct peer.
peer_id,
@jimmygchen (Member) commented Aug 21, 2024:

We already have this on the das branch. Is there a reason you don't want it included in this PR? (This would diverge from the das branch.)

@jimmygchen added the waiting-on-author label and removed the ready-for-review label on Aug 21, 2024
@jimmygchen (Member) left a comment:

Nice, another 900+ lines down 🎉
I've gone through the code and I don't see anything impacting the mainnet code path, so we should be good to go once the comments are addressed and CI passes.

@jimmygchen (Member) commented:

@dapplion Is there any refactor in this PR that you'd like to merge to unstable, or shall we close this one?

@dapplion (Collaborator, Author) commented Aug 28, 2024:

I ported a change to

Otherwise nothing relevant, so I'll close.

@dapplion closed this on Aug 28, 2024
Labels: das (Data Availability Sampling), waiting-on-author (The reviewer has suggested changes and awaits their implementation)