
ICS33: Multi-hop Channel Spec #882

Merged · 52 commits into cosmos:main · Sep 14, 2023
Conversation

@notbdu (Contributor) commented Nov 11, 2022:

Adds a spec for multi-hop channels. Viewable at https://github.com/polymerdao/ibc/tree/bo/ics-32/spec/core/ics-033-multi-hop.

Opening for discussion.

TODOs:

  • Update ICS 4 with multi-hop verifier logic.

@angbrav self-assigned this Dec 13, 2022
@notbdu changed the title from ICS32: Multi-hop Channel Spec to ICS33: Multi-hop Channel Spec Dec 16, 2022
@notbdu (Contributor, Author) commented Dec 17, 2022

@AdityaSripal (Member) left a comment:

Initial review.

The overall concept is exciting! Though I'm concerned about a few areas. First, I think the handshake checks need to be stricter; I've commented on what needs to happen there.

Also, verifyMultiHopProof, which is the crux of this proposal, is not clearly specced. I think it needs to be defined as clearly as the generation functions so we can verify it for correctness.

I did not look at the generation functions in detail for this round of review since the verification function is more important.

Handshake Validation (TRY and ACK):

  1. Prove that all intermediate connections are OPEN
  2. Prove that the connection hops from A->B are the inverse IDs of the connection hops from B->A

This will be possible by unmarshalling the intermediate connectionEnds stored in the MultiHopProof.

Proof Verification:

  1. Prove that the client stored the provided connection state and consensus state of the first hop
  2. Iteratively move through the hops and prove that they stored the next connection state and consensus state against the root of the previously proven consensus state
  3. Prove that the source chain stored the desired key and value against the last proven consensus state
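In pseudocode, those three verification steps might look roughly like this (a minimal sketch in the spec's TypeScript style; `ProofData`, `verifyMembership`, and the argument layout are illustrative assumptions, not the spec's final API):

```typescript
// Illustrative types; assumed shapes, not the spec's final definitions.
interface ProofData { key: string; value: Uint8Array; proof: Uint8Array }
interface ConsensusState { root: Uint8Array }

declare function verifyMembership(
  root: Uint8Array, proof: Uint8Array, key: string, value: Uint8Array): boolean
declare function unmarshalConsensusState(bz: Uint8Array): ConsensusState
declare function abortTransactionUnless(cond: boolean): void

// Walk the hops from the chain nearest the verifier back to the source:
// each consensus/connection state must be committed under the root of the
// previously proven consensus state (steps 1 and 2), and the desired
// key/value must be committed on the source chain (step 3).
function verifyMultihopProof(
  trustedConsensusState: ConsensusState, // stored by the verifier's own client
  consensusProofs: ProofData[],
  connectionProofs: ProofData[],
  keyProof: Uint8Array,
  key: string,
  value: Uint8Array,
) {
  let consensusState = trustedConsensusState
  for (let i = consensusProofs.length - 1; i >= 0; i--) {
    abortTransactionUnless(verifyMembership(
      consensusState.root, consensusProofs[i].proof,
      consensusProofs[i].key, consensusProofs[i].value))
    abortTransactionUnless(verifyMembership(
      consensusState.root, connectionProofs[i].proof,
      connectionProofs[i].key, connectionProofs[i].value))
    consensusState = unmarshalConsensusState(consensusProofs[i].value)
  }
  abortTransactionUnless(verifyMembership(consensusState.root, keyProof, key, value))
}
```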

There is one issue with the proof verification that I'm not sure how to resolve as yet. At the moment, we don't have a way to prove that the intermediate consensus states are fresh. How can we be sure that we aren't using a consensus state that is forged using a stale validator set that is past its unbonding period during the intermediate steps?

As a first step, could we verify that the consensus state timestamp is not too far past our current time, using the original client's trusting period? That might be an initial direction to consider.


if (connectionHops.length > 1) {
  consensusState = provableStore.get(consensusStatePath(connection.ClientId, proofHeight))
  abortTransactionUnless(VerifyMultihopProof(
Member:
You have to pass in the key to the verification function here. If the proof is self-contained without passing in the key, this could be a valid multihop proof for some different key entirely.

Key and value must be passed in by the caller to the proof verification functions.

Member:
This should be a multihop proof, proving that the ChanOpenInit was stored on the src chain.


if (connectionHops.length > 1) {
  consensusState = provableStore.get(consensusStatePath(connection.ClientId, proofHeight))
  abortTransactionUnless(VerifyMultihopProof(
Member:
We also need a similar check on ChanOpenAck.

Given that we will have all the connection ends in the MultiHopProof, we should check, at least in these handshake messages, that the connection hops from A->B are the inverse of the connection hops from B->A.

This is possible if we have the connection end structs between A and B.

@dshiell (Contributor) commented Dec 21, 2022:

Thanks for the review @AdityaSripal! I pushed some updates that should address most of your initial comments, and we're discussing internally your higher-level concern regarding forged consensus states in intermediate hops. The verification logic should have more details for you to take a look at in the meantime. Cheers!

@notbdu (Contributor, Author) commented Dec 24, 2022:

> There is one issue with the proof verification that I'm not sure how to resolve as yet. At the moment, we don't have a way to prove that the intermediate consensus states are fresh. How can we be sure that we aren't using a consensus state that is forged using a stale validator set that is past its unbonding period during the intermediate steps?

I have a few thoughts:

  • Can think of the client consensus states along a channel path as having pointers to each other.
  • Updating the destination pointer updates the entire chain of pointers.
  • The above is the same in the single-hop and multi-hop cases.
  • The tendermint light client verification algo should reject "stale" headers produced by a valset past the trusting period (https://github.com/polymerdao/tendermint/blob/85a584dd2ee3437b117d9f5e894eac70bf09aeef/light/verifier.go#L56-L59)
    • Either the header would need to be created while valset is still within the trusting period (which means misbehaviour is slashable) or the client would essentially freeze after trusted valset is past trusting period as no progress can be made.
  • The last hop doesn't have a wrong view, only potentially an older but valid view.
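For reference, the staleness check referred to above boils down to something like this (a simplified sketch of tendermint's trusting-period expiry logic; parameter names are illustrative):

```typescript
// Simplified sketch: an update is rejected when the trusted state is older
// than the trusting period at verification time (cf. HeaderExpired in the
// linked verifier.go).
function headerExpired(trustedHeaderTime: Date, trustingPeriodMs: number, now: Date): boolean {
  return now.getTime() >= trustedHeaderTime.getTime() + trustingPeriodMs
}
```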

@dshiell (Contributor) commented Feb 7, 2023:

Link to implementation branch: https://github.com/polymerdao/ibc-go/tree/ds/polymer-multihop

Can view one of the tests (multi-hop packet) here: https://github.com/polymerdao/ibc-go/blob/ds/polymer-multihop/modules/core/04-channel/keeper/multihop_packet_test.go

Implementation branch now up to date with ibc-go v7 here: https://github.com/polymerdao/ibc-go/tree/polymer/multihop-v7

@AdityaSripal (Member) left a comment:

Looks great! I think the only major change necessary is to add a time check for each consensus state so we can be sure they are fresh. This should also happen for the source consensus state, since it no longer happens automatically: you aren't using the clientState verify method and instead use the proof method directly.

Approving so it isn't blocked on a re-review.

3. Starting with known `ConsensusState[N-1]` at the given `proofHeight` on `ChainN` prove the prior chain's consensus and connection state.
4. Repeat step 3, proving `ConsensusState[i-1]` and `Conn[i-1,i]` until `ConsensusState0` on the source chain is proven.
- Start with i = N-1
- ConsensusState <= ConsensusState[i-1] on Chain[i]
Member:
Since we have access to all the intermediate consensus states, we should definitely check that the timestamp in the consensus state is still within the trusting period of the final client. This way we can ensure we are using fresh state


// VerifyMultiHopConsensusStateProof verifies the consensus state of each consecutive consensus/connection state starting
// from chain[N-1] on the destination (chain[N]) and finally proving the source chain consensus/connection state.
func VerifyMultiHopConsensusStateProof(
Member:

We should name this better, since it is proving connection state and consensus state at each hop.

abortTransactionUnless(len(proofs.ConsensusProofs) >= 1)
abortTransactionUnless(len(proofs.ConnectionProofs) == len(proofs.ConsensusProofs))

abortTransactionUnless(VerifyMultiHopConsensusStateProof(consensusState, proofs.ConsensusProofs, proofs.ConnectionProofs))
Member:

Took me a while to understand this was correct.

There are two consensus states here.

consensusState is the consensus state stored on the destination chain, used to prove the first hop in the multi-hop chain.

consState is the consensus state stored on the first chain in the multi-hop link, used to prove the final value.

These should be better named so it's easier to understand the difference and where they are used.

for i := len(consensusProofs) - 1; i >= 0; i-- {
    consStateProof := consensusProofs[i]
    connectionProof := connectionProofs[i]
    conState := abortTransactionUnless(UnmarshalInterface(consStateProof.Value))
Member:

Move this down to line 334 for understandability

@AdityaSripal (Member):

Ignore the time check comment above; after discussion we realized it wasn't necessary. VerifyMembership doesn't do a time check (outside of the delayPeriod), only UpdateClient does.

Comment on lines 149 to 292
previousChain := chains[i]   // previous chain's state root is on currentChain and is the source chain for i==0
currentChain := chains[i+1]  // currentChain is where the proof is queried and generated
nextChain := chains[i+2]     // nextChain holds the state root of the currentChain

currentHeight := GetClientStateHeight(currentChain, sourceChain) // height of previous chain state on current chain
nextHeight := GetClientStateHeight(nextChain, currentChain)      // height of current chain state on next chain

// consensus state of previous chain on current chain at currentHeight, which is the height of A's client state on B
consensusState := GetConsensusState(currentChain, previousChain.ClientID, currentHeight)

// prefixed key for consensus state of previous chain
consensusKey := GetPrefixedConsensusStateKey(currentChain, currentHeight)

// proof of previous chain's consensus state at currentHeight on currentChain at nextHeight
consensusProof := GetConsensusStateProof(currentChain, nextHeight, currentHeight, currentChain.ClientID)

proofs = append(proofs, &ProofData{
    Key:   &consensusKey,
    Value: consensusState,
    Proof: consensusProof,
})
}
Contributor:

Just started to look at this and this part is confusing to me. Not sure we can use currentHeight and nextHeight as specified. Maybe I am missing something, but iiuc the design, the relayer needs to perform client updates on all chains and collect the proofs with the exact update heights as the previous chain's proof height.
In other words, in the diagram above should it be ConsensusState[i-1](@Height == H[i-1])? Otherwise the proofs will not verify (??)

Let's take the A--B--C--D example where a new packet is sent to A:

  • the relayer gets the packet event at height Ha
    • queries chain A for the commitment proof (proof height is Ha), let's call it A: KeyProof(commitment, Ha)
  • updates the client of A on B with the header at height Ha
    • the update is stored on chain B at Hb and consensus[A](@Ha) is stored on B
    • the relayer can query chain B for a proof at height Hb and obtains: B: ConsensusProof(consensus[A](@Ha), Hb)
  • updates the client of B on C with the header at height Hb
    • the update is stored on chain C at Hc and consensus[B](@Hb) is stored on C
    • the relayer can query chain C for a proof at height Hc and obtains: C: ConsensusProof(consensus[B](@Hb), Hc)

So the relayer has three proofs claiming:
A: KeyProof(commitment, Ha) - a packet with commitment exists on A
B: ConsensusProof(consensus[A](@Ha), Hb) - there is a consensus state on B (stored at Hb) for the same height as the commitment proof height (Ha)
C: ConsensusProof(consensus[B](@Hb), Hc) - there is a consensus state on C (stored at Hc) for the same height as the previous chain's consensus proof height (Hb)

At this point the relayer can build a Tx with MsgUpdateClient(header(@Hc)) for the client of C on D + MsgRecvPacket with all the above proofs (plus the connection proofs below).
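For illustration, the bundle assembled above could be pictured like this (hypothetical field names and example identifiers; key paths follow ICS-24):

```typescript
// Illustrative shape of the relayer's proof bundle for A--B--C--D.
const multihopProofBundle = {
  // proved on A at Ha; verified against consensus[A](@Ha) stored on B
  keyProof: { chain: "A", key: "commitments/ports/transfer/channels/channel-0/sequences/1", proofHeight: "Ha" },
  consensusProofs: [
    // proved on B at Hb; verified against consensus[B](@Hb) stored on C
    { chain: "B", key: "clients/clientA/consensusStates/Ha", proofHeight: "Hb" },
    // proved on C at Hc; verified against consensus[C](@Hc) stored by D's client after MsgUpdateClient
    { chain: "C", key: "clients/clientB/consensusStates/Hb", proofHeight: "Hc" },
  ],
}
```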

Contributor:

I proposed some changes above, currentHeight -> consensusHeight and nextHeight -> proofHeight

prefix = counterpartyConnectionEnd.GetCounterparty().GetPrefix()
abortTransactionUnless(VerifyMultihopProof(k.cdc, consensusState, channel.ConnectionHops, proof, prefix, key, commitment))
} else {
abortTransactionUnless(connection.verifyPacketData(
@ancazamfir (Contributor) commented Feb 14, 2023:

Was looking at what we're missing for multi-hop by not calling this (not using the client security model):

  • it is not checked that clients on intermediate chains are active
  • the delay for the packet-delay feature is not checked

The same checks are skipped in other packet handlers.

Contributor:

Same for channel handlers.

Member:

Yes, the active client is a great point. If an intermediate client gets frozen for misbehaviour, we won't detect it here. This could cause a proof to succeed against the forked consensus state. We should definitely be checking that all intermediate clients are active.

I mentioned the packet delay feature to the team. I think it makes sense to only allow packet_delay_time to be specified in this case, and for the intermediate consensus states to have their timestamps checked against the local clock of the final destination chain, rather than checking against all the intermediate BFT clocks.

@ancazamfir (Contributor) commented Feb 15, 2023:

> I mentioned the packet delay feature to the team. I think it makes sense to only allow packet_delay_time to be specified in this case, and for the intermediate consensus states to have their timestamps checked against the local clock of the final destination chain, rather than checking against all the intermediate BFT clocks.

Ok, just to clarify: with multihop we only take into account the last connection delay, and the pseudocode above will change to perform some version of verifyDelayPeriodPassed, in addition to the checks you mention.
So a relayer would send all client updates, collect the proofs and, if the last connection delay is non-zero, wait for the delay of the last hop before sending the packet.

Regarding the checks you mention, we need to allow for clock drift and block time variations. There is also an accumulated delay for the client update on the destination chain, since for every hop the relayer needs to wait for the client update to be sent and the consensus state to be stored on chain before it can update the next chain. So the timestamp of the consensus state on chain i may be blockTime(i) older than the consensus state on chain i+1. This may be even higher if the update is not included in the "next" block.

Note: even with zero delay, packets will effectively be relayed with a delay of sum(blockTime(i), i = 1..N-1).

Not sure if all this is accurate, just some thoughts; you may have a different view or I may be missing something obvious. To clarify any confusion, could we add more English explanations to the spec to make things clear?

Contributor:

> Ok, just to clarify: with multihop we only take into account the last connection delay, and the pseudocode above will change to perform some version of verifyDelayPeriodPassed, in addition to the checks you mention.

Or do we delay the packet/client updates with the sum of the delays of all previous connections? Each hop's delay is available to the destination chain, so it could verify that the consensus state timestamps of adjacent chains respect the connection delay.

@AdityaSripal (Member) commented Feb 15, 2023:

Agreed that more English explanations in addition to the pseudocode would be very useful.

> Ok, just to clarify: with multihop we only take into account the last connection delay, and the pseudocode above will change to perform some version of verifyDelayPeriodPassed, in addition to the checks you mention.
> So a relayer would send all client updates, collect the proofs and, if the last connection delay is non-zero, wait for the delay of the last hop before sending the packet.

It would wait such that the client updates of all intermediate hops have at least passed the delay period specified at the final hop. Yes, I think in practice this would mean waiting for the last client update to pass the delay period most of the time. But the point is that the delay period parameter specified on the connection end will be enforced against every intermediate consensus state using the BFT clock of the final proving chain.

i.e. it should fail if the last consensus state has passed the delayPeriod but the first consensus state has not. This can happen if the later hops' clients were more recently updated by other relayers and only the first couple of hops needed to have updates filled in to complete the route.

> Regarding the checks you mention, we need to allow for clock drift and block time variations. There is also an accumulated delay for the client update on the destination chain, since for every hop the relayer needs to wait for the client update to be sent and the consensus state to be stored on chain before it can update the next chain. So the timestamp of the consensus state on chain i may be blockTime(i) older than the consensus state on chain i+1. This may be even higher if the update is not included in the "next" block.

I don't believe this is as relevant. Perhaps in order to route a packet, a relayer may have to submit client updates on some intermediate hops. But if all the hops are up to date, a relayer can simply query proofs from each chain and submit them. So the minimum delay is negligible, and the maximum delay, as you said, is sum(blockTime(i)).

The purpose of the block delay period is to ensure that there is some grace period for detecting misbehaviour before we use the consensus state for proof verification.

The clock drift is not cumulative across the path; instead, the effective clock drift on this connection will be the maximum drift between any two chains on this route.

So supposing we wanted to allow a grace period of time n for relayers to submit misbehaviour against a consensus state, we would want the delay period to be something like:

delayPeriodTime = n + sum(blockTime(i)) + maxClockDrift

In practice, I think in order for n to be a useful grace period at all, n >> sum(blockTime(i)) + maxClockDrift.

So rather than explicitly accounting for this in the protocol, I think if a user wants delayPeriod protection they should just set the value large enough to account for any of the above factors. This should be noted briefly in the spec.
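As a numeric illustration (made-up values) of why n dominates:

```typescript
// Made-up numbers purely to illustrate n >> sum(blockTime(i)) + maxClockDrift.
const n = 3600             // desired misbehaviour grace period, in seconds
const sumBlockTime = 5 * 6 // e.g. five intermediate hops at ~6s block time
const maxClockDrift = 10   // max drift between any two chains on the route, seconds
const delayPeriodTime = n + sumBlockTime + maxClockDrift // 3640s, dominated by n
```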

@AdityaSripal (Member) commented Feb 15, 2023:

Ahh wait, I just realized the delayPeriod is implemented to be enforced against the time of addition, not the actual timestamp of the consensus state. So it is not easy to do this in the multihop case...

Will have to think about it some more.

Especially since we do not include the time of addition inside the consensus state itself.

Contributor:

> This can happen if the later hops' clients were more recently updated by other relayers and only the first couple of hops needed to have updates filled in to complete the route.

This statement and other comments in the spec suggest that I am missing something very basic. How can this be? If you update a client at chain i before you update the client at chain i-1, the recursive proof at chain i will fail, no? Since the commitment root stored is for an app state on i-1 that did not include the expected commitment (consensus state for i-2).

I thought that the consensus state on chain i must be created after the one on i-1, otherwise the proof chain is broken. Is this not true? Also, the commitment proof on chain i-1 must be taken at the exact height of the consensus state (that provides the commitment root) on chain i.

Contributor:

> Ahh wait, I just realized the delayPeriod is implemented to be enforced against the time of addition, not the actual timestamp of the consensus state. So it is not easy to do this in the multihop case...
>
> Will have to think about it some more.
>
> Especially since we do not include the time of addition inside the consensus state itself.

Ok but if:

  • the destination chain N verifies the delay against the time it stored the consensus state for N-1, and
  • it is true that consensusState[i] must have been stored after consensusState[i-1] for proofs to verify (see my last questions in the comment above)

then the delayPeriod enforcement is also proven.

commitmenttypes.GetSDKSpecs(),
consensusState.GetRoot(),
*connectionProof.PrefixedKey,
connectionProof.Value, // this should be from connectionHops
Contributor:

Not clear to me: do we check anywhere that the consensus states are from clients associated with the actual connections?

Contributor:

So, we check that each connection is OPEN and matches the connections specified in the channel's connectionHops field here.

We also check that each of those provided connectionEnds is within the provided consensus states here.

At a high level we prove:

  • the connectionHops used to establish the channel match the connections provided in the multihop proof and are OPEN
  • the connectionEnds are proven within the provided consensus states
  • the consensus states are proven recursively, starting from the trusted consensus state on the receiving chain.

Does that suffice?

Contributor:

Two clients may have the same consensus state. So I was looking for an explicit check that connectionEnd.ClientId is the same as the clientId for which the consensus proof was collected, i.e. the same as the ID in its key path. The check would be something like:

keyPath = consensusProof.PrefixedKey.GetKeyPath()
consensusProofClientId = strings.Split(keyPath[len(keyPath)-1], "/")[1] // assuming the ICS-24 path "clients/{clientId}/consensusStates/{height}"
abortTransactionUnless(connectionEnd.ClientId == consensusProofClientId)

Contributor:

Yeah, I think we will likely need to bundle clientStates with the multihop proof as well, per offline discussion. In that case we should definitely verify that the clientId of the clientState matches the consensus/connection state.

Contributor:

@ancazamfir also, FYI, I'm not sure if you saw this. Here is the offline doc summarizing the review comments: https://docs.google.com/document/d/1lFxGZwapWANirgCO3NthsDGHgmBdSo_71alj14rAhnU/edit#

Contributor:

Feel free to drop comments there and we can enshrine the decision in GH.

Contributor:

thanks, i will take a look and provide comments there

Contributor:

To summarize the offline discussion and google doc collaboration:

  • delayPeriod will be enforced by computing the maximum delayPeriod across all connections in the channel path and enforcing that delayPeriod on the final consensusState on the verifier. This should suffice because, by construction, all recursively embedded consensus states must have existed for at least as long as the final consensus state.
  • Client states are NOT required to be bundled with the multi-hop proof. Instead, a new chanCloseFrozen() method will be introduced to allow relayers to post a multi-hop proof of a frozen client state starting from the misbehaving chain in the channel path. The proof is not symmetric, but relayers can send a unique proof (representing the hops from each end to the misbehaving chain) to actively freeze the channel.
  • Frozen channels cannot process packets, and a new FROZEN state could be introduced to help distinguish this situation.
  • If a client is frozen in the channel path, then it would not be possible to send newly committed packets, as the consensus state associated with the frozen client would not be updatable, making it impossible to create new multi-hop proofs.

Hopefully I represented that correctly and didn't miss anything.
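A minimal sketch of the first bullet, in the spec's pseudocode style (helper and field names are assumptions; it also reflects the earlier suggestion to return both the time and block delays):

```typescript
// Sketch: take the maximum delay across all hops and enforce it against the
// final consensus state on the verifying chain only. Hypothetical names.
interface ConnectionEnd { delayPeriodTime: number; delayPeriodBlocks: number }

function getMaximumDelayPeriod(
  hopConnections: ConnectionEnd[],
  lastConnection: ConnectionEnd,
): [delayTime: number, delayBlocks: number] {
  let delayTime = lastConnection.delayPeriodTime
  let delayBlocks = lastConnection.delayPeriodBlocks
  for (const conn of hopConnections) {
    delayTime = Math.max(delayTime, conn.delayPeriodTime)
    delayBlocks = Math.max(delayBlocks, conn.delayPeriodBlocks)
  }
  // The verifier then runs the usual single-hop delay check with these values
  // against the time/height at which it stored the final consensus state;
  // by construction every embedded consensus state is at least this old.
  return [delayTime, delayBlocks]
}
```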

@ancazamfir (Contributor) left a comment:

I did a first pass on some files. We need to clean up a bit and fix some consistency issues. I also think we need more clarity on the channel-freeze section; left a couple of questions there.

abortTransactionUnless(height01 >= proofHeight) // ensure that chain0's client state is up to date

// query the key/value proof on the source chain at the proof height
keyProof, _ := chain0.QueryProofAtHeight([]byte(key), int64(proofHeight.GetRevisionHeight()))
Contributor:

is QueryProofAtHeight defined in any ICS?

Contributor:

It is part of the ibctesting framework (chain.go) though I'm not sure if it is defined in any ICS. Do you suggest explicitly defining it in this case?

Contributor:

Not really, I was curious if there is one. Maybe just declare the primitives in a prior section. This could also be simplified to pseudocode, something like chain0.QueryProofAtHeight(key, proofHeight) or QueryProofAtHeight(chain0, key, proofHeight).

Comment on lines 241 to 270
function verifyMultihopMembership(
  connection: ConnectionEnd,
  height: Height,
  proof: MultihopProof,
  connectionHops: String[],
  key: String,
  value: ArrayBuffer) {
  multihopConnectionEnd = abortTransactionUnless(getMultihopConnectionEnd(proof))
Contributor:

Could we add some comments here and clarify, in particular, what connection is vs the multihopConnectionEnd?

Contributor:

done

  channel.state = CLOSED
  provableStore.set(channelPath(portIdentifier, channelIdentifier), channel)
}

The `chanCloseFrozen` function is called by a relayer to force close a multi-hop channel if any client state in the channel path is frozen.
Contributor:

Can we have separate pseudocode on how the proofs are built? I think they will look different depending on which channel end they are sent to. In the example A--B--xC--D (where x is the frozen client for B hosted on C), A cannot receive but could send, while D cannot send but could receive.
I would expect then some logic below to determine if the client is for one of the connection hops or the counterparty of a connection hop, and basically to find the C-B-A connection hops from the A--D channel connection hops when the close is received by A.

Contributor:

If I understand your example, the frozen client proof would look like this:

  1. A multi-hop consensus/connection proof from C-->A. This is the same as a regular multi-hop proof without the key/value proof.
  2. A key/value proof on chain B. For a frozen client state we need to know the key of the frozen client state on chain B, but this is actually tricky since we don't know the client ID. This is why we need the multi-hop proof in step 1 to include the connection state on the misbehaving chain C. The connectionEnd counterparty contains the client ID on chain B, which we can use to derive the key for the client state on chain B. The client state value is complex and is provided with the proof.

I realize we never talked about the implementation details for frozen client proofs, but do you see any issues with this approach? It is efficient in some sense because it allows re-using the same multi-hop proof struct, with an additional index parameter for frozen clients, and allows use of a single multi-hop proof. The index is used to generalize the multi-hop proof so that the key/value can be proven against any consensus state in the multi-hop channel path (in this case the client state is proven on chain B instead of chain C).

Although similar to standard multi-hop proving, it does have some distinct differences. I'll add pseudocode for this.
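A sketch of the key derivation in step 2, under the assumption above that the frozen client lives on chain B (the client-state path follows ICS-24; the surrounding names are hypothetical):

```typescript
// ICS-24 client state path: "clients/{clientID}/clientState".
function clientStatePath(clientId: string): string {
  return `clients/${clientId}/clientState`
}

interface ConnectionEnd { counterparty: { clientId: string } }

// Given the connectionEnd proven on the misbehaving chain C, its counterparty
// clientId identifies the client of C on chain B, whose (frozen) client state
// is then proven as the key/value pair of the multi-hop proof.
function frozenClientStateKey(connOnMisbehavingChain: ConnectionEnd): string {
  return clientStatePath(connOnMisbehavingChain.counterparty.clientId)
}
```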

Contributor:

Not sure I understand 2, if we look at the chanCloseFrozen sent to A.
So for the case A--B--xC--D: the connection proof from 1 should have the B-end of the connection B--C. Once verified, we can get the client ID on C (for the B--C connection) from the counterparty of the B-end. That should match the client for the key/value proof.
In contrast, let's take the case A--Bx--C--D: in this case the key/value proof should be for the client of the B-end of the connection, and not the counterparty.

Contributor:

Whoops, sorry, my example didn't follow your original case - I was assuming that chain C was the misbehaving chain with the frozen client on chain B.

Yes, the way you explained it is exactly how I was thinking. I'll push the pseudocode shortly.

Contributor:

done

Contributor:

do we have pseudocode on how the proofs are constructed?

@AdityaSripal (Member) left a comment:

Looks like FrozenProofGeneration is not specified.

Mostly reviewed and left final suggestions on the state machine code.

Haven't reviewed the proof generation code in great detail yet.

Great work!

}

// Return the maximum delay period across all connections in the channel path.
function getMaximumDelayPeriod(proof: MultihopProof, lastConnection: ConnectionEnd): number {
Member:

This should return two numbers: the delayPeriodBlock and the delayPeriodTime.

Contributor:

No problem; the block delay is computed below, but I'll refactor this to return timeDelay and blockDelay.

Contributor:

done

Contributor:

was this done?

))
"", counterpartyHops, counterpartyVersion}

if (connectionHops.length > 1) {
Member:

It would be really nice to abstract this away from the channel layer. Ideally, ICS 4 just calls the VerifyChannelState function, and then the connection looks at the connectionHops and decides whether it needs to verify a MultiHopProof or a single proof.

Think we'll want that for sure in the implementation anyway.

Contributor:

agreed, there are a few annoying challenges to doing this in the actual implementation, but I'll see if I can simplify it.

Contributor:

I think the preferred way to do this might be to first standardize the calling interface for all the single-hop verification calls, collapsing them to VerifyMembership/VerifyNonMembership, and then add a switch statement in the implementation to handle multi-hop separately from single-hop, but this seems like non-trivial work.
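Something like the following dispatch, sketched with assumed names, is what that standardization could look like:

```typescript
type Height = { revisionNumber: number; revisionHeight: number }
interface ConnectionEnd { clientId: string }

declare function verifyMultihopMembership(
  connection: ConnectionEnd, height: Height, proof: Uint8Array,
  connectionHops: string[], key: string, value: Uint8Array): void
declare function verifySingleMembership(
  connection: ConnectionEnd, height: Height, proof: Uint8Array,
  key: string, value: Uint8Array): void

// Hypothetical single entry point: the connection layer inspects
// connectionHops and routes to the multi-hop or single-hop verifier, so
// ICS 4 never has to branch on the number of hops itself.
function verifyMembership(
  connection: ConnectionEnd, height: Height, proof: Uint8Array,
  connectionHops: string[], key: string, value: Uint8Array,
) {
  if (connectionHops.length > 1) {
    verifyMultihopMembership(connection, height, proof, connectionHops, key, value)
  } else {
    verifySingleMembership(connection, height, proof, key, value)
  }
}
```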

height: Height,
proof: MultihopProof,
connectionHops: String[],
key: String) {
Member:

ditto

abortTransactionUnless(connection.state === OPEN)

// connectionEnd on misbehaving chain representing the counterparty connection on the next chain
let counterpartyConnectionEnd = proofFrozen.ConnectionProofs[0].Value
Member:

We also need to check that this connectionEnd is indeed in our connectionHops for this channel.

@dshiell (Contributor) commented Mar 16, 2023:

Agree, it is checked in connection.verifyMultihopMembership, which eventually checks the connection hops under the hood.

@@ -84,7 +84,7 @@ interface ChannelEnd {
- The `nextSequenceSend`, stored separately, tracks the sequence number for the next packet to be sent.
- The `nextSequenceRecv`, stored separately, tracks the sequence number for the next packet to be received.
- The `nextSequenceAck`, stored separately, tracks the sequence number for the next packet to be acknowledged.
- The `connectionHops` stores the list of connection identifiers, in order, along which packets sent on this channel will travel. At the moment this list must be of length 1. In the future multi-hop channels may be supported.
- The `connectionHops` stores the list of connection identifiers, in order, along which packets sent on this channel will travel. A list length greater than 1 indicates a multi-hop channel.
Contributor:

connectionHops appears in ChanOpenInit and ChanOpenTry. I think we should clarify that the order starts from the receiving chain towards the other end, i.e. connectionHops[0] is the end on the chain receiving this message.

In this example:
A (a) --- (b1) B (b2) --- (c) C
for OpenInit we will have [a, b2]
for OpenTry we will have [c, b1]

See also this line that appears in ChanOpenInit and ChanOpenTry:
connection = provableStore.get(connectionPath(connectionHops[0]))

I would also suggest doing the same for proofs, i.e. reversing the order in the proof collection pseudocode (prepend instead of append) so they appear in the same order as the connectionHops.
At the moment, in VerifyConnectionHops, it seems that the connectionProofs are in the reverse order of the connectionHops.

Alternatively we can change the ChanOpenTry to have the connectionHops in this order in the example:
[c, b1]
and get the connection like this:
connection = provableStore.get(connectionPath(connectionHops[n-1]))
(lower preference for me)
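For illustration, the counterparty's hop list can be derived mechanically from the proven connectionEnds (a sketch with assumed field names; whether the helper takes connectionEnds or the proof itself is an implementation detail):

```typescript
interface ConnectionEnd { counterparty: { connectionId: string } }

// Sketch: the hops as seen from the counterparty are each proven hop's
// counterparty connection ID, in reverse order. In the
// A (a) --- (b1) B (b2) --- (c) C example, [a, b2] maps to [c, b1].
function getCounterPartyHops(connectionEnds: ConnectionEnd[]): string[] {
  return connectionEnds.map(end => end.counterparty.connectionId).reverse()
}
```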

Contributor:

No problem, added some info on the connectionHops and updated the proof ordering per your suggestion.

connection,
proofHeight,
proofInit,
connectionHops,
Contributor:

do we need this? it's been computed above as:
counterpartyHops = getCounterPartyHops(proofInit)
and we pass proofInit above.
Same question for chanOpenAck

Contributor:

Do you mean: do we need the connectionHops argument if we already compute counterpartyHops? The connectionHops passed into ChanOpenTry are the hops from Receiver --> Sender, while the counterpartyHops are the computed hops from Sender --> Receiver, used to create the expected channel for the proof.

Contributor:

oh sorry, i misread the code, all good!

Comment on lines 166 to 251
// connectionEnd on misbehaving chain representing the counterparty connection on the next chain
counterpartyConnectionEnd := abortTransactionUnless(Unmarshal(proof.ConnectionProofs[0].Value))

Contributor:

At first look this doesn't seem to work imho. Could we have a section going through an example? I think it would be useful, as the proof generation in general is complex, and for this case in particular.
So again let's take this example with connection IDs showing. Assume the client used by bc is frozen:

A (a) --- (ba) B (bc) xx --- (cb) C (cd) --- (dc) D

So we call:

  1. GenerateFrozenClientProof([B, A]) -> proofs will be sent to A
     • In this case A will get connection proofs [bc, a] and will take the client of bc.
  2. GenerateFrozenClientProof([B, C, D]) -> proofs will be sent to D
     • D will get proofs for [cb, dc] but should use the client of the counterparty of cb.

So 1 and 2 don't look the same wrt proof collection logic. Same for verification in chanCloseFrozen.
Intuitively I was expecting some kind of indication of the "direction" being frozen; not sure we can generalize for both 1 and 2. But I might be proven wrong.

Contributor:

Yes, your assessment is correct. We made a decision to only allow the proof to work on the side with the frozen client (proof logic to A in step 1 from your example). We came to the same conclusion that it would be hard to generalize, though this is certainly open for consideration.

The consequence of this choice is that in your example above only the channel on chain A could be forcefully frozen. In order for the channel on chain D to be frozen, dc would need to be frozen (although in this example it would not be necessary, since chain D would already see the frozen client state).

So in order to freeze both ends of the channel, evidence must be submitted to freeze the client state on both sides of the misbehaving chain. This way the proof logic is always the same, and the tradeoff is that the channel can only be frozen on the side of the channel path containing the frozen client.

Thoughts?

Contributor:

(A clarification: in my example those IDs were for connections, but yes, we have similar ones for client IDs.)

When we discussed this offline, I was asking the same question, and the feedback from @AdityaSripal was that we should close both ends. I agree with that; we should close both ends.
We cannot freeze a client on the other "side", as that client/connection may be valid and used by other single or multi-hop channels.

I think we can make this function work if we add an explicit flag, e.g. i (incoming), to the last proof. By default proofs are o, i.e. outgoing.

(Note that I flipped the proofs below as well, i.e. they can be verified from first to last.)
So for 2 the proof could be something like [dc, cb, bc(i)] vs for 1 [a, bc].
If the last proof is marked i then the client on the incoming/counterparty connection end is frozen, i.e. bc must be for the counterparty of cb.

Contributor:

I have one question with this approach. If ClientStateC on ChainB (represented as bc) is frozen, would that be provable against ConsensusStateB on ChainC? Once the client is frozen there will not be any more consensus updates, so would ConsensusStateB be from just prior to the freeze, or would the frozen state be included in it?

That said, I went back and looked at the implementation for the ChanCloseFrozen proof, and I actually think the existing logic can support channel freezing on both ends of the channel with very small changes, by using the new KeyProofIndex field of the MultihopProof struct to also indicate the directionality of the proof. Going back to your example above:

A (a) --- (ba) B (bc) xx --- (cb) C (cd) --- (dc) D

  1. For A, its multihopConnectionEnd is bc. The ClientID for ClientBC should be multihopConnectionEnd.Counterparty.ClientId, and we can use this logic to determine the client state key if KeyProofIndex != 0.

  2. For D, its multihopConnectionEnd is cb. The ClientID for ClientBC should be multihopConnectionEnd.ClientID, and we can use this to determine the key for the client state if KeyProofIndex == 0.

KeyProofIndex is a new field introduced to keep the multi-hop verification logic the same while allowing verification of keys within intermediate consensus states in the proof chain.

Contributor:

> I have one question with this approach. If ClientStateC on ChainB (represented as bc) is frozen, would that be provable against ConsensusStateB on ChainC? Once the client is frozen there will not be any more consensus updates, so would ConsensusStateB be from just prior to the freeze, or would the frozen state be included in it?

Not sure I understand. If the client for bc is frozen at height hb, we can still update the client for cb on C with a consensus state of chain B at a height hb', hb' > hb. Then we can query B's client for bc with a proof at height hb'.

> That said, I went back and looked at the implementation for the ChanCloseFrozen proof, and I actually think the existing logic can support channel freezing on both ends of the channel with very small changes, by using the new KeyProofIndex field of the MultihopProof struct to also indicate the directionality of the proof. Going back to your example above:
>
> A (a) --- (ba) B (bc) xx --- (cb) C (cd) --- (dc) D
>
> 1. For A, its multihopConnectionEnd is bc. The ClientID for ClientBC should be multihopConnectionEnd.Counterparty.ClientId, and we can use this logic to determine the client state key if KeyProofIndex != 0.

shouldn't it be multihopConnectionEnd.ClientID in this case? and `KeyProofIndex == 0`?

> 2. For D, its multihopConnectionEnd is cb. The ClientID for ClientBC should be multihopConnectionEnd.ClientID, and we can use this to determine the key for the client state if KeyProofIndex == 0.

shouldn't it be multihopConnectionEnd.Counterparty.ClientId and KeyProofIndex != 0?

> KeyProofIndex is a new field introduced to keep the multi-hop verification logic the same while allowing verification of keys within intermediate consensus states in the proof chain.

So iiuc, KeyProofIndex is non-zero in case 2 of proofs for closing channels due to an on-path client being frozen.

Member:

I think both sides should be frozen using their respective clients. So in the example you have above, C is malicious.

So you use the frozen client (bc) to freeze the channel on A. And as you already mentioned, D will be automatically disabled when (dc) is frozen.

If C is malicious, both (bc) and (dc) can be frozen with the same misbehaviour.

Let's look at a more complicated example.

A (ab) --- (ba) B (bc) xx --- (cb) C (cd) --- (dc) D (de) --- (ed) E

Now suppose again that chain C has become malicious.

Here, I would say that a relayer should first freeze the clients of C on the direct counterparties. Namely, (bc) and (dc) should be frozen.

Then the channel should be closed on both ends using a MultiHop Proof

We need a proof for A that (bc) is frozen and a proof for E that (dc) is frozen.

I don't think it makes sense to prove the same client for both sides.

We want to close both sides, but for each end we should use the client that is relevant to that end

Contributor:

I was thinking about this when reviewing and got stuck exactly here...

> Here, I would say that a relayer should first freeze the clients of C on the direct counterparties. Namely, (bc) and (dc) should be frozen.

hermes currently submits evidence only once a client is updated. If the bc client is updated (by some other relayer), hermes compares the header used in the update with the header it gets from the C full node it connects to. Then it submits MsgMisbehaviour, but to bc only.

The relayer could try to freeze all clients of C, as you point out. But there is no guarantee that this works: in the example, the header that updated bc might not be verifiable on the dc client because the clients may have different security parameters (specifically the trust level). So in theory there is a small probability the channel on E will not be closed because we cannot freeze dc.

Can we live with this in practice? It would simplify the spec quite a bit, we could eliminate KeyProofIndex, etc.

Contributor:

Per the later discussion, I will keep the logic as is. Channel freezing will require a relayer to freeze the client on each side of the channel and generate a separate frozen client proof for each channel end.

Comment on lines 306 to 316
5. Verify the intermediate state proofs. Starting with known `ConsensusState[N-1]` at the given `proofHeight` on `ChainN` prove the prior chain's consensus and connection state.
6. Verify that the client id in each consensus state proof key matches the client id in the ConnectionEnd in the previous connection state proof.
7. Repeat steps 5 and 6, proving `ConsensusState[i-1]`, and `Conn[i-1,i]` until `ConsensusState[0]` on the source chain is proven.
- Start with i = N-1
- ConsensusState <- ConsensusProof[i-1].Value on Chain[i]
- ConnectionEnd <- ConnectionProof[i-1].Value on Chain[i]
- Verify ParseClientID(ConsensusProofs[i].Key) == ConnectionEnd.ClientID
- ConsensusProofs[i].Proof.VerifyMembership(ConsensusState.GetRoot(), ConsensusProofs[i].Key, ConsensusProofs[i].Value)
- ConnectionProofs[i].Proof.VerifyMembership(ConsensusState.GetRoot(), ConnectionProofs[i].Key, ConnectionProofs[i].Value)
- Set ConsensusState from unmarshalled ConsensusProofs[i].Value
8. Finally, prove the expected channel or packet commitment in `ConsensusState[0]`
Contributor:

These comments should be updated to reflect that the first proof is for the receiving chain and the last for the source (or misbehaving) chain.

dshiell and others added 25 commits August 1, 2023 23:23
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: Aditya <adityasripal@gmail.com>
…op proof generation; use "proofHeight" instead of "keyHeight" for consistency
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
…o match connectionHops (receiver --> sender)
…the connectionHops ordering (receiver --> sender)
@ancazamfir (Contributor) left a comment:

Amazing job! A lot of thought, tedious work, and patience went into this complex spec; really appreciate it!

@AdityaSripal (Member):

Excellent work to everyone involved!! 🎉 🎉 🎉

@AdityaSripal merged commit 727ae4c into cosmos:main Sep 14, 2023