Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time Monotonicity Enforcement #141

Merged
merged 34 commits into from
May 10, 2021
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
993f5f2
implement update client fix and start changing tests
AdityaSripal Mar 19, 2021
1ef071c
fix bug and write identical case test
AdityaSripal Mar 19, 2021
d432451
write misbehaviour detection tests
AdityaSripal Mar 19, 2021
593e5a5
add misbehaviour events to UpdateClient
AdityaSripal Mar 19, 2021
38f6115
fix client keeper and write tests
AdityaSripal Mar 19, 2021
4a56b57
add Freeze to ClientState interface
AdityaSripal Mar 25, 2021
77f4c2f
Merge branch 'main' into aditya/update-client-fix
colin-axner Mar 26, 2021
84adabc
Merge branch 'aditya/update-client-fix' of github.com:cosmos/ibc-go-g…
AdityaSripal Mar 26, 2021
a29d312
add cache context and fix events
AdityaSripal Mar 26, 2021
b6a4635
Update modules/light-clients/07-tendermint/types/update.go
colin-axner Mar 30, 2021
9edfaf6
address colin comments
AdityaSripal Mar 30, 2021
dedd6d5
Merge branch 'aditya/update-client-fix' of github.com:cosmos/ibc-go-g…
AdityaSripal Mar 30, 2021
7172807
freeze entire client on misbehaviour
AdityaSripal Mar 30, 2021
b3e50b1
add time misbehaviour and tests
AdityaSripal Mar 30, 2021
117d7d1
Merge branch 'main' into alderfly-ibc-fix
AdityaSripal Mar 30, 2021
937364d
enforce trusted height less than current height in header.ValidateBasic
AdityaSripal Mar 31, 2021
3ec116d
Merge branch 'alderfly-ibc-fix' of github.com:cosmos/ibc-go-ghsa-fw94…
AdityaSripal Mar 31, 2021
5c793ae
cleanup and tests
AdityaSripal Mar 31, 2021
906a413
fix print statement
AdityaSripal Apr 26, 2021
f62cca7
fix merge
AdityaSripal Apr 26, 2021
155e461
enforce monotonicity in update
AdityaSripal Apr 26, 2021
326e5fb
add docs and remove unnecessary interface function
AdityaSripal Apr 26, 2021
2eaa2c9
first round of review comments
AdityaSripal Apr 28, 2021
f63ae67
CHANGELOG
AdityaSripal Apr 28, 2021
c3ac1eb
update updateclient test
AdityaSripal Apr 29, 2021
75bf94a
bump tendermint to 0.34.10
colin-axner Apr 30, 2021
e2a70fb
Merge branch 'main' into alderfly-ibc-fix
colin-axner Apr 30, 2021
eab22f0
remove caching and specific frozen height
AdityaSripal Apr 30, 2021
979435a
Merge branch 'alderfly-ibc-fix' of github.com:cosmos/ibc-go into alde…
AdityaSripal Apr 30, 2021
c7f8bd2
document in go code
AdityaSripal Apr 30, 2021
76e932a
DRY FrozenHeight
AdityaSripal May 3, 2021
cbbb715
fix merge conflicts
colin-axner May 5, 2021
2f90b44
fix build
colin-axner May 5, 2021
b288be9
fix minor merge conflicts
colin-axner May 10, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,13 +51,17 @@ Ref: https://keepachangelog.com/en/1.0.0/
### State Machine Breaking

* (modules/light-clients/07-tendermint) [\#99](https://github.com/cosmos/ibc-go/pull/99) Enforce maximum chain-id length for tendermint client.
* (modules/light-clients/07-tendermint) [\#141](https://github.com/cosmos/ibc-go/pull/141) Allow a new form of misbehaviour that proves counterparty chain breaks time monotonicity, automatically enforce monotonicity in UpdateClient and freeze client if monotonicity is broken.
* (modules/light-clients/07-tendermint) [\#141](https://github.com/cosmos/ibc-go/pull/141) Freeze the client if there's a conflicting header submitted for an existing consensus state.
* (modules/core/02-client) [\#8405](https://github.com/cosmos/cosmos-sdk/pull/8405) Refactor IBC client update governance proposals to use a substitute client to update a frozen or expired client.
* (modules/core/02-client) [\#8673](https://github.com/cosmos/cosmos-sdk/pull/8673) IBC upgrade logic moved to 02-client and an IBC UpgradeProposal is added.

### Improvements

* (modules/core/04-channel) [\#7949](https://github.com/cosmos/cosmos-sdk/issues/7949) Standardized channel `Acknowledgement` moved to its own file. Codec registration redundancy removed.
* (modules/light-clients/07-tendermint) [\#125](https://github.com/cosmos/ibc-go/pull/125) Implement efficient iteration of consensus states and pruning of earliest expired consensus state on UpdateClient.
* (modules/light-clients/07-tendermint) [\#141](https://github.com/cosmos/ibc-go/pull/141) Return early in case there's a duplicate update call to save Gas.


## IBC in the Cosmos SDK Repository

Expand Down
106 changes: 72 additions & 34 deletions modules/core/02-client/keeper/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,63 +61,97 @@ func (k Keeper) UpdateClient(ctx sdk.Context, clientID string, header exported.H
return sdkerrors.Wrapf(types.ErrClientNotFound, "cannot update client with ID %s", clientID)
}

// prevent update if the client is frozen before or at header height
if clientState.IsFrozen() && clientState.GetFrozenHeight().LTE(header.GetHeight()) {
// prevent update if the client is frozen
if clientState.IsFrozen() {
return sdkerrors.Wrapf(types.ErrClientFrozen, "cannot update client with ID %s", clientID)
}

clientState, consensusState, err := clientState.CheckHeaderAndUpdateState(ctx, k.cdc, k.ClientStore(ctx, clientID), header)
eventType := types.EventTypeUpdateClient

// Cache the context to ensure that any writes in CheckHeaderAndUpdateState are only written if
// the update is successful and the update is not evidence of misbehaviour.
cacheCtx, writeFn := ctx.CacheContext()
AdityaSripal marked this conversation as resolved.
Show resolved Hide resolved
newClientState, newConsensusState, err := clientState.CheckHeaderAndUpdateState(cacheCtx, k.cdc, k.ClientStore(ctx, clientID), header)
if err != nil {
return sdkerrors.Wrapf(err, "cannot update client with ID %s", clientID)
}

k.SetClientState(ctx, clientID, clientState)

var consensusHeight exported.Height

// we don't set consensus state for localhost client
if header != nil && clientID != exported.Localhost {
k.SetClientConsensusState(ctx, clientID, header.GetHeight(), consensusState)
consensusHeight = header.GetHeight()
} else {
consensusHeight = types.GetSelfHeight(ctx)
}

k.Logger(ctx).Info("client state updated", "client-id", clientID, "height", consensusHeight.String())

defer func() {
telemetry.IncrCounterWithLabels(
[]string{"ibc", "client", "update"},
1,
[]metrics.Label{
telemetry.NewLabel("client-type", clientState.ClientType()),
telemetry.NewLabel("client-id", clientID),
telemetry.NewLabel("update-type", "msg"),
},
)
}()

// emit the full header in events
var headerStr string
var (
headerStr string
consensusHeight exported.Height
)
if header != nil {
// Marshal the Header as an Any and encode the resulting bytes to hex.
// This prevents the event value from containing invalid UTF-8 characters
// which may cause data to be lost when JSON encoding/decoding.
headerStr = hex.EncodeToString(types.MustMarshalHeader(k.cdc, header))
// set default consensus height with header height
consensusHeight = header.GetHeight()

}

// set new client state regardless of if update is valid update or misbehaviour
k.SetClientState(ctx, clientID, newClientState)
// If client state is not frozen after clientState CheckHeaderAndUpdateState,
// then update was valid. Write the update state changes, and set new consensus state.
// Else the update was proof of misbehaviour and we must emit appropriate misbehaviour events.
if !newClientState.IsFrozen() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was too much confusion regarding separation of responsibilities for detecting misbehaviour here. Because conflicting header can be detected here, but time monotonicity can't. Thus, it makes more sense to just make it the responsibility of client developers to do this correctly so we have clear separation of responsibility.

Here i just check if new clientstate is frozen and if so emit appropriate events/write state

I think resulting code is cleaner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly

Solo machines store the consensus state in the client state. Thus the only protection cached context adds is against metadata, but it still seems confusing to me. As client developers should be very aware not to write unwanted state changes for an update which is actually evidence of misbehaviour. What if a client wanted to write metadata everytime it handled misbehaviour in an update client message?

We should either be as defensive as possible by assuming client developers miss checks or we should be as explicit as possible in saying it is entirely the responsibility of the app developer. If we cache the context, then I think we might as well do the duplicate consensus state check (and return an error if a duplicate update is successful)

I'd actually prefer to be as defensive as possible. In which case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehavior

Regardless, these requirements should be clearly documented in a light_client.md under docs/. These are subtle checks that are essential for security

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly

They are responsible for telling core IBC if an update was misbehaviour. They are not responsible for rolling back all state changes.

What if a client wanted to write metadata everytime it handled misbehaviour in an update client message?

This is definitely possible, i guess its up to us what we want to enable. The downside is accidentally leaving in metadata writes after misbehaviour that we intend to write only for valid updates. Think @cwgoes can weigh in on tradeoff between flexibility and opinionated code. I believe being opinionated here and having the ClientKeeper write metadata on valid update makes more sense. Light client implementations are fully responsible for doing update logic (UpdateClient will do none of that).
But ClientKeeper will take the returned output, and do all of the necessary store writes. I think that's a clean separation of responsibility.

I'd actually prefer to be as defensive as possible. In which case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehavior

I tried doing this and the code got super ugly because there was freezing logic in both the ClientKeeper and in tendermint's update function. That could have been much cleaner if it was just in one place.

Furthermore, I think it's possible to take all client developer checks that must be done by every light client and put them in the ClientKeeper to minimize the possibility of light-client developer error.
But I think in practice, this would make things less secure if it trades off too much on separation of concerns.
Critically, I think it just needs to be clear to a reviewer/developer where a particular check is supposed to happen.
My proposal is that we create a very clear separation of concern that acts as a contract between core IBC and light client developer.

Light client implementation must give core IBC the updated clientstate/consensus state. And it must return a frozen client state if the update was evidence of misbehaviour.
Core IBC will in turn store the clientstate (and consensus if valid update), write all callback state changes on successful updates, and emit appropriate events.

This means that there may be redundant checks happening in light clients, that may be missed by some of them. But it gives a very clear rule for what a light client implementation is responsible for. Even though i place responsibility of all misbehaviour checks on light client. As a reader and reviewer I can analyze the light-client implementation in isolation and check that it is catching all misbehaviour and holding up its side of bargain.

Without this, I need to be checking whether ClientKeeper+misbehaviour together are catching all misbehaviour. And I need to make sure together they don't miss a gap between them. And that they aren't doing redundant checks. It's also harder as time goes on to determine where a check should go. We would need to make a subjective decision on whether we think some check is universal or not.

For these reasons I think clear separation of concerns is more important than putting all universal checks (even subtle ones) in the ClientKeeper. But yes, this should absolutely be documented in light_client.md. Will do so once there's consensus on this point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of if we allow metadata writes on misbehaviour, we still want to cache so we can discard on error.

Developers shouldn't be forced to revert state themselves on error

Copy link
Contributor

@colin-axner colin-axner Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you make great points.

it must return a frozen client state if the update was evidence of misbehaviour.

I agree with this.

As a reader and reviewer I can analyze the light-client implementation in isolation and check that it is catching all misbehaviour and holding up its side of bargain.

I like this, and I think we can still achieve this with a duplicate check. My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check. This is the code I have in mind:

consState, exists := keeper.GetConsensusState()

newClientState, newConsensusState, err := CheckHeaderAndUpdateState()
if err != nil {
    return err
}

// write client state, errors returned later revert state changes

switch {
case: newCilentState.IsFrozen()
    // use logic you have
case: exists && !reflect.DeepEqual(consState, newConsensusState)
    // light client implementation missed misbehaviour handling
    return err
default:
    // regular update code
}

I don't see why this code gets ugly? It allows light clients to fully implement misbehaviour logic without relying on 02-client and it allows 02-client to prevent duplicate updates which are misbehaviour

Copy link
Contributor

@colin-axner colin-axner Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of if we allow metadata writes on misbehaviour, we still want to cache so we can discard on error.

Developers shouldn't be forced to revert state themselves on error

Do you have the use case in mind that update is being called by an external module? Messages that result in errors always have state changes reverted by baseapp. I think this is a safe assumption to make

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the use case in mind that update is being called by an external module? Messages that result in errors always have state changes reverted by baseapp. I think this is a safe assumption to make

Oh yes you're correct about this. We should only cache if we discard metadata on misbehaviour

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check.

Here's a question - is this always true? Certainly it is a problem if unequal consensus states at the same height would allow for violation of exactly-once packet delivery guarantees or timeouts, but there could conceivably be client types which allow duplicate consensus states, just not verification at them (so they are only intermediate update points) - for example, a (non-Tendermint) consensus algorithm could have a block history which looks like this:

blocks

Is this a case we want to consider? There is something to be said for not overly constraining what it means for clients to be "correct", since clients implement all of the packet data / timeout / etc. verification functions anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! I didn't realize intermediate update points were a possibility.

In light of our discussion yesterday, I don't see the usefulness of adding this check if in the near future, light client implementations will be responsible for getting/setting client/consensus states. In this design, light clients should definitely be aware to guard against duplicate updates which constitute misbehaviour

// write any cached state changes from CheckHeaderAndUpdateState
// to store metadata in client store for new consensus state.
writeFn()

// if update is not misbehaviour then update the consensus state
// we don't set consensus state for localhost client
if header != nil && clientID != exported.Localhost {
k.SetClientConsensusState(ctx, clientID, header.GetHeight(), newConsensusState)
} else {
consensusHeight = types.GetSelfHeight(ctx)
}

k.Logger(ctx).Info("client state updated", "client-id", clientID, "height", consensusHeight.String())

defer func() {
telemetry.IncrCounterWithLabels(
[]string{"ibc", "client", "update"},
1,
[]metrics.Label{
telemetry.NewLabel("client-type", clientState.ClientType()),
telemetry.NewLabel("client-id", clientID),
telemetry.NewLabel("update-type", "msg"),
},
)
}()

} else {
// set eventType to SubmitMisbehaviour
eventType = types.EventTypeSubmitMisbehaviour

k.Logger(ctx).Info("client frozen due to misbehaviour", "client-id", clientID, "height", header.GetHeight().String())

defer func() {
telemetry.IncrCounterWithLabels(
[]string{"ibc", "client", "misbehaviour"},
1,
[]metrics.Label{
telemetry.NewLabel("client-type", clientState.ClientType()),
telemetry.NewLabel("client-id", clientID),
telemetry.NewLabel("msg-type", "update"),
},
)
}()
}

// emitting events in the keeper emits for both begin block and handler client updates
ctx.EventManager().EmitEvent(
sdk.NewEvent(
types.EventTypeUpdateClient,
eventType,
sdk.NewAttribute(types.AttributeKeyClientID, clientID),
sdk.NewAttribute(types.AttributeKeyClientType, clientState.ClientType()),
sdk.NewAttribute(types.AttributeKeyConsensusHeight, consensusHeight.String()),
sdk.NewAttribute(types.AttributeKeyHeader, headerStr),
),
)

return nil
}

Expand Down Expand Up @@ -178,8 +212,12 @@ func (k Keeper) CheckMisbehaviourAndUpdateState(ctx sdk.Context, misbehaviour ex
return sdkerrors.Wrapf(types.ErrClientNotFound, "cannot check misbehaviour for client with ID %s", misbehaviour.GetClientID())
}

if clientState.IsFrozen() && clientState.GetFrozenHeight().LTE(misbehaviour.GetHeight()) {
return sdkerrors.Wrapf(types.ErrInvalidMisbehaviour, "client is already frozen at height ≤ misbehaviour height (%s ≤ %s)", clientState.GetFrozenHeight(), misbehaviour.GetHeight())
if clientState.IsFrozen() {
return sdkerrors.Wrap(types.ErrInvalidMisbehaviour, "client is already frozen")
}

if err := misbehaviour.ValidateBasic(); err != nil {
colin-axner marked this conversation as resolved.
Show resolved Hide resolved
return err
}

clientState, err := clientState.CheckMisbehaviourAndUpdateState(ctx, k.cdc, k.ClientStore(ctx, misbehaviour.GetClientID()), misbehaviour)
Expand Down
Loading