Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR 022 - Custom baseapp panic handling #6072

Merged
merged 16 commits into from
May 13, 2020
Merged
1 change: 1 addition & 0 deletions docs/architecture/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,4 @@ Please add a entry below in your Pull Request for an ADR.
- [ADR 019: Protocol Buffer State Encoding](./adr-019-protobuf-state-encoding.md)
- [ADR 020: Protocol Buffer Transaction Encoding](./adr-020-protobuf-transaction-encoding.md)
- [ADR 021: Protocol Buffer Query Encoding](./adr-021-protobuf-query-encoding.md)
- [ADR 022: Custom baseapp panic handling](./adr-022-custom-panic-handling.md)
207 changes: 207 additions & 0 deletions docs/architecture/adr-022-custom-panic-handling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
# ADR 022: Custom BaseApp panic handling

## Changelog

- 2020 Apr 24: Initial Draft

## Context

Current implementation of BaseApp does not allow developers to write custom error handlers in
iTiky marked this conversation as resolved.
Show resolved Hide resolved
[runTx()](https://github.com/cosmos/cosmos-sdk/blob/bad4ca75f58b182f600396ca350ad844c18fc80b/baseapp/baseapp.go#L539)
method. We think that this method can be more flexible and can give SDK users more options for customizations without
the need to rewrite whole BaseApp. Also there's one special case for `sdk.ErrorOutOfGas` error handling, that case
might be handled in a "standard" way (middleware) alongside the others.

We propose middleware-solution, which could help developers implement the following cases:
* add external logging (let's say sending reports to external services like [Sentry](https://sentry.io));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand how this would work. Would every node make an HTTP call to Sentry?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are examples of how developers can use those middlewares. They can send panic logs to Sentry if they need to or do other struf needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand how Sentry would be used in a replicated state machine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are just examples of what developers of Cosmos SDK based project could use those handlers for. May be they want to get notifications on every panic() event (Sentry), that doesn't collide with state machine ideology.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that (in a production blockchain) the state machine is replicated, so many copies of every error would be sent to Sentry, and you would need to be careful to ensure that making HTTP requests doesn't introduce non-determinism. It's an alright example, but these points need to be clarified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually Sentry can aggregate multiple similar event into one issue with some meta and statistics. As for non-deterministic behavior: developers can do that in so many ways including the proposed middlewares =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nice. The determinism bit is more important. I think it would be sufficient if we add a warning that middlewares must be deterministic, or consensus safety will be lost.

* call panic for specific error cases;
cwgoes marked this conversation as resolved.
Show resolved Hide resolved

It will also make `OutOfGas` case and `default` case one of the middlewares.
iTiky marked this conversation as resolved.
Show resolved Hide resolved
`Default` case wraps recovery object to an error and logs it ([example middleware implementation](#Recovery-middleware)).

## Decision

### Design

#### Overview

Instead of hardcoding custom error handling into BaseApp we suggest using set of middlewares which can be customized
externally and will allow developers use as many custom error handlers as they want. Implementation with tests
can be found [here](https://github.com/cosmos/cosmos-sdk/pull/6053).

#### Implementation details

##### Recovery handler

We add a `recover()` object handler type:
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved

```go
type RecoveryHandler func(recoveryObj interface{}) error
```

`recoveryObj` is a return value of `recover()` function.
Handler should type assert (or other methods) an object to define if object should be handled.
`nil` should be returned if input object can't be handled by that `RecoveryHandler` (not a handler's target type).
Not `nil` error should be returned if input object was handled and middleware chain execution should be stopped.

An example:

```go
func exampleErrHandler(recoveryObj interface{}) error {
err, ok := recoveryObj.(error)
if !ok { return nil }

if someSpecificError.Is(err) {
panic(customPanicMsg)
} else {
return nil
}
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
}
```

This example breaks the application execution, but it also might enrich the error's context like the `OutOfGas` handler.

##### Recovery middleware

We also add a middleware type (decorator). That function type wraps `RecoveryHandler` and returns next middleware in
iTiky marked this conversation as resolved.
Show resolved Hide resolved
execution chain and handler's `error`. Type is used to separate actual `recovery()` object handling from middleware
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
chain processing.

```go
type recoveryMiddleware func(recoveryObj interface{}) (recoveryMiddleware, error)

func newRecoveryMiddleware(handler RecoveryHandler, next recoveryMiddleware) recoveryMiddleware {
return func(recoveryObj interface{}) (recoveryMiddleware, error) {
if err := handler(recoveryObj); err != nil {
return nil, err
}
return next, nil
}
}
```

Function receives a `recover()` object and returns:
* (next `recoveryMiddleware`, `nil`) if object wasn't handled (not a target type) by `RecoveryHandler`;
* (`nil`, not nil `error`) if input object was handled and other middlewares in the chain should not be executed;
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
* (`nil`, `nil`) this is an invalid behaviour 'cause in that case panic recovery might not be properly handled;
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
iTiky marked this conversation as resolved.
Show resolved Hide resolved
This can be avoided by always using a `default` as a rightmost middleware in chain (always returns an `error`');

`OutOfGas` middleware example:
```go
func newOutOfGasRecoveryMiddleware(gasWanted uint64, ctx sdk.Context, next recoveryMiddleware) recoveryMiddleware {
handler := func(recoveryObj interface{}) error {
err, ok := recoveryObj.(sdk.ErrorOutOfGas)
if !ok { return nil }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should middleware decorators simply pass recoveryObj along to the next middleware if it isn't an error they handle instead of returning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually they do. The example above creates a handler and passes it to func newRecoveryMiddleware which creates a middleware. That way developer only needs to create a RecoveryHandler (which only return an error / nil). The rest is done with helper functions automatically.


return sdkerrors.Wrap(
sdkerrors.ErrOutOfGas, fmt.Sprintf(
"out of gas in location: %v; gasWanted: %d, gasUsed: %d", err.Descriptor, gasWanted, ctx.GasMeter().GasConsumed(),
),
)
}

return newRecoveryMiddleware(handler, next)
}
```

alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
`Default` middleware example:
```go
func newDefaultRecoveryMiddleware() recoveryMiddleware {
handler := func(recoveryObj interface{}) error {
return sdkerrors.Wrap(
sdkerrors.ErrPanic, fmt.Sprintf("recovered: %v\nstack:\n%v", recoveryObj, string(debug.Stack())),
)
}

return newRecoveryMiddleware(handler, nil)
}
```

##### Recovery processing

Basic chain of middlewares processing would look like:

```go
func processRecovery(recoveryObj interface{}, middleware recoveryMiddleware) error {
if middleware == nil { return nil }

next, err := middleware(recoveryObj)
if err != nil { return err }
if next == nil { return nil }

return processRecovery(recoveryObj, next)
}
```

That way we can create a middleware chain which is executed from left to right, the rightmost middleware is a
`default` handler which must return an `error`.

##### BaseApp changes

The default middleware chain must exist in a `BaseApp` object. `Baseapp` modifications:

```go
type BaseApp struct {
// ...
runTxRecoveryMiddleware recoveryMiddleware
}

func NewBaseApp(...) {
// ...
app.runTxRecoveryMiddleware = newDefaultRecoveryMiddleware()
}

func (app *BaseApp) runTx(...) {
// ...
defer func() {
if r := recover(); r != nil {
recoveryMW := newOutOfGasRecoveryMiddleware(gasWanted, ctx, app.runTxRecoveryMiddleware)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we hard-coding newOutOfGasRecoveryMiddleware into BaseApp? I thought the idea was to defer to app.runTxRecoveryMiddleware?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because OutOfGas error handling needs some extra context and can't be statically created.

err, result = processRecovery(r, recoveryMW), nil
}

gInfo = sdk.GasInfo{GasWanted: gasWanted, GasUsed: ctx.GasMeter().GasConsumed()}
}()
// ...
}
```

Developers can add their custom `RecoveryHandler`s:
iTiky marked this conversation as resolved.
Show resolved Hide resolved

```go
func (app *BaseApp) AddRunTxRecoveryHandler(handlers ...RecoveryHandler) {
for _, h := range handlers {
app.runTxRecoveryMiddleware = newRecoveryMiddleware(h, app.runTxRecoveryMiddleware)
}
}
```

This method would prepend handlers to an existing chain.

## Status

Proposed

## Consequences

### Positive

- Developers of Cosmos SDK based projects can add custom panic handlers to:
* add error context for custom panic sources (panic inside of custom keepers);
* emit `panic()`: passthrough recovery object to the Tendermint core;
* other necessary handling;
- Developers can use standard Cosmos SDK `BaseApp` implementation, rather that rewriting it in their projects;
- Proposed solution doesn't break the current "standard" `runTx()` flow;

### Negative

- Introduces changes to the execution model design.

### Neutral

- `OutOfGas` error handler becomes one of the middlewares;
- Default panic handler becomes one of the middlewares;

## References
iTiky marked this conversation as resolved.
Show resolved Hide resolved

- [PR-6053 with proposed solution](https://github.com/cosmos/cosmos-sdk/pull/6053)
- [Similar solution. ADR-010 Modular AnteHandler](https://github.com/cosmos/cosmos-sdk/blob/v0.38.3/docs/architecture/adr-010-modular-antehandler.md)