Commit

refactor: simplify collected metrics (#21963)
julienrbrt authored Sep 28, 2024
1 parent dd9e582 commit 787ee69
Showing 20 changed files with 59 additions and 158 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -52,8 +52,10 @@ Every module contains its own CHANGELOG.md. Please refer to the module you are i
### Improvements

* (sims) [#21613](https://github.com/cosmos/cosmos-sdk/pull/21613) Add sims2 framework and factory methods for simpler message factories in modules
* (modules) [#21963](https://github.com/cosmos/cosmos-sdk/pull/21963) Duplicatable metrics are no longer collected in modules. They were unnecessary overhead.

### Bug Fixes

* (sims) [#21952](https://github.com/cosmos/cosmos-sdk/pull/21952) Use liveness matrix for validator sign status in sims
* (sims) [#21906](https://github.com/cosmos/cosmos-sdk/pull/21906) Skip sims test when running dry on validators
* (cli) [#21919](https://github.com/cosmos/cosmos-sdk/pull/21919) Query address-by-acc-num by account_id instead of id.
77 changes: 25 additions & 52 deletions docs/learn/advanced/09-telemetry.md
@@ -13,17 +13,18 @@ their application through the use of the `telemetry` package. To enable telemetr

The Cosmos SDK currently supports in-memory and Prometheus telemetry sinks. The in-memory sink is always attached (when telemetry is enabled) with a 10-second interval and 1-minute retention. This means that metrics are aggregated over 10-second intervals and kept for 1 minute.

To query active metrics (see the retention note above), you have to enable the API server (`api.enabled = true` in `app.toml`). A single API endpoint is exposed: `http://localhost:1317/metrics?format={text|prometheus}`, the default being `text`.
To query active metrics (see the retention note above), you have to enable the API server (`api.enabled = true` in `app.toml`). A single API endpoint is exposed: `http://localhost:1317/metrics?format={text|prometheus}` (or port `1318` in v2), the default being `text`.
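A minimal sketch of building that query URL in Go, assuming a node on `localhost`; `metricsURL` is a hypothetical helper, not part of the SDK. The resulting URL can then be fetched with `net/http` or configured as a Prometheus scrape target.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// metricsURL builds the telemetry query URL. host and port are caller-supplied
// (1317 by default, 1318 in v2); format must be "text" or "prometheus".
func metricsURL(host string, port int, format string) (string, error) {
	if format != "text" && format != "prometheus" {
		return "", fmt.Errorf("unsupported metrics format %q", format)
	}
	u := url.URL{
		Scheme:   "http",
		Host:     net.JoinHostPort(host, strconv.Itoa(port)),
		Path:     "/metrics",
		RawQuery: url.Values{"format": []string{format}}.Encode(),
	}
	return u.String(), nil
}

func main() {
	u, err := metricsURL("localhost", 1317, "prometheus")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // http://localhost:1317/metrics?format=prometheus
}
```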

## Emitting metrics

If telemetry is enabled via configuration, a single global metrics collector is registered via the
[go-metrics](https://github.com/hashicorp/go-metrics) library. This allows emitting and collecting
metrics through a simple [API](https://github.com/cosmos/cosmos-sdk/blob/v0.50.0-alpha.0/telemetry/wrapper.go). Example:
metrics through a simple [API](https://github.com/cosmos/cosmos-sdk/blob/v0.50.10/telemetry/wrapper.go). Example:

```go
func EndBlocker(ctx sdk.Context, k keeper.Keeper) {
	defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

	// ...
}
@@ -69,60 +70,32 @@ Consider the following examples with enough granularity and adequate cardinality
* begin/end blocker time
* tx gas used
* block gas used
* amount of tokens minted
* amount of accounts created

The following examples expose too much cardinality and may not even prove to be useful:

* transfers between accounts with amount
* voting/deposit amount from unique addresses
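Cardinality here is roughly the number of distinct label-value combinations a metric produces, since each combination becomes its own series in a backend such as Prometheus. The Go sketch below multiplies the number of distinct values per label; the counts used (20 modules, 10 denoms, 1,000,000 addresses) are hypothetical figures chosen only for illustration.

```go
package main

import "fmt"

// seriesCount returns an upper bound on the number of time series a metric can
// produce: one series per combination of label values.
func seriesCount(distinctValuesPerLabel ...int) int {
	n := 1
	for _, v := range distinctValuesPerLabel {
		n *= v
	}
	return n
}

func main() {
	// Begin/end blocker time labelled by module: bounded by the module count.
	fmt.Println(seriesCount(20))
	// Transfers labelled by sender address and denom: grows with every new address.
	fmt.Println(seriesCount(1_000_000, 10))
}
```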

## Idempotency

Metrics aren't idempotent: if a metric is emitted twice, it is counted twice.
This is important to keep in mind when collecting metrics. If module code runs more than once for the same transaction (for instance in `CheckTx`, `SimulateTx`, and `DeliverTx`), its metrics are emitted each time.
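The double counting can be illustrated with a toy counter in Go. Everything below is hypothetical (`execMode`, `counter`, `handleMsg`, `handleMsgGuarded`): the guarded variant shows one possible mitigation, emitting only during block execution, and is not how this commit addresses the issue (the commit removes the duplicatable metrics instead).

```go
package main

import "fmt"

// execMode is a hypothetical stand-in for the mode a transaction runs in.
type execMode int

const (
	modeCheck    execMode = iota // mempool admission (CheckTx)
	modeSimulate                 // gas estimation (SimulateTx)
	modeDeliver                  // block execution (DeliverTx)
)

// counter is a toy, non-idempotent counter: every Inc call is counted.
type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// handleMsg is a hypothetical message handler that emits a metric each time it runs.
func handleMsg(_ execMode, sent *counter) {
	sent.Inc()
}

// handleMsgGuarded only emits the metric while the block is being executed.
func handleMsgGuarded(mode execMode, sent *counter) {
	if mode == modeDeliver {
		sent.Inc()
	}
}

func main() {
	var naive, guarded counter
	// One transaction: simulated, then checked, then delivered.
	for _, m := range []execMode{modeSimulate, modeCheck, modeDeliver} {
		handleMsg(m, &naive)
		handleMsgGuarded(m, &guarded)
	}
	fmt.Println(naive.n, guarded.n) // 3 1
}
```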

## Supported Metrics

| Metric | Description | Unit | Type |
|:--------------------------------|:------------------------------------------------------------------------------------------|:----------------|:--------|
| `tx_count` | Total number of txs processed via `DeliverTx` | tx | counter |
| `tx_successful` | Total number of successful txs processed via `DeliverTx` | tx | counter |
| `tx_failed` | Total number of failed txs processed via `DeliverTx` | tx | counter |
| `tx_gas_used` | The total amount of gas used by a tx | gas | gauge |
| `tx_gas_wanted` | The total amount of gas requested by a tx | gas | gauge |
| `tx_msg_send` | The total amount of tokens sent in a `MsgSend` (per denom) | token | gauge |
| `tx_msg_withdraw_reward` | The total amount of tokens withdrawn in a `MsgWithdrawDelegatorReward` (per denom) | token | gauge |
| `tx_msg_withdraw_commission` | The total amount of tokens withdrawn in a `MsgWithdrawValidatorCommission` (per denom) | token | gauge |
| `tx_msg_delegate` | The total amount of tokens delegated in a `MsgDelegate` | token | gauge |
| `tx_msg_begin_unbonding` | The total amount of tokens undelegated in a `MsgUndelegate` | token | gauge |
| `tx_msg_begin_begin_redelegate` | The total amount of tokens redelegated in a `MsgBeginRedelegate` | token | gauge |
| `tx_msg_ibc_transfer` | The total amount of tokens transferred via IBC in a `MsgTransfer` (source or sink chain) | token | gauge |
| `ibc_transfer_packet_receive` | The total amount of tokens received in a `FungibleTokenPacketData` (source or sink chain) | token | gauge |
| `new_account` | Total number of new accounts created | account | counter |
| `gov_proposal` | Total number of governance proposals | proposal | counter |
| `gov_vote` | Total number of governance votes for a proposal | vote | counter |
| `gov_deposit` | Total number of governance deposits for a proposal | deposit | counter |
| `staking_delegate` | Total number of delegations | delegation | counter |
| `staking_undelegate` | Total number of undelegations | undelegation | counter |
| `staking_redelegate` | Total number of redelegations | redelegation | counter |
| `ibc_transfer_send` | Total number of IBC transfers sent from a chain (source or sink) | transfer | counter |
| `ibc_transfer_receive` | Total number of IBC transfers received to a chain (source or sink) | transfer | counter |
| `ibc_client_create` | Total number of clients created | create | counter |
| `ibc_client_update` | Total number of client updates | update | counter |
| `ibc_client_upgrade` | Total number of client upgrades | upgrade | counter |
| `ibc_client_misbehaviour` | Total number of client misbehaviours | misbehaviour | counter |
| `ibc_connection_open-init` | Total number of connection `OpenInit` handshakes | handshake | counter |
| `ibc_connection_open-try` | Total number of connection `OpenTry` handshakes | handshake | counter |
| `ibc_connection_open-ack` | Total number of connection `OpenAck` handshakes | handshake | counter |
| `ibc_connection_open-confirm` | Total number of connection `OpenConfirm` handshakes | handshake | counter |
| `ibc_channel_open-init` | Total number of channel `OpenInit` handshakes | handshake | counter |
| `ibc_channel_open-try` | Total number of channel `OpenTry` handshakes | handshake | counter |
| `ibc_channel_open-ack` | Total number of channel `OpenAck` handshakes | handshake | counter |
| `ibc_channel_open-confirm` | Total number of channel `OpenConfirm` handshakes | handshake | counter |
| `ibc_channel_close-init` | Total number of channel `CloseInit` handshakes | handshake | counter |
| `ibc_channel_close-confirm` | Total number of channel `CloseConfirm` handshakes | handshake | counter |
| `tx_msg_ibc_recv_packet` | Total number of IBC packets received | packet | counter |
| `tx_msg_ibc_acknowledge_packet` | Total number of IBC packets acknowledged | acknowledgement | counter |
| `ibc_timeout_packet` | Total number of IBC timeout packets | timeout | counter |
| `store_iavl_get` | Duration of an IAVL `Store#Get` call | ms | summary |
| `store_iavl_set` | Duration of an IAVL `Store#Set` call | ms | summary |
| `store_iavl_has` | Duration of an IAVL `Store#Has` call | ms | summary |
| `store_iavl_delete` | Duration of an IAVL `Store#Delete` call | ms | summary |
| `store_iavl_commit` | Duration of an IAVL `Store#Commit` call | ms | summary |
| `store_iavl_query` | Duration of an IAVL `Store#Query` call | ms | summary |
| Metric | Description | Unit | Type |
| ------------------- | ------------------------------------------------------------------------------ | ---- | ------- |
| `tx_count` | Total number of txs processed via `DeliverTx` | tx | counter |
| `tx_successful` | Total number of successful txs processed via `DeliverTx` | tx | counter |
| `tx_failed` | Total number of failed txs processed via `DeliverTx` | tx | counter |
| `tx_gas_used` | The total amount of gas used by a tx | gas | gauge |
| `tx_gas_wanted` | The total amount of gas requested by a tx | gas | gauge |
| `store_iavl_get` | Duration of an IAVL `Store#Get` call | ms | summary |
| `store_iavl_set` | Duration of an IAVL `Store#Set` call | ms | summary |
| `store_iavl_has` | Duration of an IAVL `Store#Has` call | ms | summary |
| `store_iavl_delete` | Duration of an IAVL `Store#Delete` call | ms | summary |
| `store_iavl_commit` | Duration of an IAVL `Store#Commit` call | ms | summary |
| `store_iavl_query` | Duration of an IAVL `Store#Query` call | ms | summary |
| `begin_blocker` | Duration of the `BeginBlock` call per module | ms | summary |
| `end_blocker` | Duration of the `EndBlock` call per module | ms | summary |
| `server_info`       | Information about the server, such as version, commit, build date, and upgrade | -    | gauge   |
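The `Type` column distinguishes counters, gauges, and summaries. The toy Go model below shows how each behaves when written twice: a counter accumulates, a gauge keeps only the latest value, and a summary keeps every observation so count, sum, and mean can be derived. The method names echo go-metrics (`IncrCounter`, `SetGauge`, `AddSample`), but `toySink` is a hypothetical simplification, not that library.

```go
package main

import "fmt"

// toySink is a hypothetical model of the three metric types in the table.
type toySink struct {
	counters map[string]float64
	gauges   map[string]float64
	samples  map[string][]float64
}

func newToySink() *toySink {
	return &toySink{
		counters: map[string]float64{},
		gauges:   map[string]float64{},
		samples:  map[string][]float64{},
	}
}

// IncrCounter accumulates: each call adds to the running total.
func (s *toySink) IncrCounter(key string, v float64) { s.counters[key] += v }

// SetGauge overwrites: only the most recent value is kept.
func (s *toySink) SetGauge(key string, v float64) { s.gauges[key] = v }

// AddSample records every observation so a summary can be derived.
func (s *toySink) AddSample(key string, v float64) {
	s.samples[key] = append(s.samples[key], v)
}

// summary returns the count, sum, and mean of the samples recorded for key.
func (s *toySink) summary(key string) (count int, sum, mean float64) {
	for _, v := range s.samples[key] {
		sum += v
	}
	count = len(s.samples[key])
	if count > 0 {
		mean = sum / float64(count)
	}
	return count, sum, mean
}

func main() {
	s := newToySink()
	s.IncrCounter("tx_count", 1)
	s.IncrCounter("tx_count", 1)
	s.SetGauge("tx_gas_used", 50000)
	s.SetGauge("tx_gas_used", 21000)
	s.AddSample("store_iavl_get", 2) // durations in ms
	s.AddSample("store_iavl_get", 4)

	c, sum, mean := s.summary("store_iavl_get")
	fmt.Println(s.counters["tx_count"], s.gauges["tx_gas_used"], c, sum, mean)
}
```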
8 changes: 3 additions & 5 deletions telemetry/wrapper.go
@@ -8,11 +8,9 @@ import (

// Common metric key constants
const (
	MetricKeyBeginBlocker       = "begin_blocker"
	MetricKeyEndBlocker         = "end_blocker"
	MetricKeyPrepareCheckStater = "prepare_check_stater"
	MetricKeyPrecommiter        = "precommiter"
	MetricLabelNameModule       = "module"
	MetricKeyBeginBlocker = "begin_blocker"
	MetricKeyEndBlocker   = "end_blocker"
	MetricLabelNameModule = "module"
)

// NewLabel creates a new instance of Label with name and value
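With the remaining constants, a call such as `telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)` records a timing under the key `begin_blocker` with a `module` label (`MetricLabelNameModule`). The stdlib sketch below renders such a key and label set as a Prometheus-style series identifier. `seriesID` is hypothetical, and the name a node actually exports may carry extra prefixes or labels depending on its telemetry configuration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Constants copied from the updated telemetry/wrapper.go.
const (
	MetricKeyBeginBlocker = "begin_blocker"
	MetricKeyEndBlocker   = "end_blocker"
	MetricLabelNameModule = "module"
)

// seriesID is a hypothetical helper that joins metric keys with "_" and
// appends labels in sorted order, Prometheus exposition style.
func seriesID(keys []string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, name := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", name, labels[name]))
	}
	id := strings.Join(keys, "_")
	if len(pairs) > 0 {
		id += "{" + strings.Join(pairs, ",") + "}"
	}
	return id
}

func main() {
	// Prints: begin_blocker{module="mint"}
	fmt.Println(seriesID([]string{MetricKeyBeginBlocker}, map[string]string{MetricLabelNameModule: "mint"}))
}
```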
2 changes: 1 addition & 1 deletion x/bank/go.mod
Original file line number Diff line number Diff line change
@@ -19,7 +19,7 @@ require (
	github.com/golang/mock v1.6.0
	github.com/golang/protobuf v1.5.4
	github.com/grpc-ecosystem/grpc-gateway v1.16.0
	github.com/hashicorp/go-metrics v0.5.3
	github.com/hashicorp/go-metrics v0.5.3 // indirect
	github.com/spf13/cobra v1.8.1
	github.com/stretchr/testify v1.9.0
	google.golang.org/genproto/googleapis/api v0.0.0-20240814211410-ddb44dafa142
15 changes: 0 additions & 15 deletions x/bank/keeper/msg_server.go
@@ -3,12 +3,9 @@ package keeper
import (
	"context"

	"github.com/hashicorp/go-metrics"

	errorsmod "cosmossdk.io/errors"
	"cosmossdk.io/x/bank/types"

	"github.com/cosmos/cosmos-sdk/telemetry"
	sdk "github.com/cosmos/cosmos-sdk/types"
	sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
@@ -65,18 +62,6 @@ func (k msgServer) Send(ctx context.Context, msg *types.MsgSend) (*types.MsgSend
		return nil, err
	}

	defer func() {
		for _, a := range msg.Amount {
			if a.Amount.IsInt64() {
				telemetry.SetGaugeWithLabels(
					[]string{"tx", "msg", "send"},
					float32(a.Amount.Int64()),
					[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
				)
			}
		}
	}()

	return &types.MsgSendResponse{}, nil
}

14 changes: 0 additions & 14 deletions x/bank/v2/keeper/handlers.go
@@ -6,14 +6,12 @@ import (
	"errors"
	"fmt"

	"github.com/hashicorp/go-metrics"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	errorsmod "cosmossdk.io/errors"
	"cosmossdk.io/x/bank/v2/types"

	"github.com/cosmos/cosmos-sdk/telemetry"
	sdk "github.com/cosmos/cosmos-sdk/types"
	sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
@@ -85,18 +83,6 @@ func (h handlers) MsgSend(ctx context.Context, msg *types.MsgSend) (*types.MsgSe
		return nil, err
	}

	defer func() {
		for _, a := range msg.Amount {
			if a.Amount.IsInt64() {
				telemetry.SetGaugeWithLabels(
					[]string{"tx", "msg", "send"},
					float32(a.Amount.Int64()),
					[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
				)
			}
		}
	}()

	return &types.MsgSendResponse{}, nil
}

2 changes: 1 addition & 1 deletion x/distribution/go.mod
@@ -18,7 +18,6 @@ require (
	github.com/golang/mock v1.6.0
	github.com/golang/protobuf v1.5.4
	github.com/grpc-ecosystem/grpc-gateway v1.16.0
	github.com/hashicorp/go-metrics v0.5.3
	github.com/pkg/errors v0.9.1
	github.com/spf13/cobra v1.8.1
	github.com/stretchr/testify v1.9.0
@@ -92,6 +91,7 @@ require (
	github.com/gsterjov/go-libsecret v0.0.0-20161001094733-a6f4afe4910c // indirect
	github.com/hashicorp/go-hclog v1.6.3 // indirect
	github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
	github.com/hashicorp/go-metrics v0.5.3 // indirect
	github.com/hashicorp/go-plugin v1.6.1 // indirect
	github.com/hashicorp/golang-lru v1.0.2 // indirect
	github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
3 changes: 2 additions & 1 deletion x/distribution/keeper/abci.go
@@ -11,7 +11,8 @@ import (
// BeginBlocker sets the proposer for determining distribution during endblock
// and distributes rewards for the previous block.
func (k Keeper) BeginBlocker(ctx context.Context) error {
	defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

	// determine the total power signing the block
	var previousTotalPower int64
27 changes: 0 additions & 27 deletions x/distribution/keeper/msg_server.go
@@ -4,12 +4,9 @@ import (
	"context"
	"fmt"

	"github.com/hashicorp/go-metrics"

	"cosmossdk.io/errors"
	"cosmossdk.io/x/distribution/types"

	"github.com/cosmos/cosmos-sdk/telemetry"
	sdk "github.com/cosmos/cosmos-sdk/types"
	sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
@@ -61,18 +58,6 @@ func (k msgServer) WithdrawDelegatorReward(ctx context.Context, msg *types.MsgWi
		return nil, err
	}

	defer func() {
		for _, a := range amount {
			if a.Amount.IsInt64() {
				telemetry.SetGaugeWithLabels(
					[]string{"tx", "msg", "withdraw_reward"},
					float32(a.Amount.Int64()),
					[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
				)
			}
		}
	}()

	return &types.MsgWithdrawDelegatorRewardResponse{Amount: amount}, nil
}

@@ -87,18 +72,6 @@ func (k msgServer) WithdrawValidatorCommission(ctx context.Context, msg *types.M
		return nil, err
	}

	defer func() {
		for _, a := range amount {
			if a.Amount.IsInt64() {
				telemetry.SetGaugeWithLabels(
					[]string{"tx", "msg", "withdraw_commission"},
					float32(a.Amount.Int64()),
					[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
				)
			}
		}
	}()

	return &types.MsgWithdrawValidatorCommissionResponse{Amount: amount}, nil
}

3 changes: 2 additions & 1 deletion x/epochs/keeper/abci.go
@@ -11,7 +11,8 @@ import (

// BeginBlocker of epochs module.
func (k Keeper) BeginBlocker(ctx context.Context) error {
	defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

	headerInfo := k.HeaderService.HeaderInfo(ctx)
	err := k.EpochInfo.Walk(
3 changes: 2 additions & 1 deletion x/evidence/keeper/abci.go
@@ -13,7 +13,8 @@ import (
// BeginBlocker iterates through and handles any newly discovered evidence of
// misbehavior submitted by CometBFT. Currently, only equivocation is handled.
func (k Keeper) BeginBlocker(ctx context.Context, cometService comet.Service) error {
	defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

	bi := cometService.CometInfo(ctx)

3 changes: 2 additions & 1 deletion x/gov/keeper/abci.go
@@ -20,7 +20,8 @@ import (

// EndBlocker is called every block.
func (k Keeper) EndBlocker(ctx context.Context) error {
	defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyEndBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

	// delete dead proposals from store and returns their deposits.
	// A proposal is dead when it's inactive and didn't get enough deposit on time to get into voting phase.
7 changes: 7 additions & 0 deletions x/group/keeper/abci.go
@@ -2,11 +2,18 @@ package keeper

import (
	"context"

	"cosmossdk.io/x/gov/types"

	"github.com/cosmos/cosmos-sdk/telemetry"
)

// EndBlocker called at every block, updates proposal's `FinalTallyResult` and
// prunes expired proposals.
func (k Keeper) EndBlocker(ctx context.Context) error {
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

	if err := k.TallyProposalsAtVPEnd(ctx); err != nil {
		return err
	}
3 changes: 2 additions & 1 deletion x/mint/keeper/abci.go
@@ -10,7 +10,8 @@ import (

// BeginBlocker mints new tokens for the previous block.
func (k Keeper) BeginBlocker(ctx context.Context) error {
	defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
	start := telemetry.Now()
	defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

	// fetch stored minter & params
	minter, err := k.Minter.Get(ctx)