First of all you will have to install exporters on validator node. For that you can use one-liner below
wget -O install_exporters.sh https://raw.githubusercontent.com/kj89/cosmos_node_monitoring/master/install_exporters.sh && chmod +x install_exporters.sh && ./install_exporters.sh
KEY | VALUE |
---|---|
bond_denom | Denominated token name, for example, ubld for Agoric. You can find it in genesis file |
bench_prefix | Prefix for chain addresses, for example, agoric for Agoric. You can find it in public addresses like this agoricvaloper1zyyz4m9ytdf60fn9yaafx7uy7h463n7alv2ete |
rpc_port | Your validator rpc port that is defined in config.toml file. Default value for aura is 26657 |
grpc_port | Your validator grpc port that is defined in app.toml file. Default value for aura is 9090 |
make sure prometheus is enabled in validator config.toml
file
make sure following ports are open:
9100
(node-exporter)9300
(cosmos-exporter)26660
(validator prometheus)
Monitoring stack needs to be deployed on seperate machine to be able to notify in case if validator goes down! To run monitoring stack you dont need beastly server with multiple cores. It will be more than enough to run it on smallest available vps
Ubuntu 20.04 / 1 VCPU / 2 GB RAM / 20 GB SSD
To install monitirng stack you can use one-liner below
wget -O install_monitoring.sh https://raw.githubusercontent.com/kj89/cosmos_node_monitoring/master/install_monitoring.sh && chmod +x install_monitoring.sh && ./install_monitoring.sh
cp $HOME/cosmos_node_monitoring/config/.env.example $HOME/cosmos_node_monitoring/config/.env
vim $HOME/cosmos_node_monitoring/config/.env
KEY | VALUE |
---|---|
TELEGRAM_ADMIN | Your user id you can get from @userinfobot. The bot will only reply to messages sent from the user. All other messages are dropped and logged on the bot's console |
TELEGRAM_TOKEN | Your telegram bot access token you can get from @botfather. To generate new token just follow a few simple steps described here |
echo "export $(xargs < $HOME/cosmos_node_monitoring/config/.env)" > $HOME/.bash_profile
source $HOME/.bash_profile
To add validator use command with specified VALIDATOR_IP
, PROM_PORT
, VALOPER_ADDRESS
, WALLET_ADDRESS
and PROJECT_NAME
$HOME/cosmos_node_monitoring/add_validator.sh VALIDATOR_IP PROM_PORT VALOPER_ADDRESS WALLET_ADDRESS PROJECT_NAME
example:
$HOME/cosmos_node_monitoring/add_validator.sh 1.2.3.4 26660 cosmosvaloper1s9rtstp8amx9vgsekhf3rk4rdr7qvg8dlxuy8v cosmos1s9rtstp8amx9vgsekhf3rk4rdr7qvg8d6jg3tl cosmos
To add more validators just run command above with validator values
Deploy the monitoring stack
cd $HOME/cosmos_node_monitoring && docker-compose up -d
ports used:
8080
(alertmanager-bot)9090
(prometheus)9093
(alertmanager)9999
(grafana)
- Open Grafana in your web browser. It should be available on port
9999
-
Login using defaults
admin/admin
and change password -
Import custom dashboard
3.1. Press "+" icon on the left panel and then choose "Import"
3.2. Input grafana.com dashboard id 15991
and press "Load"
3.3. Select Prometheus data source and press "Import"
- Change your chain explorer url
4.1. Edit "Top validators missing blocks panel"
4.2. Go to "Overrides" and edit "Data links"
4.3 Change url to your chain data explorer and hit "Save"
4.4. Hit "Save" button on the left top corner to save changes to dashboard
- Congratulations you have successfully configured Cosmos Validator Dashboard
- Open conversation with your Telegram bot you created with @botfather and type
/start
to activate bot
- Now you are all set! If you want see other commands type
/help
If you want learn more about
alermanager-bot
please visit their github repo
- For simple test you can stop
node-exporter
service for 5 minutes. It should trigger alert
systemctl stop node_exporter
- You will see message from bot firing
- Now you can start
node-exporter
service back
systemctl start node_exporter
- You will get confirmation from bot that issue is resolved
Grafana dashboard is devided into 4 sections:
- Validator health - main stats for validator health. connected peers and missed blocks
- Chain health - summary of chain health stats and list of top validators missing blocks
- Validator stats - information about validator such as rank, bounded tokens, comission, delegations and rewards
- Hardware health - system hardware metrics. cpu, ram, network usage
cd $HOME/cosmos_node_monitoring
docker-compose down
docker volume prune -f
Resources I used in this project:
- Grafana Validator stats Cosmos Validator by freak12techno
- Grafana Hardware health AgoricTools by Chainode
- Stack of monitoring tools, docker configuration node_tooling by Xiphiar
- Alertmanager telegram bot alertmanager-bot by metalmatze