Problem
As the standalone watch-tower moves into production, observability is critical in ensuring the reliability of the service.
Suggested solution
Introduce an Express API that provides health and readiness monitoring, plus a metrics collection endpoint for automatic Prometheus scraping.
Alternatives considered
Use of Loggly / Sentry. Both have been trialed. While Loggly is suitable for some log analysis, the same can easily be achieved with ELK. Loggly also provides little in terms of metrics monitoring, which is the bread and butter of Prometheus / Grafana.
Acceptance criteria
/health endpoint that returns not ready if the chain is still in warm-up.
/metrics endpoint with targeted metrics distributed throughout.
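As a rough illustration of what a `/metrics` endpoint serves to a Prometheus scraper, the sketch below renders samples in the Prometheus text exposition format. The `MetricSample` shape, the `renderMetrics` helper, and any metric names passed to it are hypothetical and not taken from the watch-tower code. A real service would more likely rely on a Prometheus client library than hand-roll this.

```typescript
// Hypothetical sample shape, for illustration only.
interface MetricSample {
  name: string;
  help: string;
  type: "counter" | "gauge";
  labels: Record<string, string>;
  value: number; // assumed finite; NaN and infinities need special spellings
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render samples in the text exposition format:
//   # HELP <name> <help>
//   # TYPE <name> <type>
//   <name>{label="value",...} <value>
// This sketch assumes one sample per metric name, so HELP and TYPE
// are emitted once per sample.
function renderMetrics(samples: MetricSample[]): string {
  const lines: string[] = [];
  for (const s of samples) {
    const labelPairs = Object.keys(s.labels)
      .map((key) => `${key}="${escapeLabelValue(s.labels[key])}"`)
      .join(",");
    const labelPart = labelPairs.length > 0 ? `{${labelPairs}}` : "";
    lines.push(`# HELP ${s.name} ${s.help}`);
    lines.push(`# TYPE ${s.name} ${s.type}`);
    lines.push(`${s.name}${labelPart} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Calling `renderMetrics` with a single gauge labeled by chain would produce a `# HELP` line, a `# TYPE` line, and one sample line, which is the shape a scraper expects in the response body.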
# Description
This PR addresses issues around naming conventions for the Prometheus
metrics. Best efforts have been made to adhere to [best
practices](https://prometheus.io/docs/practices/naming/).
# Changes
- [x] Revised metric naming and labeling to be in accordance with best
practices.
- [x] Removed custom API prometheus metrics and replaced with
middleware.
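To make the naming conventions concrete, here is a minimal sketch that checks a metric name against a few of the rules in the Prometheus naming guide: lowercase snake_case, a single application prefix, and a `_total` suffix on counters. The `checkMetricName` helper and the `watch_tower` prefix are assumptions for illustration, not the names used in this PR, and the check covers only a subset of the guide.

```typescript
type MetricType = "counter" | "gauge" | "histogram";

// Returns a list of convention problems; an empty list means the name passed.
// Prometheus itself permits [a-zA-Z_:][a-zA-Z0-9_:]*, but lowercase
// snake_case is the common convention and colons are reserved for
// recording rules, so this check is deliberately stricter.
function checkMetricName(
  name: string,
  type: MetricType,
  prefix: string = "watch_tower" // assumed application prefix
): string[] {
  const problems: string[] = [];
  if (!/^[a-z][a-z0-9_]*$/.test(name)) {
    problems.push("name should be lowercase snake_case");
  }
  if (name.indexOf(prefix + "_") !== 0) {
    problems.push("name should start with the application prefix " + prefix + "_");
  }
  if (type === "counter" && !/_total$/.test(name)) {
    problems.push("counter names should end in _total");
  }
  return problems;
}
```

For example, a hypothetical counter named `watch_tower_blocks_processed_total` passes all three checks, while `blocksProcessed` fails each of them.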
## How to test
1. Run a sync from contract genesis.
2. Observe via `http://127.0.0.1:8080/metrics` the respective metrics
changing.
## Related Issues
Related #78, #70
# Description
This PR provides overall health monitoring for the block watcher (chain
contexts). This assists with Kubernetes deployment to enable health
probes for automatic restarting and monitoring.
# Changes
- [x] All chain contexts added to a static mapping for use in other
services (i.e. the API).
- [x] Implemented `/health` API endpoint, returning a status code >= 400
when the chain is not synced (in warm-up).
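The health check described above could be sketched roughly as follows: a pure function takes the mapping of chain contexts and derives both the HTTP status and the JSON body. The `ChainHealth` fields and the `buildHealthResponse` helper are hypothetical and do not reflect the actual shapes in this PR. The `500` status is an assumption that mirrors the test steps in this description.

```typescript
// Hypothetical health snapshot of one chain context.
interface ChainHealth {
  chainId: number;
  isInSync: boolean; // false while the chain is still in warm-up
  lastProcessedBlock: number | null;
}

interface HealthResponse {
  status: number; // HTTP status code to send
  body: {
    overallHealth: boolean;
    chains: Record<string, ChainHealth>;
  };
}

// Healthy only when at least one chain is registered and every
// registered chain reports that it is in sync.
function buildHealthResponse(
  chains: Record<string, ChainHealth>
): HealthResponse {
  const ids = Object.keys(chains);
  const overallHealth =
    ids.length > 0 && ids.every((id) => chains[id].isInSync);
  return {
    status: overallHealth ? 200 : 500,
    body: { overallHealth, chains },
  };
}
```

In an Express handler the result could then be sent with `res.status(result.status).json(result.body)`. Keeping the decision in a pure function means it can be unit tested without starting a server.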
## How to test
1. Start syncing chain from scratch.
2. Observe the `/health` API endpoint returns a `500` status, with
JSON showing the current chain context status.
3. Once in sync, observe the `/health` API endpoint returns `200` status
with JSON showing the current chain status.
## Related Issues
Fixes #70