feat: observability #70

mfw78 · 2023-09-30T05:01:46Z

Problem

As the standalone watch-tower moves into production, observability is critical to ensuring the reliability of the service.

Alternatives considered

Use of Loggly / Sentry. These have been trialed, and while Loggly is suitable for some log analysis, the same can easily be achieved with ELK. Loggly also provides little in terms of metrics monitoring, which is the bread and butter of Prometheus / Grafana.

Acceptance criteria

`/health` endpoint that reports not ready while the chain is still in warm-up.
`/metrics` endpoint, with targeted metrics instrumented throughout the service.


# Description

This PR addresses issues around naming conventions for the Prometheus metrics. Best efforts have been made to adhere to [best practices](https://prometheus.io/docs/practices/naming/).

# Changes

- [x] Revised metric naming and labeling to be in accordance with best practices.
- [x] Removed custom API Prometheus metrics and replaced them with middleware.

## How to test

1. Run a sync from contract genesis.
2. Observe the respective metrics changing via `http://127.0.0.1:8080/metrics`.

## Related Issues

Related #78, #70
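As an illustration of the naming conventions linked above (lowercase snake_case, an application prefix, a `_total` suffix on counters), the sketch below checks a metric name and renders it in the Prometheus text exposition format. The metric name, the `chain_id` label, and the `followsNamingConvention` / `renderMetric` helpers are hypothetical and are not taken from the watch-tower code; a real service would normally rely on a client library or middleware rather than formatting by hand.

```typescript
type MetricType = "counter" | "gauge";

interface Sample {
  labels: Record<string, string>;
  value: number;
}

interface Metric {
  name: string; // hypothetical names only; not the watch-tower's real metrics
  help: string;
  type: MetricType;
  samples: Sample[];
}

// Convention check (stricter than what Prometheus itself accepts):
// lowercase snake_case, and counters must end in `_total`.
function followsNamingConvention(m: Metric): boolean {
  const isSnakeCase = /^[a-z][a-z0-9_]*$/.test(m.name);
  const hasCounterSuffix = m.type !== "counter" || /_total$/.test(m.name);
  return isSnakeCase && hasCounterSuffix;
}

// Escape backslash, double quote and newline in label values,
// as the text exposition format requires.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one metric family: HELP line, TYPE line, then one line per sample.
function renderMetric(m: Metric): string {
  const lines: string[] = [
    `# HELP ${m.name} ${m.help}`,
    `# TYPE ${m.name} ${m.type}`,
  ];
  for (const s of m.samples) {
    const labels = Object.keys(s.labels)
      .map((k) => `${k}="${escapeLabelValue(s.labels[k])}"`)
      .join(",");
    lines.push(labels ? `${m.name}{${labels}} ${s.value}` : `${m.name} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Keeping the chain identifier in a label rather than in the metric name lets one metric be aggregated or filtered across chains, which is the main point of the labeling guidance.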

# Description

This PR provides overall health monitoring for the block watcher (chain contexts). This assists with Kubernetes deployment by enabling health probes for automatic restarting and monitoring.

# Changes

- [x] All chain contexts added to a static mapping for use in other services (i.e. the API).
- [x] Implemented `/health` API endpoint, returning a status >= 400 when the chain is not synced (in warm-up).

## How to test

1. Start syncing the chain from scratch.
2. Observe that the `/health` API endpoint returns a `500` status, with JSON showing the current chain context status.
3. Once in sync, observe that the `/health` API endpoint returns a `200` status, with JSON showing the current chain status.

## Related Issues

Fixes #70
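The health rule described in this PR (a `500` status while a chain is still in warm-up, `200` once in sync, with a JSON body in both cases) could be sketched as a pure function over the registered chain contexts. This is a minimal sketch under stated assumptions, not the watch-tower's actual implementation: the `ChainHealth` shape, its field names, and the `healthResponse` helper are hypothetical, and treating an empty registry as unhealthy is a choice made only for this example.

```typescript
// Hypothetical status of one chain context; field names are assumptions.
interface ChainHealth {
  chainId: number;
  isInSync: boolean;
  lastProcessedBlock: number | null;
}

interface HealthResponse {
  status: number; // HTTP status code to send
  body: string; // JSON payload describing every chain
}

// Healthy only when every registered chain has finished warm-up.
// An empty registry is reported as unhealthy (assumption of this sketch).
function healthResponse(chains: ChainHealth[]): HealthResponse {
  const isHealthy = chains.length > 0 && chains.every((c) => c.isInSync);
  return {
    status: isHealthy ? 200 : 500,
    body: JSON.stringify({ isHealthy, chains }),
  };
}
```

An HTTP handler would then only need to copy `status` and `body` onto the response, which keeps the rule itself easy to unit test without starting a server.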

mfw78 added enhancement New feature or request E:1.2: Watch Tower Service https://github.com/cowprotocol/pm/issues/8 labels Sep 30, 2023

mfw78 self-assigned this Sep 30, 2023

This was referenced Oct 2, 2023

feat: observability (api, metrics, and startup / liveliness probes) #75

Merged

chore: metrics todo and naming #95

Merged

chore: health probe #96

Merged

This was linked to pull requests Oct 8, 2023

chore: health probe #96

Merged

chore: metrics todo and naming #95

Merged

mfw78 removed a link to a pull request Oct 9, 2023

chore: metrics todo and naming #95

Merged


mfw78 closed this as completed in #96 Oct 10, 2023

mfw78 removed the enhancement New feature or request label Oct 13, 2023
