feat: observability #70

Closed
2 tasks done
mfw78 opened this issue Sep 30, 2023 · 0 comments · Fixed by #96
Labels: E:1.2: Watch Tower Service https://github.com/cowprotocol/pm/issues/8

mfw78 commented Sep 30, 2023

Problem

As the standalone watch-tower moves into production, observability is critical in ensuring the reliability of the service.

Suggested solution

Introduce an Express API that provides health and readiness monitoring, plus a metrics endpoint for automatic Prometheus scraping.

Alternatives considered

Use of Loggly / Sentry. These have been trialed, and while Loggly is suitable for some log analysis, the same can easily be achieved with ELK. Loggly also provides little in terms of metrics monitoring, which is the bread and butter of Prometheus / Grafana.

Acceptance criteria

  • /health endpoint that returns not ready if the chain is still in warm-up.
  • /metrics endpoint with targeted metrics distributed throughout.
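
To make the `/metrics` criterion concrete, here is a minimal, dependency-free sketch of the plain-text format that a Prometheus scrape expects. The `Sample` shape, the `renderMetrics` helper and the metric name in the usage note are illustrative assumptions, not taken from the watch-tower code; a real service would more likely rely on a Prometheus client library than render the text by hand.

```typescript
// Hypothetical sample shape for illustration only.
interface Sample {
  name: string;
  help: string; // assumed to contain no newlines or backslashes
  type: "counter" | "gauge";
  labels: Record<string, string>;
  value: number;
}

// Label values must escape backslash, double quote and newline.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render samples in the Prometheus text exposition format.
// Assumes samples that share a metric name are adjacent in the input.
function renderMetrics(samples: Sample[]): string {
  const lines: string[] = [];
  const described = new Set<string>();
  for (const s of samples) {
    if (!described.has(s.name)) {
      // HELP and TYPE are emitted once per metric, before its samples.
      lines.push(`# HELP ${s.name} ${s.help}`);
      lines.push(`# TYPE ${s.name} ${s.type}`);
      described.add(s.name);
    }
    const labelText = Object.entries(s.labels)
      .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`)
      .join(",");
    lines.push(
      labelText ? `${s.name}{${labelText}} ${s.value}` : `${s.name} ${s.value}`
    );
  }
  return lines.join("\n") + "\n";
}
```

For example, a single gauge named `watch_tower_block_height` (a made-up name) with the label `chain_id="1"` renders as a `# HELP` line, a `# TYPE` line and one sample line.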
@mfw78 mfw78 added enhancement New feature or request E:1.2: Watch Tower Service https://github.com/cowprotocol/pm/issues/8 labels Sep 30, 2023
@mfw78 mfw78 self-assigned this Sep 30, 2023
This was linked to pull requests Oct 8, 2023
@mfw78 mfw78 removed a link to a pull request Oct 9, 2023
mfw78 added a commit that referenced this issue Oct 10, 2023
# Description
This PR addresses issues around naming conventions for the Prometheus
metrics. Best efforts have been made to adhere to [best
practices](https://prometheus.io/docs/practices/naming/).
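
As a rough illustration of what those naming practices imply, the sketch below lints a metric name and its label names against a few of the documented rules: the classic allowed character sets, colons being reserved for recording rules, the `_total` suffix for counters, and the reserved `__` label prefix. The `lintMetric` helper and the example names are hypothetical and are not part of this PR.

```typescript
type MetricType = "counter" | "gauge" | "histogram" | "summary";

// Classic (non-UTF-8) character sets for metric and label names.
const METRIC_NAME = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;
const LABEL_NAME = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

// Returns a list of human-readable problems; an empty list means
// the name passed every check in this (deliberately small) sketch.
function lintMetric(
  name: string,
  type: MetricType,
  labelNames: string[]
): string[] {
  const problems: string[] = [];
  if (!METRIC_NAME.test(name)) {
    problems.push(`invalid metric name: ${name}`);
  }
  if (name.includes(":")) {
    // Colons are reserved for user-defined recording rules.
    problems.push(`colons are reserved for recording rules: ${name}`);
  }
  if (type === "counter" && !name.endsWith("_total")) {
    problems.push(`counter should end in _total: ${name}`);
  }
  for (const label of labelNames) {
    if (!LABEL_NAME.test(label)) {
      problems.push(`invalid label name: ${label}`);
    } else if (label.startsWith("__")) {
      // Label names starting with __ are reserved for internal use.
      problems.push(`label prefix __ is reserved: ${label}`);
    }
  }
  return problems;
}
```

With these checks, a counter named `watch_tower_orders_total` with the label `chain_id` passes, while `orders-processed` with the label `chain-id` is flagged three times.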

# Changes

- [x] Revised metric naming and labeling to be in accordance with best
practices.
- [x] Removed custom API prometheus metrics and replaced with
middleware.

## How to test

1. Run a sync from contract genesis.
2. Observe the respective metrics changing at
`http://127.0.0.1:8080/metrics`.

## Related Issues

Related #78, #70
@mfw78 mfw78 closed this as completed in #96 Oct 10, 2023
mfw78 added a commit that referenced this issue Oct 10, 2023
# Description
This PR provides overall health monitoring for the block watcher (chain
contexts). This assists with Kubernetes deployment to enable health
probes for automatic restarting and monitoring.

# Changes

- [x] All chain contexts added to a static mapping for use in other
services (i.e. the API).
- [x] Implemented `/health` API endpoint, returning a status code >= 400
when the chain is not synced (in warm-up).

## How to test

1. Start syncing the chain from scratch.
2. Observe that the `/health` API endpoint returns a `500` status, with
JSON showing the current chain context status.
3. Once in sync, observe that the `/health` API endpoint returns a `200`
status, with JSON showing the current chain status.
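
The behaviour in the steps above can be sketched as a pure function that maps per-chain sync state to a status code and JSON body. The `ChainStatus` fields, the `evaluateHealth` name and the choice to report an empty chain list as unhealthy are assumptions made for illustration; the actual response shape in this PR may differ.

```typescript
// Hypothetical per-chain status; the real chain context may track more.
interface ChainStatus {
  chainId: number;
  isInSync: boolean; // false while the chain is still in warm-up
  lastProcessedBlock: number | null;
}

interface HealthResponse {
  status: 200 | 500;
  body: { isHealthy: boolean; chains: ChainStatus[] };
}

// Healthy only when at least one chain is registered and every
// registered chain has finished warm-up (assumption: no chains
// means the service is not ready yet).
function evaluateHealth(chains: ChainStatus[]): HealthResponse {
  const isHealthy = chains.length > 0 && chains.every((c) => c.isInSync);
  return {
    status: isHealthy ? 200 : 500,
    body: { isHealthy, chains },
  };
}
```

In an Express route, the result could be sent with `res.status(result.status).json(result.body)`, which lets a Kubernetes probe act on the status code alone while operators read the JSON.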

## Related Issues

Fixes #70
@mfw78 mfw78 removed the enhancement New feature or request label Oct 13, 2023