Classify & format panic logs correctly #521

CharlieC3 · 2024-03-10T14:05:35Z

It seems panic logs are not being printed in correct JSON syntax, which may cause issues when our log collectors try to parse them. Additionally, panic logs may not be getting classified correctly as fatal level (a rank above error level). See here for various supported log levels we can use.

{thread 'Chainhook event observer' panicked at 'Unable to register new chainhook spec: Network unknown', "components/chainhook-sdk/src/observer/mod.rs:msg1337"::25"

Ideally all panics would be printed in valid JSON, and be classified as fatal (or something similar to the critical log level referenced in the link above).

Referenced panic, but a fix may be needed for all panics.
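One way the request above could be met is a process-wide panic hook that writes each panic as a single JSON line tagged with a fatal-equivalent level. The following is a minimal sketch using only the Rust standard library, not Chainhook's actual fix. The function names (`escape_json`, `format_panic_json`, `install_json_panic_hook`) and the field names (`level`, `thread`, `msg`, `location`) are assumptions for illustration, and the `crit` level string is borrowed from the level names discussed in this thread.

```rust
use std::panic;
use std::thread;

/// Escape a string so it can be embedded inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON log line for a panic (hypothetical field names).
fn format_panic_json(thread_name: &str, message: &str, location: &str) -> String {
    format!(
        "{{\"level\":\"crit\",\"thread\":\"{}\",\"msg\":\"{}\",\"location\":\"{}\"}}",
        escape_json(thread_name),
        escape_json(message),
        escape_json(location)
    )
}

/// Install a panic hook that prints the JSON line to stderr for every panic.
fn install_json_panic_hook() {
    panic::set_hook(Box::new(|info| {
        let current = thread::current();
        let thread_name = current.name().unwrap_or("<unnamed>");
        // Panic payloads are usually `&str` or `String`; anything else gets a placeholder.
        let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = info.payload().downcast_ref::<String>() {
            s.clone()
        } else {
            "non-string panic payload".to_string()
        };
        let location = info
            .location()
            .map(|l| format!("{}:{}:{}", l.file(), l.line(), l.column()))
            .unwrap_or_else(|| "unknown".to_string());
        eprintln!("{}", format_panic_json(thread_name, &message, &location));
    }));
}
```

Calling `install_json_panic_hook()` once at the start of `main` would replace the default `thread '...' panicked at ...` output for every thread, so a log collector sees one parseable object per panic. A real service would more likely route the message through its existing structured logger than hand-build the JSON.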


aravindgee · 2024-03-11T17:44:59Z

[Fixing this issue will also unblock chainhook alerting for thread panic.]

This PR introduces a few fixes in an effort to improve reliability and make problems easier to debug when running Chainhook as a service:

- Revisits log levels throughout the tool (fixes #498, fixes #521). The general approach for the logs was:
  - `crit` - fatal errors that will crash a mission-critical component of Chainhook. In these cases, Chainhook should automatically kill all main threads (not individual scanning threads, which is tracked by #404) to crash the service.
  - `erro` - something went wrong that could lead to a critical error, or that could impact all users
  - `warn` - something went wrong that could impact an end user (usually due to user error)
  - `info` - control flow logging and updates on the state of _all_ registered predicates
  - `debug` - updates on the state of _a_ predicate
- Crashes the service if a mission-critical thread fails (see #517 (comment) for a list of these threads). Previously, if one of these threads failed, the remaining services would keep running. For example, if the event observer handler crashed, the event observer API would keep running. This means that the Stacks node is successfully emitting blocks that Chainhook is acknowledging but not ingesting, which causes gaps in our database. Fixes #517
- Removes an infinite loop with bitcoin ingestion, crashing the service instead. Fixes #506
- Fixes how we delete predicates from our db when one is deregistered. This should reduce the number of logs we have on startup. Fixes #510
- Warns on all reorgs. Fixes #519
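The "crash the service if a mission-critical thread fails" behavior described above can be sketched as a small supervisor: each critical worker reports how it exited over a channel, and the caller shuts everything down as soon as one reports a panic. This is an illustrative sketch using only the standard library, not the code from the PR. `Exit`, `spawn_supervised`, and `first_panicked` are hypothetical names, and the thread names in the usage note are placeholders.

```rust
use std::panic;
use std::sync::mpsc;
use std::thread;

/// How a supervised worker ended, tagged with the worker's name.
#[derive(Debug)]
enum Exit {
    Finished(&'static str),
    Panicked(&'static str),
}

/// Spawn a named worker thread that reports its exit (clean or panic) on `tx`.
fn spawn_supervised<F>(name: &'static str, tx: mpsc::Sender<Exit>, work: F)
where
    F: FnOnce() + Send + 'static,
{
    thread::Builder::new()
        .name(name.to_string())
        .spawn(move || {
            // Catch the panic so the thread can still report before it exits.
            let result = panic::catch_unwind(panic::AssertUnwindSafe(work));
            let exit = match result {
                Ok(()) => Exit::Finished(name),
                Err(_) => Exit::Panicked(name),
            };
            // The receiver may already be gone during shutdown; ignore that case.
            let _ = tx.send(exit);
        })
        .expect("failed to spawn thread");
}

/// Wait for up to `workers` exit reports and return the first worker that panicked.
fn first_panicked(rx: &mpsc::Receiver<Exit>, workers: usize) -> Option<&'static str> {
    for _ in 0..workers {
        match rx.recv() {
            Ok(Exit::Panicked(name)) => return Some(name),
            Ok(Exit::Finished(_)) => continue,
            // Every sender was dropped without reporting a panic.
            Err(_) => return None,
        }
    }
    None
}
```

A service's `main` could spawn its critical workers (for example an "event observer" and an "observer API" thread) with `spawn_supervised`, then call `first_panicked`. On `Some(name)` it would log at `crit` and call `std::process::exit(1)`, so the process stops instead of continuing to acknowledge blocks it no longer ingests.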

## [1.4.0](v1.3.1...v1.4.0) (2024-03-27)

### Features

* detect http / rpc errors as early as possible ([ad78669](ad78669))
* use stacks.rocksdb for predicate scan ([#514](#514)) ([a4f1663](a4f1663)), closes [#513](#513) [#485](#485)

### Bug Fixes

* enable debug logs in release mode ([#537](#537)) ([fb49e28](fb49e28))
* improve error handling, and more! ([#524](#524)) ([86b5c78](86b5c78)), closes [#498](#498) [#521](#521) [#404](#404) [/github.com//issues/517#issuecomment-1992135101](https://github.com/hirosystems//github.com/hirosystems/chainhook/issues/517/issues/issuecomment-1992135101) [#517](#517) [#506](#506) [#510](#510) [#519](#519)
* log errors on block download failure; implement max retries ([#503](#503)) ([0fc38cb](0fc38cb))
* **metrics:** update latest ingested block on reorg ([#515](#515)) ([8f728f7](8f728f7))
* order and filter blocks used to seed forking block pool ([#534](#534)) ([a11bc1c](a11bc1c))
* seed forking handler with unconfirmed blocks to improve startup stability ([#505](#505)) ([485394e](485394e)), closes [#487](#487)
* skip db consolidation if no new dataset was downloaded ([#513](#513)) ([983a165](983a165))
* update scan status for non-triggering predicates ([#511](#511)) ([9073f42](9073f42)), closes [#498](#498)

github-actions · 2024-03-27T21:24:58Z

🎉 This issue has been resolved in version 1.4.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

github-project-automation bot added this to DevTools Mar 10, 2024

github-project-automation bot moved this to 🆕 New in DevTools Mar 10, 2024

MicaiahReid moved this from 🆕 New to 📋 Backlog in DevTools Mar 10, 2024

MicaiahReid self-assigned this Mar 10, 2024

smcclellan added this to the Production Reliability milestone Mar 12, 2024

MicaiahReid moved this from 📋 Backlog to 🏗 In Progress in DevTools Mar 12, 2024

MicaiahReid mentioned this issue Mar 15, 2024

fix: improve error handling, and more! #524

Merged

MicaiahReid moved this from 🏗 In Progress to 👀 In Review in DevTools Mar 18, 2024

MicaiahReid closed this as completed in d6b8816 Mar 27, 2024

MicaiahReid closed this as completed in #524 Mar 27, 2024

github-project-automation bot moved this from 👀 In Review to ✅ Done in DevTools Mar 27, 2024

github-actions bot added the released label Mar 27, 2024
