Implement Retries on Transient Roomlog DB Connection Errors #10776

Merged: 1 commit, Dec 31, 2024

DieterReinert (Contributor)

**Problem**:
Previously, an unexpected DB connection termination caused the log entry to be lost, because the only error that triggered a retry was `42P01` (PostgreSQL `undefined_table`: the table does not exist). Other transient failures, such as “Connection terminated unexpectedly”, were crashlogged without a retry, so the entry was never written.
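The fix hinges on telling these failure kinds apart. A minimal sketch of that classification, assuming error objects carry a `code` field holding the SQLSTATE and a human-readable `message`; `classifyDbError`, `DbErrorKind`, and `DbErrorLike` are hypothetical names, not the PR's code.

```typescript
// Hypothetical helper: sorts a DB error into the three cases the PR distinguishes.
type DbErrorKind = 'missing-table' | 'connection-dropped' | 'other';

// Assumed shape: SQLSTATE on `code`, human-readable text on `message`.
interface DbErrorLike {
  code?: string;
  message?: string;
}

function classifyDbError(err: unknown): DbErrorKind {
  if (typeof err !== 'object' || err === null) return 'other';
  const {code, message} = err as DbErrorLike;
  // 42P01 = undefined_table: the existing recreate-table-and-retry path.
  if (code === '42P01') return 'missing-table';
  // A dropped connection surfaces only as message text in this sketch.
  if (typeof message === 'string' && message.includes('Connection terminated unexpectedly')) {
    return 'connection-dropped';
  }
  // Everything else is treated as non-transient and crashlogged.
  return 'other';
}

console.log(classifyDbError({code: '42P01'})); // → missing-table
console.log(classifyDbError(new Error('Connection terminated unexpectedly'))); // → connection-dropped
```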

**Solution**:
Introduce a bounded retry mechanism (3 attempts) specifically for transient “Connection terminated unexpectedly” errors: after a brief wait, the query is re-run. If `ignoreFailure` is `true`, or once the retries are exhausted, the function stops retrying and falls back to crashlogging as before.
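A minimal sketch of the retry policy described above. It is not the PR's implementation: `insertWithRetry`, `runQuery`, `crashlog`, `MAX_ATTEMPTS`, and `RETRY_DELAY_MS` are hypothetical names, "3 attempts" is read as three total attempts, and the 50 ms delay is an assumed value for the "brief wait". The existing `42P01` table-recreation path is left out to keep the focus on connection drops.

```typescript
// Hypothetical sketch of a bounded retry for dropped DB connections.
type RunQuery = () => Promise<void>;

const MAX_ATTEMPTS = 3; // assumption: 3 total attempts, not 3 retries
const RETRY_DELAY_MS = 50; // assumption: value chosen for the "brief wait"

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Only this specific transient error is retried.
function isConnectionDrop(err: unknown): boolean {
  return err instanceof Error && err.message.includes('Connection terminated unexpectedly');
}

// Returns true if the insert eventually succeeded, false if it was crashlogged.
async function insertWithRetry(
  runQuery: RunQuery,
  crashlog: (err: unknown) => void,
  ignoreFailure = false,
): Promise<boolean> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await runQuery();
      return true;
    } catch (err) {
      // Retry only connection drops, only when failures matter, and only while attempts remain.
      const canRetry = !ignoreFailure && isConnectionDrop(err) && attempt < MAX_ATTEMPTS;
      if (canRetry) {
        await sleep(RETRY_DELAY_MS);
        continue;
      }
      // Non-transient error, ignoreFailure set, or attempts exhausted: crashlog as before.
      crashlog(err);
      return false;
    }
  }
  return false; // not reached: the final attempt always returns above
}
```

Only the connection-drop message is retried: errors such as constraint violations or SQL syntax errors would fail identically on a re-run, so they go straight to the crashlog, and the fixed attempt cap keeps the worst-case added latency to roughly `(MAX_ATTEMPTS - 1) * RETRY_DELAY_MS`.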

**Benefits**:
- **Preserves Logs**: Prevents loss of log entries due to ephemeral DB connection drops by retrying the insert.
- **Consistent Recovery Logic**: Extends the existing table-recreation and retry logic to handle an additional transient error.
- **Respects `ignoreFailure`**: Honors the `ignoreFailure` flag to bypass retries when log insertion isn't critical.
- **Minimal Performance Impact**: Implements a bounded retry with short delays, avoiding excessive resource usage.
- **Enhanced Debugging**: Continues to crashlog genuine, non-transient errors, ensuring visibility into persistent issues.
mia-pi-git merged commit d0152f5 into smogon:master on Dec 31, 2024
1 check passed