Improve handling of connection issues 1/ #61

Merged on Mar 11, 2021 (3 commits)

kmille (Contributor) commented Feb 19, 2021

There was a problem in the automatic reconnect loop: if the connection to the IRC server drops, it is never reestablished, because the call to `client.Server()` blocks forever. This PR fixes that by no longer calling it at this point. But there are other issues we still have to tackle.

The first is our locking. There is still a serious problem if the following happens:

  • the connection goes down and `client.Connect()` returns an error
  • we acquire the lock: `clientLock.Lock()`
  • we sleep for some time (30 seconds)
  • we call `client.Connect()` again, and it fails again because our Internet connection is still broken
  • because our CONNECTED handler was never called, nothing has released the lock
  • the loop calls `clientLock.Lock()` a second time, and since the mutex is not reentrant, we hang here forever

CptHook/irc.go, lines 121 to 129 in 4d1839a:

log.Info("Connecting to IRC server")
for {
	if err := client.Connect(); err != nil {
		clientLock.Lock()
		log.Warnf("Connection to %s terminated: %s", client.Server(), err)
		log.Warn("Reconnecting in 30 seconds...")
		time.Sleep(30 * time.Second)
	}
}

If we want to rewrite the lock logic, we have to think about two scenarios:

  1. How do we handle the time between "the IRC connection is down" and "the library knows the connection is down"?
    The library continuously sends PINGs (every 20 seconds is the smallest interval we can use). If now() - time_last(PONG) exceeds 60 seconds, the connection goes into the disconnected state. Then `Connect()` returns and we reconnect.

  2. What happens if the connection to the IRC server is down, but our HTTP server is still up and receiving requests? We can:

  • send a 500 because the IRC connection is down (if we know it; good for Alertmanager, probably bad for other senders that don't automatically retry)
  • handle the received data in a goroutine and always return a 200 (never blocks, but we have to buffer the messages until IRC is up again)
  • keep the logic as it is and increase the channel size. Right now, HTTP connections are held open if we cannot write into the channel (because the IRC sender is not working and has stopped reading from the channel)

fleaz changed the base branch from master to development on March 11, 2021, 19:36
Felix Breidenstein and others added 3 commits on March 11, 2021, 20:46:
Switch to hackint's SSL port. hackint now uses Let's Encrypt, so we can use /etc/ssl/certs/ca-certificates.crt for validation here.
fleaz merged commit 8fca0bc into fleaz:development on Mar 11, 2021