-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: sftp retry on connection lost #465
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #465 +/- ##
===========================================
- Coverage 78.98% 35.73% -43.26%
===========================================
Files 90 52 -38
Lines 6747 3946 -2801
===========================================
- Hits 5329 1410 -3919
- Misses 1153 2409 +1256
+ Partials 265 127 -138 ☔ View full report in Codecov by Sentry. |
sftp/sftp.go
Outdated
} | ||
|
||
func isConnectionLostError(err error) bool { | ||
return strings.Contains(err.Error(), "connection lost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more standardised way to do this check? Does the underlying library provide specific errors we can use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with ErrSshFxConnectionLost
error provided by go sftp library
sftp/sftp.go
Outdated
return r.fileManager.Reset() | ||
} | ||
|
||
func (r *retryableFileManagerImpl) retryOnConnectionLost(fileOperation func() error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are only retrying once, with no delay. Assuming a connection issue we might need a few more retries spaced out in time to handle it better.
I would suggest using backoff retry, like we do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is required we can handle in client side
cc- @koladilip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lvrach this is not intend for general retry but retry when connection is closed due to idle time so we didn't implement that way but we can consider that too if required
sftp/sftp.go
Outdated
|
||
func (fm *fileManagerImpl) Reset() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expose Reset
method, and why is it necessary to create a new client every time we want to retry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sftp server will throw the connection lost
error for idle connections. In Reset method we are creating a new client with same config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to wrap the client in our own to encapsulate this behaviour and make it transparent to the end user of this library?
I also find confusing that inside *fileManagerImpl.Reset()
you're using the constructor to build another **fileManagerImpl
but you're discarding everything else and keeping the client. Seems like an anti-pattern.
Also the constructor is returning an unexported type. Might as well return the interface there since you have exported it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the Reset() method and used newSFTPClientFromConfig() for sftp client creation instead of using file manager constructor.
sftp/sftp.go
Outdated
fileOperation := func() error { | ||
return r.fileManager.Delete(remoteFilePath) | ||
} | ||
return r.retryOnConnectionLost(fileOperation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is the right approach? What if the connection is lost in the middle of the Delete
operation?
Is the Delete
operation idempotent?
// Remove removes the specified file or directory. An error will be returned if no
// file or directory with the specified path exists, or if the specified directory
// is not empty.
func (c *Client) Remove(path string) error {
What happens if we issue a delete, the file is deleted but before the client can return a response the connection is lost. We would retry and fail with an error even though the file was correctly deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, wouldn't it make sense to have the retries on the SSHClient
instead of having them in the FileManager
? This way you don't retry the whole thing (since upload and download do multiple things) but you can retry only the last thing that didn't succeed. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for retrying on the SFTP client (part of the FileManager) instead of the SSHClient is:
- A connection lost error requires creating a new connection. Simply creating a new SSHClient would not suffice; we would also need to create a new SFTP client using the existing SSHClient with sftp.NewClient(sshClient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This connection lost is different. The retry we implemented is more specific to "retry on idle connections". We can rename the functions and variables for better clarity.
cc- @koladilip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I left a few more comments, we can review them together.
Description
connection lost
errors encountered forIdle connections.
Linear Ticket
Resolves INT-2103
https://linear.app/rudderstack/issue/INT-2103/go-kit-sftp-retry-on-connection-lost-error
Security