Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sftp retry on connection lost #465

Merged
merged 18 commits into from
May 20, 2024
Merged

Conversation

Gauravudia
Copy link
Contributor

@Gauravudia Gauravudia commented May 6, 2024

Description

  • Added a retry mechanism for handling connection lost errors encountered for Idle connections.
  • Reset the SFTP client on connection lost error.

Linear Ticket

Resolves INT-2103
https://linear.app/rudderstack/issue/INT-2103/go-kit-sftp-retry-on-connection-lost-error

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@Gauravudia Gauravudia self-assigned this May 6, 2024
Copy link

codecov bot commented May 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.73%. Comparing base (bc95a3f) to head (8a8a9ff).
Report is 4 commits behind head on main.

Current head 8a8a9ff differs from pull request most recent head 72b6ef0

Please upload reports for the commit 72b6ef0 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #465       +/-   ##
===========================================
- Coverage   78.98%   35.73%   -43.26%     
===========================================
  Files          90       52       -38     
  Lines        6747     3946     -2801     
===========================================
- Hits         5329     1410     -3919     
- Misses       1153     2409     +1256     
+ Partials      265      127      -138     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Gauravudia Gauravudia requested a review from koladilip May 7, 2024 04:41
@Gauravudia Gauravudia marked this pull request as ready for review May 15, 2024 05:53
sftp/sftp_test.go Outdated Show resolved Hide resolved
sftp/sftp_test.go Outdated Show resolved Hide resolved
sftp/sftp_test.go Outdated Show resolved Hide resolved
sftp/sftp.go Outdated
}

func isConnectionLostError(err error) bool {
return strings.Contains(err.Error(), "connection lost")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more standardised way to do this check? Does the underlying library provide specific errors we can use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with ErrSshFxConnectionLost error provided by go sftp library

sftp/sftp.go Outdated
return r.fileManager.Reset()
}

func (r *retryableFileManagerImpl) retryOnConnectionLost(fileOperation func() error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are only retrying once, with no delay. Assuming a connection issue we might need a few more retries spaced out in time to handle it better.

I would suggest using backoff retry, like we do here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if that is required we can handle in client side
cc- @koladilip

Copy link
Contributor

@koladilip koladilip May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lvrach this is not intend for general retry but retry when connection is closed due to idle time so we didn't implement that way but we can consider that too if required

sftp/sftp.go Outdated

func (fm *fileManagerImpl) Reset() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expose Reset method, and why is it necessary to create a new client every time we want to retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sftp server will throw the connection lost error for idle connections. In Reset method we are creating a new client with same config.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to wrap the client in our own to encapsulate this behaviour and make it transparent to the end user of this library?

I also find confusing that inside *fileManagerImpl.Reset() you're using the constructor to build another **fileManagerImpl but you're discarding everything else and keeping the client. Seems like an anti-pattern.

Also the constructor is returning an unexported type. Might as well return the interface there since you have exported it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the Reset() method and used newSFTPClientFromConfig() for sftp client creation instead of using file manager constructor.

sftp/sftp.go Outdated Show resolved Hide resolved
sftp/sftp.go Outdated
Comment on lines 138 to 141
fileOperation := func() error {
return r.fileManager.Delete(remoteFilePath)
}
return r.retryOnConnectionLost(fileOperation)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is the right approach? What if the connection is lost in the middle of the Delete operation?

Is the Delete operation idempotent?

// Remove removes the specified file or directory. An error will be returned if no
// file or directory with the specified path exists, or if the specified directory
// is not empty.
func (c *Client) Remove(path string) error {

What happens if we issue a delete, the file is deleted but before the client can return a response the connection is lost. We would retry and fail with an error even though the file was correctly deleted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, wouldn't it make sense to have the retries on the SSHClient instead of having them in the FileManager? This way you don't retry the whole thing (since upload and download do multiple things) but you can retry only the last thing that didn't succeed. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for retrying on the SFTP client (part of the FileManager) instead of the SSHClient is:

  • A connection lost error requires creating a new connection. Simply creating a new SSHClient would not suffice; we would also need to create a new SFTP client using the existing SSHClient with sftp.NewClient(sshClient).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fracasula
#465 (comment)

This connection lost is different. The retry we implemented is more specific to "retry on idle connections". We can rename the functions and variables for better clarity.
cc- @koladilip

Copy link
Collaborator

@fracasula fracasula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I left a few more comments, we can review them together.

sftp/client.go Show resolved Hide resolved
sftp/client.go Outdated Show resolved Hide resolved
sftp/sftp.go Outdated Show resolved Hide resolved
sftp/sftp.go Outdated Show resolved Hide resolved
@Gauravudia Gauravudia merged commit 8383ee7 into main May 20, 2024
16 of 18 checks passed
@Gauravudia Gauravudia deleted the fix.sftp-retryOnConnectionLost branch May 20, 2024 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants