Retry on remote database transient issues in JDBC connectors #23302
Force-pushed from 8a83fb8 to 935f9c7.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/RetryingJdbcClient.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
In the past we also discussed implementing the retry in JdbcRecordCursor, IIRC, but that would only cover retries during table scans, so metadata operations etc. would not be retried. I like this approach better.
Re: nested retries, why do you say the max time we could retry would be 30s? Can't it be the case that the connection is retried and then the operation is also retried?
Reminder to self. Revisit this file.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/RetryingModule.java
Yes, it can be like that, but the max duration is measured at the top level. I mean the durations should not multiply, although they can add up a bit. The outermost retry won't retry if the operation takes longer than the max duration.
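To illustrate the point about nested retries, here is a minimal sketch in plain Java. It is not the retry policy used in this PR: the `retry` helper, the fake clock, and the `simulate` method are all hypothetical names introduced for illustration. It assumes both levels share the same 30 second budget and that every attempt fails.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

final class NestedRetryDemo
{
    // Hypothetical helper: retry until success, or until maxDuration has
    // elapsed since this retry loop started.
    static <T> T retry(Duration maxDuration, LongSupplier nanoClock, Callable<T> operation)
            throws Exception
    {
        long start = nanoClock.getAsLong();
        while (true) {
            try {
                return operation.call();
            }
            catch (Exception e) {
                if (nanoClock.getAsLong() - start >= maxDuration.toNanos()) {
                    throw e; // budget exhausted, give up
                }
            }
        }
    }

    // Runs an outer retry around an inner retry on a fake clock.
    // Returns {number of attempts, elapsed seconds}.
    static long[] simulate(long secondsPerAttempt)
    {
        long[] fakeNanos = {0};
        long[] attempts = {0};
        LongSupplier clock = () -> fakeNanos[0];
        Duration budget = Duration.ofSeconds(30);
        try {
            NestedRetryDemo.<Void>retry(budget, clock, () ->
                    NestedRetryDemo.<Void>retry(budget, clock, () -> {
                        attempts[0]++;
                        fakeNanos[0] += Duration.ofSeconds(secondsPerAttempt).toNanos();
                        throw new IOException("transient failure");
                    }));
        }
        catch (Exception expected) {
            // every attempt fails in this simulation
        }
        return new long[] {attempts[0], fakeNanos[0] / 1_000_000_000L};
    }

    public static void main(String[] args)
    {
        long[] result = simulate(10);
        System.out.println(result[0] + " attempts, " + result[1] + " seconds");
    }
}
```

In this sketch the outer loop never starts a second round, because by the time the inner loop gives up the outer budget is already spent. The total can still overshoot the budget by up to one attempt, which is one way the durations can "add up a bit".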
Force-pushed from b989b3b to d8006b6.
@hashhar I added explanation to the commit message:
Force-pushed from bcf2be4 to 84d5ef2.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestRetryingConnectionFactory.java
@@ -60,6 +71,6 @@ public List<Type> getColumnTypes()
     @Override
     public RecordCursor cursor()
     {
-        return new JdbcRecordCursor(jdbcClient, executor, session, split, table, columnHandles);
+        return retry(policy, () -> new JdbcRecordCursor(jdbcClient, executor, session, split, table, columnHandles));
Question here: why don't we use RetryingJdbcClient directly? The JdbcRecordCursor will use the RetryingJdbcClient itself.
jdbcClient is already wrapped with RetryingJdbcClient here. The issue is that JdbcRecordCursor uses jdbcClient in a way for which that is not enough. For example, you cannot retry a JDBC query using the same connection, because the connection could already be invalid. Hence we need to retry at a higher level.
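The difference between retrying only the query and retrying the whole operation can be sketched as follows. This is a simplified illustration, not Trino code: `FakeConnection`, `newFactory`, and the attempt-count-based `retry` helper are hypothetical stand-ins, and the factory is assumed to hand out one broken connection followed by healthy ones.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Supplier;

final class FreshConnectionRetryDemo
{
    // Hypothetical stand-in for a JDBC connection: a broken one never recovers.
    static final class FakeConnection
    {
        private final boolean broken;

        FakeConnection(boolean broken)
        {
            this.broken = broken;
        }

        String query(String sql)
                throws IOException
        {
            if (broken) {
                throw new IOException("connection is no longer valid");
            }
            return "rows for: " + sql;
        }
    }

    // Hypothetical factory: the first connection is broken, later ones work.
    static Supplier<FakeConnection> newFactory()
    {
        int[] created = {0};
        return () -> new FakeConnection(created[0]++ == 0);
    }

    // Minimal retry helper (an assumption for this sketch, not the PR's policy).
    static <T> T retry(int maxAttempts, Callable<T> operation)
            throws Exception
    {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            }
            catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    // Retrying only the query keeps reusing the same broken connection.
    static String retryQueryOnly()
    {
        FakeConnection connection = newFactory().get();
        try {
            return retry(3, () -> connection.query("SELECT 1"));
        }
        catch (Exception e) {
            return "failed: " + e.getMessage();
        }
    }

    // Retrying at a higher level opens a new connection on every attempt.
    static String retryWholeOperation()
    {
        Supplier<FakeConnection> factory = newFactory();
        try {
            return retry(3, () -> factory.get().query("SELECT 1"));
        }
        catch (Exception e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(retryQueryOnly());
        System.out.println(retryWholeOperation());
    }
}
```

The second variant mirrors the shape of the change in the diff above, where the construction of the cursor itself sits inside the retried block.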
Force-pushed from 84d5ef2 to eaa9544.
This is basically what I was thinking of. Thanks for answering that; the updated commit message makes it clear to me.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/RetryingJdbcClient.java
looks good % cb2cd50#r1758332145
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/RetryingJdbcClient.java
// no retrying as it could be not idempotent operation
delegate.setColumnType(session, handle, column, type);
setting column type should be idempotent actually since we only specify the target type
done
// no retrying as it could be not idempotent operation
delegate.dropNotNullConstraint(session, handle, column);
safe to retry but the code needs to handle exceptions if the constraint was dropped already (e.g. timeouts)
So let's leave it as a follow-up. I will update the comment then.
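For reference, the concern raised about retrying dropNotNullConstraint (a timeout after the constraint was in fact dropped) could be handled roughly as in the sketch below. This is only an illustration of the idea under stated assumptions, not what the PR does: `FakeRemote`, `ConstraintMissingException`, and `dropWithRetry` are hypothetical, and the fake remote is assumed to apply the first drop but lose the response.

```java
import java.util.concurrent.TimeoutException;

final class IdempotentDropDemo
{
    // Hypothetical error a remote database could report for a missing constraint.
    static final class ConstraintMissingException
            extends Exception
    {
        ConstraintMissingException()
        {
            super("NOT NULL constraint does not exist");
        }
    }

    // Hypothetical remote table: the first drop is applied, but its response is lost.
    static final class FakeRemote
    {
        boolean notNull = true;
        boolean loseNextResponse = true;

        void dropNotNull()
                throws ConstraintMissingException, TimeoutException
        {
            if (!notNull) {
                throw new ConstraintMissingException();
            }
            notNull = false;
            if (loseNextResponse) {
                loseNextResponse = false;
                throw new TimeoutException("response lost");
            }
        }
    }

    // Returns the number of attempts used. A "constraint missing" error is
    // treated as success only on a retry, where an earlier attempt may have
    // already dropped the constraint.
    static int dropWithRetry(FakeRemote remote, int maxAttempts)
            throws ConstraintMissingException, TimeoutException
    {
        for (int attempt = 1; ; attempt++) {
            try {
                remote.dropNotNull();
                return attempt;
            }
            catch (ConstraintMissingException e) {
                if (attempt > 1) {
                    return attempt; // likely dropped by an earlier attempt
                }
                throw e; // first attempt: the constraint really was absent
            }
            catch (TimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args)
            throws Exception
    {
        FakeRemote remote = new FakeRemote();
        System.out.println("attempts: " + dropWithRetry(remote, 3));
    }
}
```

Whether a "missing constraint" error on a retry can safely be read as success depends on the remote database and on concurrent DDL, which is presumably part of why the method is left unretried for now.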
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Force-pushed from eaa9544 to 2873f5e.
@hashhar addressed your comments, I will be merging this once CI is passing. Please take a look at the fixup commit.
Force-pushed from 2873f5e to 751bcf2.
So far, JDBC connectors were able to retry in case of transient issues when establishing connections. RetryingJdbcClient is able to retry an operation on the remote database in other situations. This means we can have nested retrying: one in the JDBC client and another in the connection factory. That is OK, as in both places we use a retry policy with a max duration of 30 seconds. The outer retries won't retry if the operation takes longer than 30s.
Creating a PreparedStatement can fail due to transient issues in the remote database. Let's retry it in the same way as in JdbcClient.
Force-pushed from 751bcf2 to 87122c1.
Thank you @kokosing
This PR is based on #23330