Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on remote database transient issues in JDBC connectors #23302

Merged
merged 2 commits into from
Oct 12, 2024

Conversation

kokosing
Copy link
Member

@kokosing kokosing commented Sep 5, 2024

This PR is based on #23330

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In past we also discussed implementing the retry in the JdbcRecordCursor IIRC but that one would only cover retries during table scans, so metadata etc. would not be retried. I like this approach better.

re: nested retries, why do you say max time we could retry would be 30s. Can't it be the case that connection is retried and then operation is also retried?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to self. Revisit this file.

@kokosing
Copy link
Member Author

kokosing commented Sep 6, 2024

re: nested retries, why do you say max time we could retry would be 30s. Can't it be the case that connection is retried and then operation is also retried?

Yes, it can be like that, but max duration is measured on the top level. I mean they should not multiply. They can however sum a bit. The most outer retry won't retry if the operation takes longer than max duration.

    public static void main(String[] args)
    {
        System.out.println(Instant.now());
        RetryPolicy<Object> policy = new RetryingModule().createRetryPolicy(ImmutableSet.of(new RetryOnSqlTransientException()));
        try {
            retry(
                    policy,
                    () -> retry(
                            policy,
                            () -> retry(
                                    policy,
                                    () -> retry(
                                            policy,
                                            retry(
                                                    policy,
                                                    () -> {throw new SQLTransientException();})))));
        } finally {
            System.out.println(Instant.now());
        }
    }

Returns:

2024-09-06T07:53:53.048912Z
2024-09-06T07:54:44.319246Z
Exception in thread "main" dev.failsafe.FailsafeException: java.sql.SQLTransientException
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:196)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.trino.plugin.jdbc.RetryingModule.retry(RetryingModule.java:85)
	...

@kokosing kokosing force-pushed the origin/master/003_retry branch 3 times, most recently from b989b3b to d8006b6 Compare September 7, 2024 22:00
@kokosing
Copy link
Member Author

kokosing commented Sep 7, 2024

re: nested retries, why do you say max time we could retry would be 30s. Can't it be the case that connection is retried and then operation is also retried?

@hashhar I added explanation to the commit message:

Here we can have nested retrying. One in jdbc client and other in connection factory.
It is ok as as in both places we are using the retry policy with max duration to 30
seconds. The outer retries won't retry if the operation takes longer than 30s.

@kokosing kokosing force-pushed the origin/master/003_retry branch 5 times, most recently from bcf2be4 to 84d5ef2 Compare September 9, 2024 21:55
@@ -60,6 +71,6 @@ public List<Type> getColumnTypes()
@Override
public RecordCursor cursor()
{
return new JdbcRecordCursor(jdbcClient, executor, session, split, table, columnHandles);
return retry(policy, () -> new JdbcRecordCursor(jdbcClient, executor, session, split, table, columnHandles));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question here, why we do not use RetryJdbcClient directly
The JdbcRecordCursor will use the RetryJdbcClient itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jdbcClient is already here wrapped with RetryingJdbcClient. The issue here is that JdbcRecordCursor is using jdbcClient in a way that it is not enough. For example you cannot retry jdbc query using the same connection. The connection could be already invalid. Hence we need to retry on higher level.

@hashhar
Copy link
Member

hashhar commented Sep 13, 2024

They can however sum a bit. The most outer retry won't retry if the operation takes longer than max duration.

This is basically what I was thinking of. Thanks for answering that, the updated commit message makes it clear for me.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good % cb2cd50#r1758332145

Comment on lines +323 to +321
// no retrying as it could be not idempotent operation
delegate.setColumnType(session, handle, column, type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting column type should be idempotent actually since we only specify the target type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 330 to 328
// no retrying as it could be not idempotent operation
delegate.dropNotNullConstraint(session, handle, column);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe to retry but the code needs to handle exceptions if the constraint was dropped already (e.g. timeouts)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let leave it as follow. I will update the comment then.

Copy link

github-actions bot commented Oct 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar addressed your comments, I will be merging this once CI is passing. Please take a look at the fixup commit.

Comment on lines +323 to +321
// no retrying as it could be not idempotent operation
delegate.setColumnType(session, handle, column, type);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 330 to 328
// no retrying as it could be not idempotent operation
delegate.dropNotNullConstraint(session, handle, column);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let leave it as follow. I will update the comment then.

So far JDBC connectors were able to retry in case of transient issues
when establishing connections. RetryingJdbcClient is able to retry an
operation on remote database during other situations.

Here we can have nested retrying. One in jdbc client and other in connection factory.
It is ok as as in both places we are using the retry policy with max duration to 30
seconds. The outer retries won't retry if the operation takes longer than 30s.
Creating PreparedStatement can fail due transient issues in remote
database. Let's retry it in the same as in JdbcClient.
@kokosing kokosing merged commit 044f85e into trinodb:master Oct 12, 2024
61 checks passed
@github-actions github-actions bot added this to the 462 milestone Oct 12, 2024
@kokosing kokosing deleted the origin/master/003_retry branch October 12, 2024 17:57
@hashhar
Copy link
Member

hashhar commented Oct 14, 2024

Thank you @kokosing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants