
Wrong schema for connection to read-instance when using schema-based multi-tenancy from Hibernate #679

Closed
matheisco opened this issue Oct 11, 2023 · 9 comments
Labels: bug (Something isn't working)

@matheisco

Describe the bug

In our Spring Boot application we use Hibernate's schema-based multi-tenancy. When the readWriteSplitting plugin is activated, the connection to the read-instance does not get the correct schema set.

Expected Behavior

The overridden ConnectionProvider.getConnection() method should also be executed for the read-instance, not only for the write-instance.

What plugins are used? What other connection properties were set?

readWriteSplitting,failover,efm

Current Behavior

connection.setSchema(schema); is only executed for the write-instance. Therefore, when the AWS wrapper uses the read-instance, the default public schema is used, which is incorrect.

Reproduction Steps

We instantiate two TenantBasedConnectionProvider instances, each with a different schema:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

import lombok.NonNull;
import lombok.RequiredArgsConstructor;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;

@RequiredArgsConstructor
public class TenantBasedConnectionProvider implements ConnectionProvider {

    @NonNull
    private final transient DataSource dataSource;
    @NonNull
    private final String schema;

    // ...

    @Override
    public Connection getConnection() throws SQLException {
        Connection connection = dataSource.getConnection();
        connection.setSchema(schema);
        return connection;
    }
}

The correct TenantBasedConnectionProvider is then selected depending on which schema should be used for the current request:

import java.util.Map;

import org.hibernate.engine.jdbc.connections.spi.AbstractMultiTenantConnectionProvider;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;
import org.springframework.boot.autoconfigure.orm.jpa.HibernatePropertiesCustomizer;
import org.springframework.stereotype.Component;

@Component
public class CustomMultiTenantConnectionProvider extends AbstractMultiTenantConnectionProvider
        implements HibernatePropertiesCustomizer {

    private final Map<String, ConnectionProvider> tenantIdentifierToConnectionProviderMap;

    // ...

    @Override
    protected ConnectionProvider selectConnectionProvider(String tenantIdentifier) {
        return tenantIdentifierToConnectionProviderMap.get(tenantIdentifier);
    }
}

The connection to the DB is defined in the application.properties like this:

spring.datasource.url=jdbc:aws-wrapper:postgresql://write-instance:5432/aaaa
spring.datasource.username=un
spring.datasource.password=pw
spring.datasource.driver-class-name=software.amazon.jdbc.Driver
spring.jpa.database-platform=org.hibernate.dialect.PostgreSQL10Dialect
spring.jpa.properties.hibernate.multiTenancy=SCHEMA

Possible Solution

No response

Additional Information/Context

No response

The AWS Advanced JDBC Driver version used

Same behaviour with 1.0.1 and 2.2.5.

JDK version used

17.0.6

Operating System and version

macOS 13.2.1

matheisco added the bug label on Oct 11, 2023
@brunos-bq
Contributor

Hello @matheisco ,

Let's quickly take a look at what is happening here and see how we can resolve this issue.

It seems that you are assigning the schema in the TenantBasedConnectionProvider class, which implements the Hibernate ConnectionProvider interface. While the readWriteSplitting plugin is active, the AWS JDBC Driver may need to open connections internally, and those are handled by an internal connection provider (also called ConnectionProvider within the wrapper), not the one from Hibernate.

The solution for the issue would be to find a way to pass the schema along to the wrapper's internal ConnectionProvider. The best approach would be to create a custom ConnectionProvider that extends the existing DataSourceConnectionProvider. This subclass would override connect(), which is where you can set the schema on the connection.

Once the custom extension of the ConnectionProvider is ready, set it on the ConnectionProviderManager so it is used for newly created connections, via the ConnectionProviderManager.setConnectionProvider() method. You will find sample code that does a similar operation here: https://github.com/awslabs/aws-advanced-jdbc-wrapper/blob/main/examples/AWSDriverE[…]java/software/amazon/InternalConnectionPoolPasswordWarning.java

Please let us know if you have any more questions.

@matheisco
Author

Hi @brunos-bq,

Thanks for the detailed answer!

I gave it a try and it doesn't seem to behave in the desired way. The reader instance never seems to be used:

An error occurred while trying to switch to a cached reader connection: 'xxxx'. Error message: 'Cannot change transaction isolation level in the middle of a transaction.'. The driver will attempt to establish a new reader connection.
Failed to switch to a reader; the current writer will be used as a fallback: 'xxxx'

Even when we fix this error, there seems to be a more fundamental problem: when the query hits the writer instance in this situation, the selected schema is wrong.
I think I wasn't clear enough about the use case when I opened the ticket. When a request from a client comes in with the path

  • /tenant1/resource, then the application must only connect to schema_tenant1
  • /tenant2/resource, then the application must only connect to schema_tenant2

So it's crucial that every thread in the application executes connection.setSchema(schema); and then only uses that particular connection. It seems to me that this is not guaranteed when using a custom DataSourceConnectionProvider.connect().
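
For reference, a minimal sketch of the thread-local tenant lookup used in the code below; the Tenant enum, TenantContext holder, and path-based filter are illustrative, not our exact implementation:

// Imports for the filter (javax.servlet on Spring Boot 2; jakarta.servlet on Boot 3):
// java.io.IOException, javax.servlet.FilterChain, javax.servlet.ServletException,
// javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse,
// org.springframework.stereotype.Component, org.springframework.web.filter.OncePerRequestFilter

public enum Tenant { TENANT1, TENANT2 }

// Illustrative holder behind getCurrentTenantFromThreadLocal().
public final class TenantContext {

  private static final ThreadLocal<Tenant> CURRENT = ThreadLocal.withInitial(() -> Tenant.TENANT1);

  public static Tenant get() { return CURRENT.get(); }

  public static void set(Tenant tenant) { CURRENT.set(tenant); }

  public static void clear() { CURRENT.remove(); }
}

// Illustrative servlet filter deriving the tenant from the request path (/tenant1/..., /tenant2/...).
@Component
public class TenantFilter extends OncePerRequestFilter {

  @Override
  protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
      FilterChain chain) throws ServletException, IOException {
    try {
      TenantContext.set(request.getRequestURI().startsWith("/tenant2") ? Tenant.TENANT2 : Tenant.TENANT1);
      chain.doFilter(request, response);
    } finally {
      TenantContext.clear();
    }
  }
}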

This is the implementation I used for testing:
CustomDataSourceConnectionProvider:

@Component
public class CustomDataSourceConnectionProvider extends DataSourceConnectionProvider {

  private final @NonNull DataSource dataSource;

  public CustomDataSourceConnectionProvider(
          final @NonNull DataSource dataSource) {
    super(dataSource, new PgTargetDriverDialect());
    this.dataSource = dataSource;
  }

  @Override
  public Connection connect(
      final @NonNull String protocol,
      final @NonNull Dialect dialect,
      final @NonNull HostSpec hostSpec,
      final @NonNull Properties props)
      throws SQLException {
    Connection connection = dataSource.getConnection();
    String schema = "schema_tenant1";
    Tenant currentTenant = getCurrentTenantFromThreadLocal();
    if (currentTenant == Tenant.TENANT2) {
        schema = "schema_tenant2";
    }
    connection.setSchema(schema);
    return connection;
  }
}

Registering with ConnectionProviderManager:

@Configuration
public class AwsTenancyConfiguration {

    @Autowired
    private CustomDataSourceConnectionProvider connectionProvider;

    @PostConstruct
    public void setupAWSConnectionProvider() {
        ConnectionProviderManager.setConnectionProvider(connectionProvider);
    }
}

@brunos-bq
Contributor

Hello @matheisco,

We have looked at the sample code you posted above and have a couple of updates. We believe there are now two distinct ways to solve your issue.

The first one is to continue with the idea of creating a custom ConnectionProvider.
After reviewing your code sample, we believe it would be a better idea to extend DriverConnectionProvider instead of DataSourceConnectionProvider.

A code sample that we believe would help you achieve what you want would look more or less like this:

@Component
public class CustomConnectionProvider extends DriverConnectionProvider {

  public CustomConnectionProvider() {
    super(new org.postgresql.Driver(), new AuroraPgDialect()); // or PgDialect
  }

  @Override
  public Connection connect(
      final @NonNull String protocol,
      final @NonNull Dialect dialect,
      final @NonNull HostSpec hostSpec,
      final @NonNull Properties props)
      throws SQLException {
    Connection connection = super.connect(protocol, dialect, hostSpec, props);
    String schema = "schema_tenant1";
    Tenant currentTenant = getCurrentTenantFromThreadLocal();
    if (currentTenant == Tenant.TENANT2) {
        schema = "schema_tenant2";
    }
    connection.setSchema(schema);
    return connection;
  }
}

We suggest you give it a try and check whether that solution gives the expected result.

The second possibility would be to use the new ConnectionInitFunc function. This is something we came up with while studying this issue.
As you can see in PR #705, we have introduced a new ConnectionInitFunc interface, where one can define a function to be executed when a connection is acquired.
This function will then be executed every time a new connection is opened, without requiring you to extend a ConnectionProvider.

In order to try this solution, you would have to check out our snapshot build.
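
For illustration only, a rough sketch of how the tenant schema could be applied through such an init function; the registration method and the callback's parameter list below are assumptions based on the PR description, not a confirmed API, and must be verified against PR #705 and the snapshot build:

// Assumed API: ConnectionProviderManager.setConnectionInitFunc(...) registering a callback
// that receives each newly opened connection. Parameter names/order are a guess; check PR #705.
ConnectionProviderManager.setConnectionInitFunc(
    (connection, protocol, hostSpec, props) -> {
      // Same tenant lookup as in the ConnectionProvider example above.
      Tenant currentTenant = getCurrentTenantFromThreadLocal();
      connection.setSchema(currentTenant == Tenant.TENANT2 ? "schema_tenant2" : "schema_tenant1");
    });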

@matheisco
Author

matheisco commented Nov 20, 2023

Thanks @brunos-bq,

I tested both options and couldn't make either work properly.

The first problem (maybe unrelated) is an error I'm getting each time after an SQL statement is executed (I tested with read-only statements):

software.amazon.jdbc.plugin.readwritesplitting.ReadWriteSplittingSQLException: An error occurred while trying to switch to a writer connection.
	at software.amazon.jdbc.plugin.readwritesplitting.ReadWriteSplittingPlugin.logAndThrowException(ReadWriteSplittingPlugin.java:374)
	...
Caused by: org.postgresql.util.PSQLException: Cannot change transaction isolation level in the middle of a transaction.

The second problem is that sometimes the ConnectionInitFunc isn't called before an SQL statement is executed, making the overall request fail. I placed a log statement in the method to see when it is called. I don't know how to reproduce this behaviour reliably; it happens rarely.

I can also see that the ConnectionInitFunc is called asynchronously, which shouldn't be necessary for our use case.

The most important thing for me is that the schema is set dynamically on every SQL statement that is executed.

EDIT:
The cause of the exception "Cannot change transaction isolation level in the middle of a transaction." turned out to be the following settings:

spring.jpa.properties.hibernate.connection.provider_disables_autocommit=true
spring.datasource.hikari.auto-commit=false

By removing those, the exception no longer occurs, which also makes it easier to reproduce the case where ConnectionInitFunc isn't called (sending 40 parallel requests, 20 for schema_tenant1 and 20 for schema_tenant2).

@brunos-bq
Contributor

Hello @matheisco,

Thanks for your reply.

The idea behind ConnectionInitFunc is to execute a custom piece of code every time a connection to an instance is opened, not before every SQL statement execution. Can you verify in your logs that it is called whenever you open a connection to your instances?

@matheisco
Author

Hi @brunos-bq,

My response was imprecise: with the multi-tenancy implementation we had until now, connection.setSchema(schema); is executed when a new transaction starts, e.g. when a method annotated with @Transactional is called in the application.

So I was hoping to replicate this behaviour with the ConnectionInitFunc.

I'm now evaluating a different strategy: defining two DataSources, both pointing to the same database but with a different currentSchema in the JDBC URL. That way the connection to the reader instance automatically gets the correct schema via the URL, without using the ConnectionInitFunc.
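
A rough sketch of that strategy, assuming HikariCP and the PostgreSQL currentSchema URL parameter passed through the wrapper URL; bean names and credentials are placeholders:

import javax.sql.DataSource;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TenantDataSourceConfiguration {

  private static final String BASE_URL = "jdbc:aws-wrapper:postgresql://write-instance:5432/aaaa";

  @Bean
  public DataSource tenant1DataSource() {
    return buildDataSource(BASE_URL + "?currentSchema=schema_tenant1");
  }

  @Bean
  public DataSource tenant2DataSource() {
    return buildDataSource(BASE_URL + "?currentSchema=schema_tenant2");
  }

  private DataSource buildDataSource(String url) {
    HikariDataSource ds = new HikariDataSource();
    ds.setJdbcUrl(url);
    ds.setDriverClassName("software.amazon.jdbc.Driver");
    ds.setUsername("un");
    ds.setPassword("pw");
    return ds;
  }
}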

@matheisco
Author

After reading more of the documentation, I was also wondering whether you have general guidelines about which plugins should be activated alongside readWriteSplitting.
The default plugins seem to be auroraConnectionTracker,failover,efm. The readWriteSplitting page has an explicit example of readWriteSplitting,failover,efm.

I was wondering whether that is intentional (e.g. readWriteSplitting is incompatible with auroraConnectionTracker), or whether something like readWriteSplitting,auroraConnectionTracker,failover,efm is also fine.

That's maybe a bit off-topic, but any pointer is appreciated 😅

@brunos-bq
Contributor

Hello @matheisco,

Thank you for reaching out.

All the plugins of the AWS JDBC Driver are meant to be compatible with each other. If two plugins were incompatible, there would be a strong warning in the plugin documentation.

The examples in our repository are basic samples to guide users on how to set up or enable a plugin or feature; they do not represent a 'proper' or 'default' way to use the feature/plugin.

With that in mind, it is absolutely fine to have readWriteSplitting,auroraConnectionTracker,failover,efm as your wrapper plugins, along with any other plugins you might need.
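
For example, assuming the plugin list is supplied through the wrapperPlugins connection parameter in the JDBC URL (as described in the driver documentation):

spring.datasource.url=jdbc:aws-wrapper:postgresql://write-instance:5432/aaaa?wrapperPlugins=readWriteSplitting,auroraConnectionTracker,failover,efm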

Please do not hesitate to reach out if you are still uncertain about some functionality of the AWS JDBC Driver.

By the way, was the solution you mentioned two messages ago successful?

@matheisco
Author

Thanks for the clarification!

The solution seems successful. For other people coming across this topic: I implemented an AbstractRoutingDataSource, there is a good explanation here.
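
A minimal sketch of that setup, assuming the two tenant DataSources with different currentSchema values (as described above) and the illustrative thread-local TenantContext holder sketched earlier:

import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Routes every connection request to the DataSource whose URL carries the current tenant's schema.
public class TenantRoutingDataSource extends AbstractRoutingDataSource {

  @Override
  protected Object determineCurrentLookupKey() {
    return TenantContext.get(); // illustrative thread-local tenant holder
  }
}

@Configuration
public class RoutingDataSourceConfiguration {

  @Bean
  @Primary
  public DataSource routingDataSource(
      @Qualifier("tenant1DataSource") DataSource tenant1DataSource,
      @Qualifier("tenant2DataSource") DataSource tenant2DataSource) {
    TenantRoutingDataSource routing = new TenantRoutingDataSource();
    Map<Object, Object> targets = new HashMap<>();
    targets.put(Tenant.TENANT1, tenant1DataSource);
    targets.put(Tenant.TENANT2, tenant2DataSource);
    routing.setTargetDataSources(targets);
    routing.setDefaultTargetDataSource(tenant1DataSource);
    return routing;
  }
}

With this in place Hibernate sees a single DataSource, and the schema is selected per request by the routing key, so no per-connection setSchema call is needed.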

While performance testing, I came across a performance degradation with the readWriteSplitting plugin; it is also present when all multi-tenancy code is removed from the project, so I will create a separate issue for that.

Thanks for your support!
