Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RdsHostListProvider topology lookup creates new connection instead of using one from internal pool #1026

Closed
ivankhoosty opened this issue Jun 5, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@ivankhoosty
Copy link

Describe the bug

Target goal: to use failover and read/write splitter plugins in springboot application utilising internal driver hikari pools.

When we switched from external pool to internal pools and F0 config, we noticed that about 5% of our requests take disproportionally longer to process (we have 200ms max kill switch on requests, so very easy to monitor). Analysing these requests in datadog showed that they contain topology refresh queries. Extending clusterTopologyRefreshRateMs from default 30s to about 10mins confirmed the suspicion as the percentage of killed requests drastically reduced, as less topology refreshes were happening.

It appears that topology lookup creates a new connection to the cluster url, rather than use writer connections from the internal hikari pool. Connection creation takes about 300-350ms for us, which results in kill switch kicking in.

Expected Behavior

I would expect topology lookup query to run on internal writer connection from the hikari pool, not to create a new one with the cluster url.

Ideal solution would run topology queries/updates in a separate thread, rather than connections used by application. CacheMap would become just a (concurrent) map, and background thread would update topology at clusterTopologyRefreshRateMs intervals.

What plugins are used? What other connection properties were set?

ConnectTimeConnectionPluginFactory, ExecutionTimeConnectionPluginFactory, AuroraInitialConnectionStrategyPluginFactory, AuroraConnectionTrackerPluginFactory, FailoverConnectionPluginFactory, IamAuthConnectionPluginFactory

Current Behavior

this is datadog log showing that o.p.Driver shows cluster url and 300+ms to obtain a new connection, i have highlighted in Bold some log lines below:

Date,@fid,Message
"2024-06-03T13:17:57.964Z","""s.a.j.t.TargetDriverDialectManager""","Target driver dialect set to: 'pgjdbc', software.amazon.jdbc.targetdriverdialect.PgTargetDriverDialect@6d3f3553."
"2024-06-03T13:17:57.964Z","""s.a.j.d.DialectManager""","Current dialect: aurora-pg, software.amazon.jdbc.dialect.AuroraPgDialect, canUpdate: false"
"2024-06-03T13:17:57.964Z","""s.a.j.ConnectionPluginChainBuilder""","Plugins order has been rearranged. The following order is in effect: ConnectTimeConnectionPluginFactory, ExecutionTimeConnectionPluginFactory, AuroraInitialConnectionStrategyPluginFactory, AuroraConnectionTrackerPluginFactory, FailoverConnectionPluginFactory, IamAuthConnectionPluginFactory"
"2024-06-03T13:17:57.965Z","""s.a.j.p.f.FailoverConnectionPlugin""","failoverMode=STRICT_WRITER"
"2024-06-03T13:17:57.965Z","""s.a.j.h.RdsHostListProvider""","Topology: HostSpec[host=aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=100, null]"
"2024-06-03T13:17:57.965Z","""s.a.j.p.f.FailoverConnectionPlugin""","Changes: Host 'aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/': [NODE_ADDED]"
"2024-06-03T13:17:57.965Z","""s.a.j.PluginServiceImpl""","Set current host to HostSpec[host=aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=100, null]"
...
"2024-06-03T13:17:58.021Z","""s.a.j.p.i.IamAuthConnectionPlugin""","Generated new authentication token = 'aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/?DBUser=msabcdbdevAuroraReadWrite&Action=connect&X-Amz-AWS_Session_Token_Masked'"
"2024-06-03T13:17:58.021Z","""s.a.j.DriverConnectionProvider""","Connecting with properties: [sslmode] verify-ca[clusterTopologyRefreshRateMs] 500000[telemetryTracesBackend] OTLP[wrapperLoggerLevel] ALL[tcpKeepAlive] false[wrapperProfileName] abc-D0[telemetryMetricsBackend] OTLP[database] msabcdbdev[password] aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/?DBUser=msabcdbdevAuroraReadWrite&Action=connect&X-Amz-AWS_Session_Token_Maskedb7wtLw%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240603T131758Z&X-Amz-SignedHeaders=host&X-Amz-Expires=899&X-Amz-Credential=AWS_Access_Key_ID_Masked%2F20240603%2Fus-east-1%2Frds-db%2Faws4_request&X-Amz-Signature=7775eaa86bd74f3e1b2199ecf551ec461d09ff0acba7e973c5420a1b7a75c840[enableTelemetry] true[connectTimeout] 10000[socketTimeout] 5000[currentSchema] application[loginTimeout] 10000[telemetrySubmitToplevel] false[user] msabcdbdevAuroraReadWrite[sslrootcert] /tmp/global-aws-rds-bundle.pem"
"2024-06-03T13:17:58.021Z","""s.a.j.DriverConnectionProvider""","Connecting to jdbc:postgresql://aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/msabcdbdevwith properties: [sslmode] verify-ca[password] ***[tcpKeepAlive] false[currentSchema] application[connectTimeout] 10[socketTimeout] 5[loginTimeout] 10[user] msabcdbdevAuroraReadWrite[sslrootcert] /tmp/global-aws-rds-bundle16428989952692044504.pem"
"2024-06-03T13:17:58.021Z","""o.p.Driver""","Connecting with URL: jdbc:postgresql://aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/msabcdbdev"
"2024-06-03T13:17:58.339Z","""s.a.j.d.DialectManager""","Current dialect: aurora-pg, software.amazon.jdbc.dialect.AuroraPgDialect, canUpdate: false"
"2024-06-03T13:17:58.343Z","""s.a.j.p.s.AuroraStaleDnsHelper""","Cluster endpoint resolves to 100.72.78.225."
"2024-06-03T13:17:58.346Z","""s.a.j.h.RdsHostListProvider""","Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.346Z","""s.a.j.p.f.FailoverConnectionPlugin""","Changes: Host 'aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com:6160/': [NODE_ADDED] Host 'aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com:6160/': [NODE_ADDED] Host 'aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/': [NODE_DELETED]"
"2024-06-03T13:17:58.346Z","""s.a.j.p.f.FailoverConnectionPlugin""","Node is no longer available in the topology: HostSpec[host=aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=100, null]"
"2024-06-03T13:17:58.346Z","""s.a.j.p.s.AuroraStaleDnsHelper""","Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.346Z","""s.a.j.p.s.AuroraStaleDnsHelper""","Writer host: HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.347Z","""s.a.j.p.s.AuroraStaleDnsHelper""","Writer host address: 100.72.78.225"
"2024-06-03T13:17:58.347Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.349Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.h.RdsHostListProvider""","Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ConnectTimeConnectionPlugin""","Connected in 385,421,844 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.setReadOnly'"
"2024-06-03T13:17:58.350Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: (blank) -> (blank)readOnly: false -> (blank)catalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.350Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: (blank) -> (blank)readOnly: false -> truecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.setReadOnly in 38,488 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.getAutoCommit'"
"2024-06-03T13:17:58.350Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: true -> (blank)readOnly: false -> truecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.getAutoCommit in 9,042 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.setAutoCommit'"
"2024-06-03T13:17:58.350Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: true -> falsereadOnly: false -> truecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.setAutoCommit in 20,823 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.createStatement'"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.createStatement in 3,779 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Statement.getConnection'"
"2024-06-03T13:17:58.350Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Statement.getConnection in 3,676 nanos."
"2024-06-03T13:17:58.350Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.350Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Statement.executeQuery'"
"2024-06-03T13:17:58.351Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Statement.executeQuery in 1,164,447 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.next'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.next in 7,712 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.getMetaData'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.getMetaData in 3,837 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSetMetaData.getColumnCount'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSetMetaData.getColumnCount in 2,951 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.getInt'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.getInt in 3,496 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.wasNull'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.wasNull in 3,184 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.next'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.next in 2,163 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'ResultSet.close'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed ResultSet.close in 65,402 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Statement.close'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Statement.close in 3,304 nanos."
"2024-06-03T13:17:58.352Z","""c.j.c.m.c.i.i.u.QueryRunner""","selectWithJdbcTemplate:executed select:313 in 2ms"
"2024-06-03T13:17:58.352Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.commit'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.commit in 300,912 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.setAutoCommit'"
"2024-06-03T13:17:58.352Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: true -> truereadOnly: false -> truecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.setAutoCommit in 30,540 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.h.RdsHostListProvider""","[From cache] Topology: HostSpec[host=aurora-reader-node-1-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, READER, AVAILABLE, weight=1302, 2024-06-03 13:17:57.0] HostSpec[host=aurora-cluster-node-0-xyz.abc.us-east-1.rds.amazonaws.com, port=6160, WRITER, AVAILABLE, weight=9, 2024-06-03 13:17:57.0]"
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.setReadOnly'"
"2024-06-03T13:17:58.352Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: true -> truereadOnly: false -> falsecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.setReadOnly in 13,703 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.isClosed'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.isClosed in 3,245 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.getWarnings'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.getWarnings in 3,527 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.clearWarnings'"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.clearWarnings in 3,391 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.p.DefaultConnectionPlugin""","Executing method: 'Connection.close'"
"2024-06-03T13:17:58.352Z","""s.a.j.s.SessionStateServiceImpl""","Current session state:autoCommit: true -> truereadOnly: false -> falsecatalog: (blank) -> (blank)schema: (blank) -> (blank)holdability: (blank) -> (blank)networkTimeout: (blank) -> (blank)transactionIsolation: (blank) -> (blank)typeMap: (blank) -> (blank)"
"2024-06-03T13:17:58.352Z","""s.a.j.p.ExecutionTimeConnectionPlugin""","Executed Connection.close in 321,711 nanos."
"2024-06-03T13:17:58.352Z","""s.a.j.ConnectionPluginManager""","Releasing resources."
"2024-06-03T13:17:58.353Z","""s.a.j.PluginServiceImpl""","Releasing resources."

Reproduction Steps

enable internal connection pool and configs D0 or F0 (will add read/write splitter) and check how long it takes to obtain connections with ConnectTimeConnectionPlugin on topology refreshing requests.

Possible Solution

we are using a "workaround" by extending topology refresh from default of 30 seconds to 7 days (pod destruction interval) to avoid any topology lookups , unless required because of failover (forcedRefresh)

Additional Information/Context

No response

The AWS Advanced JDBC Driver version used

2.3.6

JDK version used

21.0.1

Operating System and version

unix

@ivankhoosty ivankhoosty added the bug Something isn't working label Jun 5, 2024
@aaronchung-bitquill
Copy link
Contributor

Hi @ivankhoosty

Thank you for reaching out and raising this issue.

If I may make a recommendation, based on how you've described your use case, I do not think the F0 profile would be best. Correct me if I am wrong, but since you have a 200ms max kill switch on requests, it sounds like you expect relatively fast and predictable SQL execution times. The F0 profile is more suited towards applications with longer or less predictable query times. It enables the HostMonitoringConnectionPluginFactory, which may introduce some unneeded overhead in your case. As such, I would recommend a configuration preset from the D family with the socket timeout property overriden with something lower, like 1 second.

Onto the issue at hand, the topology lookup query should not be creating a new connection. It utilizes a pre-existing connection to execute the topology lookup query. The 300-350ms associated with the topology lookup however does sound quite high though. We will be looking at this.

I believe your workaround of extending the topology refresh to 7 days should still enable failover protection. However, if the topology changes, and the cache is obsolute, it may have some adverse affects. For example, if a reader instance is removed/replaced, the topology cache would still contain this removed reader instance. A new connection being made could potentially choose this removed reader instance and try to connect to it.

We'll be taking a look at this and keep you up to date as we investigate.

Thank you for your patience!

@ivankhoosty
Copy link
Author

thanks for your time @aaronchung-bitquill

yes, I have been using both D and F settings, in the datadog log attached I am using D0. I wanted to test F config with HostMonitoring plugin to see if we can use it, and it's perfectly fine, the culprit seems to the topology refresh.
You can see
"2024-06-03T13:17:58.350Z","""s.a.j.p.ConnectTimeConnectionPlugin""","Connected in 385,421,844 nanos." measures this
and reports 385 ms.
also the postgres driver reports the cluster url and the fact that a new connection is being created, instead of using a writer connection from internal pool:
"2024-06-03T13:17:58.021Z","""o.p.Driver""","Connecting with URL: jdbc:postgresql://aurora-cluster-xyz.cluster-abc.us-east-1.rds.amazonaws.com:6160/msabcdbdev"

I see your point about removed reader. I guess ultimately ReadWriteSplitter plugin would fall back to a writer connection, but this might take over 200ms :) and then we will know. This "workaround" is currently acceptable for us to move to using internal pools and ReadWriteSplitter plugin. Hopefully you can identify the issue and feel free to reach out if you need any more information/logs from us.

As a side note, in our (very large) organisation, access to db from local is only available via nlb with .elb. url, so any debugging/diagnosing from local is impossible as the driver falls into RdsUrlType.OTHER; we can only observe what happens via the logs in once deployed to aws. AuroraInitialConnectionStrategyPlugin has a lot of very useful // comments that could be turned into LOGGER.finest, as well as RdsHostListProvider class, to provide more insight as to what decisions are being made with topology and connection creation.

@ivankhoosty
Copy link
Author

this issue is now resolved in 2.4.0 and failover2 plugin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants