Reading data using the executeSelect API is slow #2764
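I've created a simplified test to show the performance:

```kotlin
import com.google.cloud.bigquery.BigQueryOptions
import com.google.cloud.bigquery.ConnectionSettings
import java.time.Duration
import java.time.Instant
import java.time.format.DateTimeFormatter
import org.junit.jupiter.api.Test // assumes JUnit 5

class ReadApiPerformanceTest {
    // Assumed setup: default project and credentials taken from the environment.
    private val bigQueryOptionsBuilder = BigQueryOptions.newBuilder()

    @Test
    fun `test read`() {
        val sql =
            """
            SELECT *
            FROM `pr`
            """.trimIndent().replace("\n", " ")
        val connectionSettings = ConnectionSettings.newBuilder()
            .setRequestTimeout(300)
            .setUseReadAPI(true)
            .setMaxResults(5000)
            .setUseQueryCache(true)
            .build()
        val connection = bigQueryOptionsBuilder.build().service.createConnection(connectionSettings)
        try {
            val resultSet = connection.executeSelect(sql).resultSet
            var n = 0 // rows read so far
            var lastTime = Instant.now()
            // Advance first, then count, so n equals the number of rows actually read.
            while (n < 1_000_000 && resultSet.next()) {
                n++
                if (n % 30_000 == 0) {
                    val now = Instant.now()
                    val duration = Duration.between(lastTime, now)
                    println("ROW $n Time: ${duration.toMillis()} ms ${DateTimeFormatter.ISO_INSTANT.format(now)}")
                    lastTime = now
                }
            }
        } finally {
            connection.close() // release the connection and any read resources
        }
    }
}
```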
It takes ~5 s to read 30,000 rows.
Related issue with a benchmark: googleapis/java-bigquery#3574
After fixing the test, I got the following results.
[Figure: benchmark results compared with the chart estimates from the blog post]
That's not what we expected after reading the doc: https://cloud.google.com/blog/topics/developers-practitioners/introducing-executeselect-client-library-method-and-how-use-it/ Is there anything I missed?
@Neenu1995 Could you please help address this issue?
@leahecole, do you have any ideas on how to proceed here?
Thanks for fixing the benchmark tests. The result is definitely unexpected. Did you see better results for the Read API in a previous version of …? AFAIK, the changes to executeSelect/Connection in the … @yirutang, do you know if there are any recent changes or configuration changes that might cause the performance slowdown?
Thanks for the reply, @PhongChuong!
I've noticed that the first run is usually much faster.
@PhongChuong Do you have any insights on that?
@o-shevchenko, do you have the query job IDs and read session IDs involved in these benchmark runs? @PhongChuong, do you know if it's possible to have the client log these? Also, what benchmark configuration are you using here? Where (e.g., in what GCP region or multi-region) is your BigQuery data, where are you running the benchmark, etc.?
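Whether the client can log job IDs or read session IDs is still open at this point in the thread. A minimal sketch, assuming the library reports through java.util.logging using class-named loggers under the com.google.cloud.bigquery package (not confirmed here, and whether those records include job or read session IDs is unknown), raises that package's verbosity so that whatever the client does log becomes visible. The helper name enableBigQueryClientLogging is hypothetical.

```kotlin
import java.util.logging.ConsoleHandler
import java.util.logging.Level
import java.util.logging.Logger

// Parent logger for the client library's classes (assumption: the library logs via
// java.util.logging with class-named loggers under this package).
// Held in a top-level val because java.util.logging keeps only weak references to
// loggers, so a configured logger could otherwise be garbage-collected.
private val bigQueryLogger: Logger = Logger.getLogger("com.google.cloud.bigquery")

// Hypothetical helper: send the package's records at `verbosity` and above to stderr.
fun enableBigQueryClientLogging(verbosity: Level = Level.FINE): Logger {
    val handler = ConsoleHandler()
    handler.level = verbosity // ConsoleHandler defaults to INFO, so lower it as well
    bigQueryLogger.level = verbosity
    bigQueryLogger.addHandler(handler)
    bigQueryLogger.useParentHandlers = false // avoid duplicate output via the root logger
    return bigQueryLogger
}
```

Calling it once before createConnection, e.g. at the start of the test, should be enough; if nothing relevant appears at FINE, Level.FINEST is the next thing to try.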
Hi @kmjung, let me know if you need any other details.
Can you also provide this information? For example, are you using a GCE VM in the same region as your BQ data, or running in AWS?
I'm now less convinced that this is unrelated to AWS <--> GCE network latency. I will reach out offline to collect more data from you.
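To check the network-latency hypothesis, it can help to separate the time to the first row from the steady-state rate after it. The time to first row is likely to include read-session setup and at least one round trip, while the steady-state rate reflects ongoing transfer. This is a sketch under that assumption; ReadTiming and measureRead are hypothetical helpers, and the `next` callback stands in for resultSet.next() so the helper does not depend on the BigQuery library.

```kotlin
// Hypothetical result type: row count plus two timestamps measured from the start.
data class ReadTiming(val rows: Long, val firstRowNanos: Long, val totalNanos: Long) {
    // Rows per second after the first row arrived, excluding startup cost.
    val steadyRowsPerSecond: Double
        get() {
            val streamingNanos = totalNanos - firstRowNanos
            return if (rows <= 1 || streamingNanos <= 0) 0.0
            else (rows - 1) * 1e9 / streamingNanos
        }
}

// Hypothetical helper: drains `next` and records when the first row became available.
fun measureRead(next: () -> Boolean): ReadTiming {
    val start = System.nanoTime()
    var firstRowNanos = -1L
    var rows = 0L
    while (next()) {
        if (rows == 0L) firstRowNanos = System.nanoTime() - start
        rows++
    }
    val totalNanos = System.nanoTime() - start
    // With no rows, report the whole elapsed time as "time to first row".
    return ReadTiming(rows, if (rows == 0L) totalNanos else firstRowNanos, totalNanos)
}
```

It can be called as `measureRead { resultSet.next() }` right after executeSelect returns, timing executeSelect itself separately. A long first-row time with a reasonable steady rate would point toward setup or latency costs, while a uniformly low steady rate would point toward throughput.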
Thanks, @kmjung. I've created a support case.
We use the executeSelect API to run SQL queries and read results from BigQuery. We expected good performance based on this article.
Reading data using the executeSelect API is extremely slow: reading 100,000 rows takes 23,930 ms.
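To compare the figures reported in this thread with the blog post's estimates, they can be expressed in rows per second. A small sketch; rowsPerSecond is a hypothetical helper and only converts the numbers already given in the thread.

```kotlin
// Hypothetical helper: average throughput for `rows` read in `millis` milliseconds.
fun rowsPerSecond(rows: Long, millis: Long): Double {
    require(millis > 0) { "elapsed time must be positive" }
    return rows * 1000.0 / millis
}

// rowsPerSecond(100_000, 23_930) ≈ 4,179 rows/s (this report)
// rowsPerSecond(30_000, 5_000) = 6,000 rows/s ("~5sec to read 30000 rows")
```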
Profiling showed no obvious hotspot where most of the time is spent.
Are there any recent changes that might cause a performance regression in this API?
Do you have a benchmark to understand what performance we should expect?
Thanks!
Environment details
com.google.cloud:google-cloud-bigquery:2.43.3
Code example