
fix(test): Update schema for broken ConnImplBenchmark test #3574

Merged: 2 commits merged into googleapis:main on Dec 3, 2024

@o-shevchenko (Contributor) commented Nov 14, 2024

I'm trying to use the executeSelect API and am seeing extremely slow reads.
I tried to use ConnImplBenchmark to measure this, but noticed that the schema of the table it queries has changed, so the test no longer works.

Table: bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2017
[Two screenshots, not preserved]

Summary of Changes
Added Fields: airport_fee, data_file_year, data_file_month.
Removed Fields: dropoff_longitude, dropoff_latitude, pickup_longitude, pickup_latitude.

After fixing the test, I can confirm that the benchmark shows speeds similar to those in our own use cases.
Reading 100,000 rows takes ~15-20 seconds, which is extremely slow.

 Running
ROW 100000 Time: 14978 ms
ROW 200000 Time: 16409 ms
ROW 300000 Time: 16966 ms
ROW 400000 Time: 15963 ms
ROW 500000 Time: 17480 ms
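The log above appears to report the time for each successive batch of 100,000 rows rather than a running total (a cumulative reading would mean 500,000 rows took only ~17.5 s, which contradicts ~15 s per 100,000). A per-batch timer of that kind can be sketched as below. This is a minimal, hypothetical helper, not the code in ConnImplBenchmark: `BatchTimer`, `timeBatches`, the `Iterator`-based input and the injectable clock are all assumptions, chosen so the loop can be shown without a BigQuery connection. With executeSelect you would presumably drive the same loop from the result set's `next()` instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical helper: times each consecutive batch of rows, like the "ROW n Time: t ms" log. */
public class BatchTimer {

  /**
   * Consumes every element of {@code rows} and, after each {@code batchSize} rows,
   * records how many milliseconds that batch alone took (not the cumulative total).
   */
  static List<String> timeBatches(Iterator<?> rows, int batchSize, LongSupplier clockMs) {
    List<String> log = new ArrayList<>();
    long count = 0;
    long batchStart = clockMs.getAsLong();
    while (rows.hasNext()) {
      rows.next(); // a real benchmark would read the column values here
      count++;
      if (count % batchSize == 0) {
        long now = clockMs.getAsLong();
        log.add("ROW " + count + " Time: " + (now - batchStart) + " ms");
        batchStart = now; // restart the timer so each line covers one batch only
      }
    }
    return log;
  }

  public static void main(String[] args) {
    // Example: 250 dummy rows in batches of 100, timed with the wall clock.
    Iterator<Integer> rows = java.util.stream.IntStream.range(0, 250).iterator();
    timeBatches(rows, 100, System::currentTimeMillis).forEach(System.out::println);
  }
}
```

Injecting the clock keeps the batching logic independent of wall-clock time, so the per-batch (rather than cumulative) behavior can be checked deterministically.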

I'm not sure whether performance has degraded recently, because I can't find any published expected numbers. The benchmark in this blog post is hard to read: https://cloud.google.com/blog/topics/developers-practitioners/introducing-executeselect-client-library-method-and-how-use-it/
According to its chart, reading 1,000,000 rows should take ~1 sec.
[Chart from the blog post, not preserved]

This is what I get on my machine (JMH average time in ms/op, 3 measurements):

Benchmark                                            (rowLimit)  Mode  Cnt       Score       Error  Units
ConnImplBenchmark.iterateRecordsUsingReadAPI             500000  avgt    3   76549.893 ± 14496.839  ms/op
ConnImplBenchmark.iterateRecordsUsingReadAPI            1000000  avgt    3  154957.127 ± 25916.110  ms/op
ConnImplBenchmark.iterateRecordsWithBigQuery_Query       500000  avgt    3   82508.807 ± 17930.275  ms/op
ConnImplBenchmark.iterateRecordsWithBigQuery_Query      1000000  avgt    3  165717.219 ± 86960.648  ms/op
ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI      500000  avgt    3   84504.175 ± 36823.590  ms/op
ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI     1000000  avgt    3  165142.367 ± 99899.991  ms/op
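For scale, the averages in the table work out to roughly 6,000 to 6,500 rows per second in every mode, in line with the ~15 s per 100,000 rows in the earlier log. Taking the blog chart's ~1 s per 1,000,000 rows at face value (an approximate reading of a chart, not a published figure), the gap is about two orders of magnitude:

```latex
\frac{1\,000\,000\ \text{rows}}{154.957\ \text{s}} \approx 6.45 \times 10^{3}\ \text{rows/s}
\qquad
\frac{100\,000\ \text{rows}}{14.978\ \text{s}} \approx 6.68 \times 10^{3}\ \text{rows/s}
\qquad
\frac{154.957\ \text{s}}{\approx 1\ \text{s}} \approx 155\times
```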

I've opened an issue: googleapis/java-bigquerystorage#2764

o-shevchenko requested a review from a team as a code owner on November 14, 2024 18:27
product-auto-label bot added the "size: m" label Nov 14, 2024

google-cla bot commented Nov 14, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

product-auto-label bot added the "api: bigquery" label Nov 14, 2024
o-shevchenko changed the title from "Fix ConnImplBenchmark test" to "fix(test): Update schema for broken ConnImplBenchmark test" Nov 15, 2024
@o-shevchenko (Contributor, Author) commented:
@alvarowolfx Could you please help with the review and performance evaluation?
Thanks!

@o-shevchenko (Contributor, Author) commented:

@alvarowolfx, have you had a chance to look into it?

@alvarowolfx commented:

@PhongChuong, can you take a look at this one?

@PhongChuong (Contributor) left a comment:

Thanks for the fix.
Let's discuss the slow read results further in #2764.

@PhongChuong (Contributor) commented:

/gcbrun

@o-shevchenko (Contributor, Author) commented:

Quoting @PhongChuong: "Thanks for the fix. Let's discuss the slow read results further in #2764."

Thanks for the reply. You probably mean googleapis/java-bigquerystorage#2764.

PhongChuong added the "kokoro:force-run" and "kokoro:run" labels Dec 3, 2024
yoshi-kokoro removed the "kokoro:run" and "kokoro:force-run" labels Dec 3, 2024
PhongChuong merged commit 8cf4387 into googleapis:main on Dec 3, 2024 (17 checks passed)
o-shevchenko deleted the benchmark branch on December 10, 2024 13:34