Use tracked-consumers memory pool be the default. #11949

Merged
alamb merged 3 commits into apache:main on Aug 15, 2024

Conversation

@wiedld wiedld commented Aug 12, 2024

Which issue does this PR close?

Closes #11523

Rationale for this change

We would like the improved OOM error messages, which list the top overall memory reservation consumers, to be the default.

What changes are included in this PR?

Make TrackConsumersPool be the default.
Include benchmarks to assess potential performance overhead from added mutex.
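
For readers unfamiliar with the idea, here is a minimal, std-only sketch of what a tracked-consumers pool does and where the extra mutex comes in. It is not DataFusion's `TrackConsumersPool`: the type `TrackedPool`, its methods, the error wording, and the `TOP_CONSUMERS` constant are all hypothetical, chosen only to mirror the "top 5 consumers" default mentioned later in this thread.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cap on how many consumers the error message lists.
const TOP_CONSUMERS: usize = 5;

/// Hypothetical stand-in for a tracked-consumers pool (not DataFusion's type).
struct TrackedPool {
    limit: usize,
    // Every grow/shrink takes this lock; this is the kind of overhead
    // the benchmarks in this PR are meant to measure.
    reserved: Mutex<HashMap<String, usize>>,
}

impl TrackedPool {
    fn new(limit: usize) -> Self {
        Self { limit, reserved: Mutex::new(HashMap::new()) }
    }

    /// Reserve `bytes` for `consumer`, or fail with a message naming the largest consumers.
    fn try_grow(&self, consumer: &str, bytes: usize) -> Result<(), String> {
        let mut reserved = self.reserved.lock().unwrap();
        let used: usize = reserved.values().sum();
        if used + bytes > self.limit {
            let mut top: Vec<(&String, &usize)> = reserved.iter().collect();
            // Largest reservation first; ties broken by name for stable output.
            top.sort_by(|a, b| b.1.cmp(a.1).then(a.0.cmp(b.0)));
            let listed: Vec<String> = top
                .iter()
                .take(TOP_CONSUMERS)
                .map(|(name, size)| format!("{} consumed {} bytes", name, size))
                .collect();
            return Err(format!(
                "Failed to allocate {} bytes for {}; top consumers: {}",
                bytes,
                consumer,
                listed.join(", ")
            ));
        }
        *reserved.entry(consumer.to_string()).or_insert(0) += bytes;
        Ok(())
    }

    /// Release `bytes` previously reserved by `consumer`.
    fn shrink(&self, consumer: &str, bytes: usize) {
        let mut reserved = self.reserved.lock().unwrap();
        let remaining = reserved.get(consumer).map_or(0, |r| r.saturating_sub(bytes));
        if remaining == 0 {
            // Drop empty entries so they never appear in an error message.
            reserved.remove(consumer);
        } else {
            reserved.insert(consumer.to_string(), remaining);
        }
    }
}

fn main() {
    let pool = TrackedPool::new(100);
    pool.try_grow("sort", 60).unwrap();
    pool.try_grow("hash_join", 30).unwrap();
    // This request exceeds the limit, so the error names the current consumers.
    let err = pool.try_grow("aggregate", 20).unwrap_err();
    println!("{}", err);
    pool.shrink("sort", 60);
    pool.try_grow("aggregate", 20).unwrap();
}
```

The point of the sketch is the trade-off: an untracked pool could get by with a single counter, whereas naming consumers in the error requires per-consumer bookkeeping behind a lock.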

Are these changes tested?

Yes, by the existing tests that use the default pool.

Are there any user-facing changes?

No API changes.
Change in OOM messages returned.

@github-actions github-actions bot added the core Core DataFusion crate label Aug 12, 2024

wiedld commented Aug 12, 2024

gcp c3d-standard-4
debian-12-bookworm


Benchmark clickbench_1

| Query | main_base | 11523_tracked-consumers-default | Change |
|---|---|---|---|
| QQuery 0 | 0.52ms | 0.53ms | no change |
| QQuery 1 | 91.61ms | 92.41ms | no change |
| QQuery 2 | 235.65ms | 233.69ms | no change |
| QQuery 3 | 195.27ms | 189.60ms | no change |
| QQuery 4 | 2253.70ms | 2310.43ms | no change |
| QQuery 5 | 2113.38ms | 2299.27ms | 1.09x slower |
| QQuery 6 | 84.20ms | 82.10ms | no change |
| QQuery 7 | 97.18ms | 97.45ms | no change |
| QQuery 8 | 3212.14ms | 3615.31ms | 1.13x slower |
| QQuery 9 | 2539.55ms | 2537.32ms | no change |
| QQuery 10 | 1997.85ms | 1998.86ms | no change |
| QQuery 11 | 2068.84ms | 2088.95ms | no change |
| QQuery 12 | 3082.54ms | 3277.47ms | 1.06x slower |
| QQuery 13 | 4732.25ms | 5160.32ms | 1.09x slower |
| QQuery 14 | 3866.84ms | 4161.76ms | 1.08x slower |
| QQuery 15 | 2649.19ms | 2758.73ms | no change |
| QQuery 16 | 6131.58ms | 6935.31ms | 1.13x slower |
| QQuery 17 | 6047.45ms | 6782.28ms | 1.12x slower |
| QQuery 18 | 10795.80ms | 12602.46ms | 1.17x slower |
| QQuery 19 | 169.50ms | 163.48ms | no change |
| QQuery 20 | 3270.49ms | 3224.10ms | no change |
| QQuery 21 | 5061.89ms | 5050.48ms | no change |
| QQuery 22 | 12660.40ms | 12715.04ms | no change |
| QQuery 23 | 27705.44ms | 28134.60ms | no change |
| QQuery 24 | 2485.82ms | 2479.34ms | no change |
| QQuery 25 | 2273.66ms | 2284.05ms | no change |
| QQuery 26 | 2562.82ms | 2560.38ms | no change |
| QQuery 27 | 4308.90ms | 4316.42ms | no change |
| QQuery 28 | 36444.61ms | 36264.41ms | no change |
| QQuery 29 | 1386.33ms | 1380.30ms | no change |
| QQuery 30 | 3864.30ms | 3874.16ms | no change |
| QQuery 31 | 4622.73ms | 4637.74ms | no change |
| QQuery 32 | 21909.22ms | 22018.07ms | no change |
| QQuery 33 | 10431.35ms | 10353.16ms | no change |
| QQuery 34 | 10348.38ms | 10348.83ms | no change |
| QQuery 35 | 4157.44ms | 4150.37ms | no change |
| QQuery 36 | 226.46ms | 230.03ms | no change |
| QQuery 37 | 166.41ms | 165.30ms | no change |
| QQuery 38 | 132.29ms | 141.74ms | 1.07x slower |
| QQuery 39 | 790.22ms | 772.59ms | no change |
| QQuery 40 | 60.17ms | 67.44ms | 1.12x slower |
| QQuery 41 | 57.77ms | 59.21ms | no change |
| QQuery 42 | 71.12ms | 69.47ms | no change |

| Benchmark Summary | |
|---|---|
| Total Time (main_base) | 207363.25ms |
| Total Time (11523_tracked-consumers-default) | 212684.95ms |
| Average Time (main_base) | 4822.40ms |
| Average Time (11523_tracked-consumers-default) | 4946.16ms |
| Queries Faster | 0 |
| Queries Slower | 10 |
| Queries with No Change | 33 |

Benchmark clickbench_extended

| Query | main_base | 11523_tracked-consumers-default | Change |
|---|---|---|---|
| QQuery 0 | 2903.17ms | 2901.14ms | no change |
| QQuery 1 | 1958.32ms | 1960.05ms | no change |
| QQuery 2 | 4229.81ms | 4240.86ms | no change |

| Benchmark Summary | |
|---|---|
| Total Time (main_base) | 9091.30ms |
| Total Time (11523_tracked-consumers-default) | 9102.06ms |
| Average Time (main_base) | 3030.43ms |
| Average Time (11523_tracked-consumers-default) | 3034.02ms |
| Queries Faster | 0 |
| Queries Slower | 0 |
| Queries with No Change | 3 |

Benchmark tpch_mem_sf1

| Query | main_base | 11523_tracked-consumers-default | Change |
|---|---|---|---|
| QQuery 1 | 233.25ms | 234.22ms | no change |
| QQuery 2 | 32.08ms | 33.79ms | 1.05x slower |
| QQuery 3 | 48.46ms | 49.60ms | no change |
| QQuery 4 | 66.33ms | 70.66ms | 1.07x slower |
| QQuery 5 | 89.77ms | 91.78ms | no change |
| QQuery 6 | 12.62ms | 13.14ms | no change |
| QQuery 7 | 196.72ms | 214.45ms | 1.09x slower |
| QQuery 8 | 35.77ms | 36.44ms | no change |
| QQuery 9 | 103.33ms | 105.74ms | no change |
| QQuery 10 | 90.69ms | 92.44ms | no change |
| QQuery 11 | 61.22ms | 56.91ms | +1.08x faster |
| QQuery 12 | 50.34ms | 50.73ms | no change |
| QQuery 13 | 94.11ms | 92.03ms | no change |
| QQuery 14 | 14.65ms | 14.60ms | no change |
| QQuery 15 | 23.76ms | 23.82ms | no change |
| QQuery 16 | 36.61ms | 35.04ms | no change |
| QQuery 17 | 125.72ms | 125.65ms | no change |
| QQuery 18 | 465.75ms | 496.46ms | 1.07x slower |
| QQuery 19 | 49.22ms | 49.25ms | no change |
| QQuery 20 | 71.56ms | 70.09ms | no change |
| QQuery 21 | 299.61ms | 304.77ms | no change |
| QQuery 22 | 21.75ms | 21.89ms | no change |

| Benchmark Summary | |
|---|---|
| Total Time (main_base) | 2223.34ms |
| Total Time (11523_tracked-consumers-default) | 2283.51ms |
| Average Time (main_base) | 101.06ms |
| Average Time (11523_tracked-consumers-default) | 103.80ms |
| Queries Faster | 1 |
| Queries Slower | 4 |
| Queries with No Change | 17 |

Benchmark tpch_sf1

| Query | main_base | 11523_tracked-consumers-default | Change |
|---|---|---|---|
| QQuery 1 | 361.97ms | 373.23ms | no change |
| QQuery 2 | 46.97ms | 45.92ms | no change |
| QQuery 3 | 122.87ms | 124.01ms | no change |
| QQuery 4 | 74.78ms | 75.28ms | no change |
| QQuery 5 | 178.84ms | 179.30ms | no change |
| QQuery 6 | 93.50ms | 93.79ms | no change |
| QQuery 7 | 249.57ms | 261.37ms | no change |
| QQuery 8 | 160.19ms | 163.98ms | no change |
| QQuery 9 | 247.74ms | 260.08ms | no change |
| QQuery 10 | 201.14ms | 209.42ms | no change |
| QQuery 11 | 34.95ms | 34.65ms | no change |
| QQuery 12 | 113.22ms | 115.58ms | no change |
| QQuery 13 | 152.76ms | 161.81ms | 1.06x slower |
| QQuery 14 | 109.38ms | 111.75ms | no change |
| QQuery 15 | 166.23ms | 167.24ms | no change |
| QQuery 16 | 39.49ms | 39.91ms | no change |
| QQuery 17 | 272.43ms | 273.02ms | no change |
| QQuery 18 | 403.52ms | 408.86ms | no change |
| QQuery 19 | 197.51ms | 198.32ms | no change |
| QQuery 20 | 143.55ms | 144.76ms | no change |
| QQuery 21 | 278.38ms | 274.40ms | no change |
| QQuery 22 | 30.08ms | 29.91ms | no change |

| Benchmark Summary | |
|---|---|
| Total Time (main_base) | 3679.07ms |
| Total Time (11523_tracked-consumers-default) | 3746.59ms |
| Average Time (main_base) | 167.23ms |
| Average Time (11523_tracked-consumers-default) | 170.30ms |
| Queries Faster | 0 |
| Queries Slower | 1 |
| Queries with No Change | 21 |
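
To put the four summaries side by side, the total times can be reduced to a relative change per benchmark. The snippet below is only a convenience calculation over the totals reported above; the helper name `percent_change` is made up for this illustration and is not part of the benchmark tooling.

```rust
/// Percentage change of `candidate_ms` relative to `baseline_ms`
/// (positive means the candidate is slower).
fn percent_change(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (candidate_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // (benchmark, main_base total, 11523_tracked-consumers-default total),
    // copied from the Benchmark Summary blocks above.
    let totals = [
        ("clickbench_1", 207363.25, 212684.95),
        ("clickbench_extended", 9091.30, 9102.06),
        ("tpch_mem_sf1", 2223.34, 2283.51),
        ("tpch_sf1", 3679.07, 3746.59),
    ];
    for (name, base, branch) in totals {
        println!("{}: {:+.2}%", name, percent_change(base, branch));
    }
}
```

On these numbers the total run time grows by roughly 0.1% (clickbench_extended) to 2.7% (tpch_mem_sf1), even though a handful of individual queries are reported as up to 1.17x slower.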

@wiedld wiedld marked this pull request as ready for review August 12, 2024 16:02
@alamb alamb changed the title Use the tracked-consumers memory pool be the default. Use tracked-consumers memory pool be the default. Aug 12, 2024

alamb commented Aug 12, 2024

I am going to re-run the benchmarks on my machine and see what I get

@alamb alamb left a comment

Thank you @wiedld

I think this is a good change. If we have some time to look into profiling and see the memory pool show up at all, we can reassess.

Also, there is a workaround if anyone experiences a performance slowdown due to this change (which is to override the memory pool).

@github-actions github-actions bot added physical-expr Physical Expressions execution Related to the execution crate labels Aug 13, 2024

@comphead comphead left a comment

lgtm thanks @wiedld
I think it is good for now; perhaps in future, if there are lots of memory consumers, the error message might be unreadable.


wiedld commented Aug 14, 2024

> lgtm thanks @wiedld I think it is good for now, perhaps in future if there are lots of memory consumers then the error message might be unreadable

Absolutely agreed. I set the default to list the top 5 consumers, although we can change that based on readability.

@alamb alamb merged commit 4baa901 into apache:main Aug 15, 2024
24 checks passed

alamb commented Aug 15, 2024

Thanks again @wiedld and @comphead

@alamb alamb deleted the 11523/tracked-consumers-default branch August 15, 2024 16:57
wiedld added a commit to influxdata/arrow-datafusion that referenced this pull request Aug 15, 2024
* feat(11523): set the default memory pool to the tracked-consumer pool

* test(11523): update tests for the OOM message including the top consumers

* chore(11523): remove duplicate wording from OOM messages
wiedld added a commit to wiedld/arrow-datafusion that referenced this pull request Aug 15, 2024
* feat(11523): set the default memory pool to the tracked-consumer pool

* test(11523): update tests for the OOM message including the top consumers

* chore(11523): remove duplicate wording from OOM messages
Labels
core Core DataFusion crate execution Related to the execution crate physical-expr Physical Expressions
Development

Successfully merging this pull request may close these issues.

Resources exhausted errors are confusing return the biggest memory consumers.
3 participants