feat: support distributed merge_into #13151

Merged (72 commits) on Oct 31, 2023


JackTan25
Contributor

@JackTan25 JackTan25 commented Oct 9, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This PR does two things:
1. Switches the merge-into join to a right join.
2. Implements distributed merge into.

The execution flow of the old shuffle join logic is shown below:
[image]
The new physical plan design:
[image]
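To illustrate why an outer join suits merge into, here is a minimal sketch in Python over in-memory rows. It assumes the join preserves every source row, so that a source row with a target match becomes an update and one without a match becomes an insert. The helper names `split_merge_rows` and `apply_merge`, and the use of a list position as a stand-in for a row id, are hypothetical and are not taken from the Databend code.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def split_merge_rows(
    target: List[Row], source: List[Row], key: str
) -> Tuple[List[Tuple[int, Row]], List[Row]]:
    """Split source rows into matched (with target row id) and unmatched."""
    # Hypothetical stand-in for a row id: the position of the row in `target`.
    target_index = {row[key]: row_id for row_id, row in enumerate(target)}
    matched: List[Tuple[int, Row]] = []
    unmatched: List[Row] = []
    for src in source:
        row_id = target_index.get(src[key])
        if row_id is None:
            unmatched.append(src)          # WHEN NOT MATCHED THEN INSERT *
        else:
            matched.append((row_id, src))  # WHEN MATCHED THEN UPDATE *
    return matched, unmatched


def apply_merge(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Apply updates for matched rows, then append unmatched rows."""
    matched, unmatched = split_merge_rows(target, source, key)
    result = [dict(row) for row in target]
    for row_id, src in matched:
        result[row_id] = dict(src)
    result.extend(dict(src) for src in unmatched)
    return result
```

The matched list carries row ids, which is the piece a distributed plan would need to route back to whichever node owns the affected block.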




@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Oct 12, 2023
@JackTan25
Contributor Author

This PR is not a complete implementation of distributed merge into. Ideally we would hash-shuffle by row id and make apply_delete_and_update distributed as well, but that would take more time. Let's go with this version first; I will do the rest in the next PR.
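The row-id hash shuffle mentioned above could look roughly like the following sketch. It assumes, purely for illustration, that a row id is a plain integer and that a block holds a fixed number of rows; the function name `shuffle_by_block` and the `rows_per_block` parameter are hypothetical and do not reflect Databend's actual row-id encoding.

```python
from collections import defaultdict
from typing import Dict, List


def shuffle_by_block(
    row_ids: List[int], num_nodes: int, rows_per_block: int = 1000
) -> Dict[int, List[int]]:
    """Route matched row ids so that each block is handled by exactly one node."""
    partitions: Dict[int, List[int]] = defaultdict(list)
    for row_id in row_ids:
        # Assumption: the block a row belongs to can be derived from its id.
        block_id = row_id // rows_per_block
        # All rows of one block map to the same node, so no two nodes
        # ever rewrite the same block.
        partitions[block_id % num_nodes].append(row_id)
    return dict(partitions)
```

Partitioning on the block rather than on the individual row keeps each block's deletes and updates on a single node, which is what would let the apply step run in parallel without conflicting writes.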

@JackTan25 JackTan25 marked this pull request as ready for review October 18, 2023 06:27
@JackTan25 JackTan25 added the ci-cloud Build docker image for cloud test label Oct 18, 2023
@github-actions
Contributor

Docker Image for PR

  • tag: pr-13151-d7a4c43

note: this image tag is only available for internal use,
please check the internal doc for more details.

@github-actions
Contributor

Docker Image for PR

  • tag: pr-13151-6e2545e

note: this image tag is only available for internal use,
please check the internal doc for more details.

@JackTan25
Contributor Author

Thanks to Winter (@zhang2014) for the good advice.

@JackTan25 JackTan25 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Oct 30, 2023
@github-actions
Contributor

Docker Image for PR

  • tag: pr-13151-a483d6b

note: this image tag is only available for internal use,
please check the internal doc for more details.

@JackTan25
Contributor Author

JackTan25 commented Oct 30, 2023

distributed mode:

test>> DATABEND_DSN="databend://root:@localhost:8118/?sslmode=disable&enable_experimental_merge_into=1&enable_distributed_merge_into=1" cargo run -r -- 3000 2>&1 | tee distributed_rr.log


[2023-10-30T15:53:14Z INFO  test_replace_recluster] executing table maintenance batch : 19638
[2023-10-30T15:53:14Z INFO  test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO  test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO  test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO  test_replace_recluster] Ok. merge-into batch : 2999
[2023-10-30T15:53:14Z INFO  test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ==========================
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ====verify table state====
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ==========================
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster] number of successfully executed merge-into statements : 3000
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster] CHECK: value of successfully executed merge-into statements
[2023-10-30T15:53:14Z INFO  test_replace_recluster] CHECK: value of successfully executed merge-into statements: client 3000000, server 3000000
[2023-10-30T15:53:14Z INFO  test_replace_recluster] CHECK: distinct ids: client 3000, server 3000
[2023-10-30T15:53:14Z INFO  test_replace_recluster] CHECK: value of correlated column
[2023-10-30T15:53:14Z INFO  test_replace_recluster] CHECK: full table scanning
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ======     PASSED      ====
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ========METRICS============
[2023-10-30T15:53:14Z INFO  test_replace_recluster] fuse_commit_mutation_unresolvable_conflict_total : 2328.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 17238874464.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 33190666751.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 29753.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 33448.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":2732.0},{"less_than":50.0,"count":2732.0},{"less_than":100.0,"count":2732.0},{"less_than":250.0,"count":2732.0},{"less_than":500.0,"count":2732.0},{"less_than":1000.0,"count":2732.0},{"less_than":2500.0,"count":2732.0},{"less_than":5000.0,"count":2732.0},{"less_than":10000.0,"count":2732.0},{"less_than":20000.0,"count":2732.0},{"less_than":30000.0,"count":2732.0},{"less_than":60000.0,"count":2732.0},{"less_than":300000.0,"count":2732.0},{"less_than":600000.0,"count":2732.0},{"less_than":1800000.0,"count":2732.0},{"less_than":null,"count":2732.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":5940.0},{"less_than":50.0,"count":5940.0},{"less_than":100.0,"count":5940.0},{"less_than":250.0,"count":5940.0},{"less_than":500.0,"count":5940.0},{"less_than":1000.0,"count":5940.0},{"less_than":2500.0,"count":5940.0},{"less_than":5000.0,"count":5940.0},{"less_than":10000.0,"count":5940.0},{"less_than":20000.0,"count":5940.0},{"less_than":30000.0,"count":5940.0},{"less_than":60000.0,"count":5940.0},{"less_than":300000.0,"count":5940.0},{"less_than":600000.0,"count":5940.0},{"less_than":1800000.0,"count":5940.0},{"less_than":null,"count":5940.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 2732.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 5940.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_append_blocks_counter_total : 5732.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_append_blocks_counter_total : 5940.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 3510381.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 773619.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3001.0},{"less_than":50.0,"count":3006.0},{"less_than":100.0,"count":3012.0},{"less_than":250.0,"count":3018.0},{"less_than":500.0,"count":3032.0},{"less_than":1000.0,"count":3066.0},{"less_than":2500.0,"count":3419.0},{"less_than":5000.0,"count":3427.0},{"less_than":10000.0,"count":3427.0},{"less_than":20000.0,"count":3427.0},{"less_than":30000.0,"count":3427.0},{"less_than":60000.0,"count":3427.0},{"less_than":300000.0,"count":3427.0},{"less_than":600000.0,"count":3427.0},{"less_than":1800000.0,"count":3427.0},{"less_than":null,"count":3427.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3028.0},{"less_than":50.0,"count":3092.0},{"less_than":100.0,"count":3108.0},{"less_than":250.0,"count":3114.0},{"less_than":500.0,"count":3115.0},{"less_than":1000.0,"count":3123.0},{"less_than":2500.0,"count":3428.0},{"less_than":5000.0,"count":3428.0},{"less_than":10000.0,"count":3428.0},{"less_than":20000.0,"count":3428.0},{"less_than":30000.0,"count":3428.0},{"less_than":60000.0,"count":3428.0},{"less_than":300000.0,"count":3428.0},{"less_than":600000.0,"count":3428.0},{"less_than":1800000.0,"count":3428.0},{"less_than":null,"count":3428.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_count : 3427.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_count : 3428.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_sum : 542425.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_sum : 440406.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 7.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 58.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 7000.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 58000.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":2732.0},{"less_than":50.0,"count":2732.0},{"less_than":100.0,"count":2732.0},{"less_than":250.0,"count":2732.0},{"less_than":500.0,"count":2732.0},{"less_than":1000.0,"count":2732.0},{"less_than":2500.0,"count":2732.0},{"less_than":5000.0,"count":2732.0},{"less_than":10000.0,"count":2732.0},{"less_than":20000.0,"count":2732.0},{"less_than":30000.0,"count":2732.0},{"less_than":60000.0,"count":2732.0},{"less_than":300000.0,"count":2732.0},{"less_than":600000.0,"count":2732.0},{"less_than":1800000.0,"count":2732.0},{"less_than":null,"count":2732.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":5940.0},{"less_than":50.0,"count":5940.0},{"less_than":100.0,"count":5940.0},{"less_than":250.0,"count":5940.0},{"less_than":500.0,"count":5940.0},{"less_than":1000.0,"count":5940.0},{"less_than":2500.0,"count":5940.0},{"less_than":5000.0,"count":5940.0},{"less_than":10000.0,"count":5940.0},{"less_than":20000.0,"count":5940.0},{"less_than":30000.0,"count":5940.0},{"less_than":60000.0,"count":5940.0},{"less_than":300000.0,"count":5940.0},{"less_than":600000.0,"count":5940.0},{"less_than":1800000.0,"count":5940.0},{"less_than":null,"count":5940.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 2732.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 5940.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 630.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 890.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_rows_total : 510381.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_matched_rows_total : 773619.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3000.0},{"less_than":100.0,"count":3000.0},{"less_than":250.0,"count":3000.0},{"less_than":500.0,"count":3000.0},{"less_than":1000.0,"count":3000.0},{"less_than":2500.0,"count":3000.0},{"less_than":5000.0,"count":3000.0},{"less_than":10000.0,"count":3000.0},{"less_than":20000.0,"count":3000.0},{"less_than":30000.0,"count":3000.0},{"less_than":60000.0,"count":3000.0},{"less_than":300000.0,"count":3000.0},{"less_than":600000.0,"count":3000.0},{"less_than":1800000.0,"count":3000.0},{"less_than":null,"count":3000.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_count : 3000.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_replace_blocks_counter_total : 566.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_replace_blocks_counter_total : 1265.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 153647863.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 327488439.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":6160.0},{"less_than":50.0,"count":6160.0},{"less_than":100.0,"count":6160.0},{"less_than":250.0,"count":6160.0},{"less_than":500.0,"count":6160.0},{"less_than":1000.0,"count":6160.0},{"less_than":2500.0,"count":6160.0},{"less_than":5000.0,"count":6160.0},{"less_than":10000.0,"count":6160.0},{"less_than":20000.0,"count":6160.0},{"less_than":30000.0,"count":6160.0},{"less_than":60000.0,"count":6160.0},{"less_than":300000.0,"count":6160.0},{"less_than":600000.0,"count":6160.0},{"less_than":1800000.0,"count":6160.0},{"less_than":null,"count":6160.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":9368.0},{"less_than":50.0,"count":9368.0},{"less_than":100.0,"count":9368.0},{"less_than":250.0,"count":9368.0},{"less_than":500.0,"count":9368.0},{"less_than":1000.0,"count":9368.0},{"less_than":2500.0,"count":9368.0},{"less_than":5000.0,"count":9368.0},{"less_than":10000.0,"count":9368.0},{"less_than":20000.0,"count":9368.0},{"less_than":30000.0,"count":9368.0},{"less_than":60000.0,"count":9368.0},{"less_than":300000.0,"count":9368.0},{"less_than":600000.0,"count":9368.0},{"less_than":1800000.0,"count":9368.0},{"less_than":null,"count":9368.0}]
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_count : 6160.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_count : 9368.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_sum : 1.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_unmatched_rows_total : 3510381.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] query_merge_into_unmatched_rows_total : 3773619.0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster]                            
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ======CLUSTERING INFO======
[2023-10-30T15:53:14Z INFO  test_replace_recluster] cluster_key : (to_yyyymmdd(insert_time), id)
[2023-10-30T15:53:14Z INFO  test_replace_recluster] block_count: 13
[2023-10-30T15:53:14Z INFO  test_replace_recluster] constant_block_count: 0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] unclustered_block_count: 0
[2023-10-30T15:53:14Z INFO  test_replace_recluster] average_overlaps: 8.7692
[2023-10-30T15:53:14Z INFO  test_replace_recluster] average_depth: 7
[2023-10-30T15:53:14Z INFO  test_replace_recluster] block_depth_histogram: {"00007":13}
[2023-10-30T15:53:14Z INFO  test_replace_recluster] ===========================

The correctness check passed, but I hit this panic once: "panicked at 'assertion failed: block_idx < segment_info.blocks.len()', src/query/storages/fuse/src/operations/merge_into/mutator/matched_mutator.rs:221:13"
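For reading the histogram metrics in the log above: the counts never decrease from one bucket to the next and the last bucket equals the matching `_count` value, which suggests the buckets are cumulative. Under that assumption, a small helper can recover per-bucket counts. The function name `per_bucket_counts` is hypothetical; the sample values are the first four buckets of one `query_merge_into_apply_milliseconds` line.

```python
from typing import Dict, List, Optional, Tuple

Bucket = Dict[str, Optional[float]]


def per_bucket_counts(cumulative: List[Bucket]) -> List[Tuple[Optional[float], float]]:
    """Convert cumulative histogram buckets into per-bucket counts."""
    counts = []
    previous = 0.0
    for bucket in cumulative:
        # Each bucket's own count is its cumulative count minus the previous one.
        counts.append((bucket["less_than"], bucket["count"] - previous))
        previous = bucket["count"]
    return counts


sample = [
    {"less_than": 10.0, "count": 3001.0},
    {"less_than": 50.0, "count": 3006.0},
    {"less_than": 100.0, "count": 3012.0},
    {"less_than": 250.0, "count": 3018.0},
]
print(per_bucket_counts(sample))
```

Read this way, almost all apply calls in that sample finished in under 10 ms, with only a handful landing in each slower bucket.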

@JackTan25
Contributor Author

JackTan25 commented Oct 30, 2023

standalone mode:

test>> DATABEND_DSN="databend://root:@localhost:8118/?sslmode=disable&enable_experimental_merge_into=1&enable_distributed_merge_into=0" cargo run -r -- 3000 2>&1 | tee rr.log

[2023-10-30T18:05:24Z INFO  test_replace_recluster] ==========================
[2023-10-30T18:05:24Z INFO  test_replace_recluster] ====verify table state====
[2023-10-30T18:05:24Z INFO  test_replace_recluster] ==========================
[2023-10-30T18:05:24Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:24Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:24Z INFO  test_replace_recluster] number of successfully executed merge-into statements : 3000
[2023-10-30T18:05:24Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:24Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:24Z INFO  test_replace_recluster] CHECK: value of successfully executed merge-into statements
[2023-10-30T18:05:24Z INFO  test_replace_recluster] CHECK: value of successfully executed merge-into statements: client 3000000, server 3000000
[2023-10-30T18:05:25Z INFO  test_replace_recluster] CHECK: distinct ids: client 3000, server 3000
[2023-10-30T18:05:25Z INFO  test_replace_recluster] CHECK: value of correlated column
[2023-10-30T18:05:25Z INFO  test_replace_recluster] CHECK: full table scanning
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ======     PASSED      ====
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:25Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ========METRICS============
[2023-10-30T18:05:25Z INFO  test_replace_recluster] fuse_commit_mutation_unresolvable_conflict_total : 1528.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 50356283966.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 61773.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":8250.0},{"less_than":50.0,"count":8250.0},{"less_than":100.0,"count":8250.0},{"less_than":250.0,"count":8250.0},{"less_than":500.0,"count":8250.0},{"less_than":1000.0,"count":8250.0},{"less_than":2500.0,"count":8250.0},{"less_than":5000.0,"count":8250.0},{"less_than":10000.0,"count":8250.0},{"less_than":20000.0,"count":8250.0},{"less_than":30000.0,"count":8250.0},{"less_than":60000.0,"count":8250.0},{"less_than":300000.0,"count":8250.0},{"less_than":600000.0,"count":8250.0},{"less_than":1800000.0,"count":8250.0},{"less_than":null,"count":8250.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 8250.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 14.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_append_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_append_blocks_counter_total : 11250.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 4284000.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3001.0},{"less_than":100.0,"count":3001.0},{"less_than":250.0,"count":3003.0},{"less_than":500.0,"count":3010.0},{"less_than":1000.0,"count":3016.0},{"less_than":2500.0,"count":3160.0},{"less_than":5000.0,"count":3202.0},{"less_than":10000.0,"count":3428.0},{"less_than":20000.0,"count":3428.0},{"less_than":30000.0,"count":3428.0},{"less_than":60000.0,"count":3428.0},{"less_than":300000.0,"count":3428.0},{"less_than":600000.0,"count":3428.0},{"less_than":1800000.0,"count":3428.0},{"less_than":null,"count":3428.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_count : 3428.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_apply_milliseconds_sum : 1902364.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 56.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 56000.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":8223.0},{"less_than":50.0,"count":8250.0},{"less_than":100.0,"count":8250.0},{"less_than":250.0,"count":8250.0},{"less_than":500.0,"count":8250.0},{"less_than":1000.0,"count":8250.0},{"less_than":2500.0,"count":8250.0},{"less_than":5000.0,"count":8250.0},{"less_than":10000.0,"count":8250.0},{"less_than":20000.0,"count":8250.0},{"less_than":30000.0,"count":8250.0},{"less_than":60000.0,"count":8250.0},{"less_than":300000.0,"count":8250.0},{"less_than":600000.0,"count":8250.0},{"less_than":1800000.0,"count":8250.0},{"less_than":null,"count":8250.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 8250.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 8983.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_rows_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_matched_rows_total : 1284000.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3000.0},{"less_than":100.0,"count":3000.0},{"less_than":250.0,"count":3000.0},{"less_than":500.0,"count":3000.0},{"less_than":1000.0,"count":3000.0},{"less_than":2500.0,"count":3000.0},{"less_than":5000.0,"count":3000.0},{"less_than":10000.0,"count":3000.0},{"less_than":20000.0,"count":3000.0},{"less_than":30000.0,"count":3000.0},{"less_than":60000.0,"count":3000.0},{"less_than":300000.0,"count":3000.0},{"less_than":600000.0,"count":3000.0},{"less_than":1800000.0,"count":3000.0},{"less_than":null,"count":3000.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_count : 3000.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_replace_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_replace_blocks_counter_total : 1683.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 470291517.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":11249.0},{"less_than":50.0,"count":11250.0},{"less_than":100.0,"count":11250.0},{"less_than":250.0,"count":11250.0},{"less_than":500.0,"count":11250.0},{"less_than":1000.0,"count":11250.0},{"less_than":2500.0,"count":11250.0},{"less_than":5000.0,"count":11250.0},{"less_than":10000.0,"count":11250.0},{"less_than":20000.0,"count":11250.0},{"less_than":30000.0,"count":11250.0},{"less_than":60000.0,"count":11250.0},{"less_than":300000.0,"count":11250.0},{"less_than":600000.0,"count":11250.0},{"less_than":1800000.0,"count":11250.0},{"less_than":null,"count":11250.0}]
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_count : 11250.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_split_milliseconds_sum : 30.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_unmatched_rows_total : 3000000.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] query_merge_into_unmatched_rows_total : 0.0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:25Z INFO  test_replace_recluster]                            
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ======CLUSTERING INFO======
[2023-10-30T18:05:25Z INFO  test_replace_recluster] cluster_key : (to_yyyymmdd(insert_time), id)
[2023-10-30T18:05:25Z INFO  test_replace_recluster] block_count: 12
[2023-10-30T18:05:25Z INFO  test_replace_recluster] constant_block_count: 0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] unclustered_block_count: 0
[2023-10-30T18:05:25Z INFO  test_replace_recluster] average_overlaps: 5
[2023-10-30T18:05:25Z INFO  test_replace_recluster] average_depth: 4
[2023-10-30T18:05:25Z INFO  test_replace_recluster] block_depth_histogram: {"00004":12}
[2023-10-30T18:05:25Z INFO  test_replace_recluster] ===========================

passed 3000 times.

@JackTan25
Contributor Author

JackTan25 commented Oct 31, 2023

cloud test

test> select count(*) from target_table;

SELECT
  count(*)
FROM
  target_table

-[ RECORD 1 ]-----------------------------------
count(*): 1200575805

1 row read in 0.757 sec. Processed 1 row, 1 B (1.32 row/s, 1 B/s)

test> select count(*) from source_table;

SELECT
  count(*)
FROM
  source_table

-[ RECORD 1 ]-----------------------------------
count(*): 500000

1 row read in 0.909 sec. Processed 1 row, 1 B (1.1 row/s, 1 B/s)

test> set enable_distributed_merge_into = 0;

SET
  enable_distributed_merge_into = 0

0 row read in 0.622 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

test> merge into target_table as t1 using (select * from source_table) as t2  on t1.l_partkey = t2.l_partkey and t1.l_orderkey = t2.l_orderkey and t1.l_suppkey = t2.l_suppkey and t1.l_linenumber = t2.l_linenumber when matched then update * when not matched then insert *;

MERGE INTO target_table AS t1 USING (
  SELECT
    *
  FROM
    source_table
) AS t2 ON t1.l_partkey = t2.l_partkey
AND t1.l_orderkey = t2.l_orderkey
AND t1.l_suppkey = t2.l_suppkey
AND t1.l_linenumber = t2.l_linenumber
WHEN matched THEN
UPDATE
  *
  WHEN NOT matched THEN
INSERT
  *

0 row read in 76.038 sec. Processed 1.2 billion row, 217.76 GiB (15.8 million row/s, 2.86 GiB/s)

test> set enable_distributed_merge_into = 1;

SET
  enable_distributed_merge_into = 1

0 row read in 0.663 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

test> merge into target_table as t1 using (select * from source_table) as t2  on t1.l_partkey = t2.l_partkey and t1.l_orderkey = t2.l_orderkey and t1.l_suppkey = t2.l_suppkey and t1.l_linenumber = t2.l_linenumber when matched then update * when not matched then insert *;

MERGE INTO target_table AS t1 USING (
  SELECT
    *
  FROM
    source_table
) AS t2 ON t1.l_partkey = t2.l_partkey
AND t1.l_orderkey = t2.l_orderkey
AND t1.l_suppkey = t2.l_suppkey
AND t1.l_linenumber = t2.l_linenumber
WHEN matched THEN
UPDATE
  *
  WHEN NOT matched THEN
INSERT
  *

0 row read in 40.058 sec. Processed 1.2 billion row, 217.76 GiB (29.98 million row/s, 5.44 GiB/s)
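As a quick check on the two timings reported above (76.038 sec with `enable_distributed_merge_into = 0`, 40.058 sec with it set to 1), the speedup can be computed directly. This is plain arithmetic on the numbers from the session, not an additional measurement; the variable names are illustrative.

```python
# Wall-clock times reported by the two MERGE INTO runs above.
standalone_secs = 76.038
distributed_secs = 40.058

# Ratio of the two runtimes and the fraction of time saved.
speedup = standalone_secs / distributed_secs
time_saved = 1 - distributed_secs / standalone_secs

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
```

That works out to roughly a 1.9x speedup for this single run on this workload.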

@JackTan25 JackTan25 requested a review from b41sh October 31, 2023 04:28
Member

@sundy-li sundy-li left a comment

LGTM. The trivial comments can be addressed in a follow-up PR.


@SkyFan2002 SkyFan2002 added this pull request to the merge queue Oct 31, 2023
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Oct 31, 2023
@BohuTANG BohuTANG merged commit 3c22f3f into datafuselabs:main Oct 31, 2023
63 checks passed
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* add settings

* right join for merge into first

* add distribution optimization for merge into join

* split merge into plan

* fix update identify error

* finish distibuted baisc codes

* fix typo

* uniform row_kind and mutation_log

* fix MixRowKindAndLog serialize and deserialize

* add tests

* fix check

* fix check

* fix check

* fix test

* fix test

* fix

* remove memory size limit

* optmizie merge source and add row_number processor

* fix delete bug

* add row number plan

* fix row number

* refactor merge into pipeline

* split row_number and log, try to get hash table source data

* finish distributed codes, need to get data from hashtable

* finish not macthed append data

* fix filter

* fix filter

* fix distributed bugs,many bugs, need to support insert

* fix bugs

* fix check and clean codes

* fix check

* add more tests

* fix flaky

* fix test result

* fix order

* clean codes

* remove local builder branch

* refactor logic

* clean codes