Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CH] Support aggregation spill for the second stage #3033

Closed
zhanglistar opened this issue Sep 6, 2023 · 0 comments
Closed

[CH] Support aggregation spill for the second stage #3033

zhanglistar opened this issue Sep 6, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request stale stale

Comments

@zhanglistar
Copy link
Contributor

Description

Now, clickhouse does not support spill on the second stage of aggregation which maybe result a OOM.
We have encounter this on the following SQL:

INSERT OVERWRITE TABLE tmp.test_zzb_gluten_d_9547_0_1
SELECT user_a, user_b, SUM(type) as indicator FROM (
SELECT uid as user_a, follow_uid as user_b, 1 as type FROM log.tbl_show_follow_orc WHERE deleted=0 GROUP BY uid, follow_uid
UNION ALL
SELECT follow_uid as user_a, uid as user_b, 2 as type FROM log.tbl_show_follow_orc WHERE deleted=0 GROUP BY uid, follow_uid
) unionResult
GROUP BY user_a, user_b
HAVING indicator=3

Got the ch plan:

digraph
{
  rankdir="LR";
  { node [shape = rect]
    subgraph cluster_0 {
      label ="MergingAggregated";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n3 [label="MergingAggregatedTransform"];
      }
    }
    subgraph cluster_1 {
      label ="ReadFromPreparedSource";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n1 [label="SourceFromJavaIter"];
      }
    }
    subgraph cluster_2 {
      label ="BlocksBufferPoolStep";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n2 [label="BlocksBufferPoolTransform"];
      }
    }
    subgraph cluster_3 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n6 [label="ExpressionTransform"];
      }
    }
    subgraph cluster_4 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n4 [label="ExpressionTransform"];
      }
    }
    subgraph cluster_5 {
      label ="Filter";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n5 [label="FilterTransform"];
      }
    }
    subgraph cluster_6 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n7 [label="ExpressionTransform"];
      }
    }
    subgraph cluster_7 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n8 [label="ExpressionTransform"];
      }
    }
  }
  n3 -> n4 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))"];
  n1 -> n2 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum AggregateFunction(sum, Nullable(Int64)) AggregateFunction(size = 0)"];
  n2 -> n3 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum AggregateFunction(sum, Nullable(Int64)) AggregateFunction(size = 0)"];
  n6 -> n7 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum Int64 Int64(size = 0)"];
  n4 -> n5 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))"];
  n5 -> n6 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
sum#45#Partial#sum Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))"];
  n7 -> n8 [label="
user_a#22 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
user_b#23 Nullable(Int64) Nullable(size = 0, Int64(size = 0), UInt8(size = 0))
CAST(sum#45#Partial#sum,Nullable(Int32)_1) Nullable(Int32) Nullable(size = 0, Int32(size = 0), UInt8(size = 0))"];
}

The MergingAggregatedTransform indicates that we are in the second stage of AGG. And finally we got OOM.
image

@zhanglistar zhanglistar added the enhancement New feature or request label Sep 6, 2023
@baibaichen baibaichen self-assigned this Sep 15, 2023
@github-actions github-actions bot added the stale stale label Nov 17, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale stale
Projects
None yet
Development

No branches or pull requests

2 participants