Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOTALS row disappears in some queries #10674

Closed
vanzi opened this issue May 5, 2020 · 1 comment · Fixed by #10698
Closed

TOTALS row disappears in some queries #10674

vanzi opened this issue May 5, 2020 · 1 comment · Fixed by #10698
Assignees
Labels
bug Confirmed user-visible misbehaviour in official release

Comments

@vanzi
Copy link

vanzi commented May 5, 2020

Describe the bug
After upgrading from 19.6.2.11 to 20.3.7.46, I've noticed that TOTALS row disappeared from some queries where it was previously available. I could not determine what is causing this, but it looks like a bug (or at least some inconsistency).

How to reproduce

  • Which ClickHouse server version to use

20.3.7.46

  • Which interface to use, if matters

native clickhouse-client

  • Non-default settings, if any

default single server setup from docker

  • CREATE TABLE statements for all tables involved
CREATE TABLE foo (server_date Date, dimension_1 String, metric_1 UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(server_date) ORDER BY (server_date);
CREATE TABLE bar (server_date Date, dimension_1 String, metric_2 UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(server_date) ORDER BY (server_date);
INSERT INTO foo VALUES ('2020-01-01', 'test1', 10), ('2020-01-01', 'test2', 20);
INSERT INTO bar VALUES ('2020-01-01', 'test2', 30), ('2020-01-01', 'test3', 40);
  • Queries to run that lead to unexpected result

Query without any filters, TOTALS row is available:

SELECT 
    dimension_1, 
    sum_metric_1, 
    sum_metric_2
FROM 
(
    SELECT 
        dimension_1, 
        sum(metric_1) AS sum_metric_1
    FROM foo
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_1
ALL FULL OUTER JOIN 
(
    SELECT 
        dimension_1, 
        sum(metric_2) AS sum_metric_2
    FROM bar
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_2 USING (dimension_1)
ORDER BY dimension_1 ASC

┌─dimension_1─┬─sum_metric_1─┬─sum_metric_2─┐
│ test1       │           100 │
│ test2       │           2030 │
│ test3       │            040 │
└─────────────┴──────────────┴──────────────┘

Totals:
┌─dimension_1─┬─sum_metric_1─┬─sum_metric_2─┐
│             │           3070 │
└─────────────┴──────────────┴──────────────┘

2 rows in set. Elapsed: 0.004 sec. 

Another similar query with sum_metric_2 > 20 filter, TOTALS row is still available:

SELECT 
    dimension_1, 
    sum_metric_1, 
    sum_metric_2
FROM 
(
    SELECT 
        dimension_1, 
        sum(metric_1) AS sum_metric_1
    FROM foo
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_1
ALL FULL OUTER JOIN 
(
    SELECT 
        dimension_1, 
        sum(metric_2) AS sum_metric_2
    FROM bar
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_2 USING (dimension_1)
WHERE sum_metric_2 > 20
ORDER BY dimension_1 ASC

┌─dimension_1─┬─sum_metric_1─┬─sum_metric_2─┐
│ test2       │           2030 │
│ test3       │            040 │
└─────────────┴──────────────┴──────────────┘

Totals:
┌─dimension_1─┬─sum_metric_1─┬─sum_metric_2─┐
│             │           3070 │
└─────────────┴──────────────┴──────────────┘

2 rows in set. Elapsed: 0.004 sec. 

And finally the exact same query, but with a sum_metric_2 < 20 filter, TOTALS row disappears 😕

SELECT 
    dimension_1, 
    sum_metric_1, 
    sum_metric_2
FROM 
(
    SELECT 
        dimension_1, 
        sum(metric_1) AS sum_metric_1
    FROM foo
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_1
ALL FULL OUTER JOIN 
(
    SELECT 
        dimension_1, 
        sum(metric_2) AS sum_metric_2
    FROM bar
    GROUP BY dimension_1
        WITH TOTALS
) AS subquery_2 USING (dimension_1)
WHERE sum_metric_2 < 20
ORDER BY dimension_1 ASC

┌─dimension_1─┬─sum_metric_1─┬─sum_metric_2─┐
│ test1       │           100 │
└─────────────┴──────────────┴──────────────┘

1 rows in set. Elapsed: 0.004 sec.

Is there some kind of workaround to get the totals row in the last query?

@vanzi vanzi added the bug Confirmed user-visible misbehaviour in official release label May 5, 2020
@4ertus2
Copy link
Contributor

4ertus2 commented May 6, 2020

20.3.7 already has some related patch e8cec76
Looks like it's not enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants