Feature/add listagg macro #530

graciegoheen · 2022-03-28T17:47:54Z

This is a:

bug fix PR with no breaking changes — please ensure the base branch is main
new functionality — please ensure the base branch is the latest dev/ branch
a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

Create a cross database compatible listagg macro.

Checklist

* Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md

* add quoting to split_part * update docs for split_part * typo * corrected readme syntax * revert and update to just documentation * add new line * Update README.md * Update README.md * Update README.md Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* add macro to get columns * star macro should use get_columns * add adapter. * swap adapter for dbt_utils Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * update documentation * add output_lower arg * update name to get_filtered_columns_in_relation from get_columns * add tests * forgot args * too much whitespace removal ----------- Actual: ----------- --->"field_3"as "test_field_3"<--- ----------- Expected: ----------- --->"field_3" as "test_field_3"<--- * didnt mean to move a file that i did not create. moving things back. * remove lowercase logic * limit_zero Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

graciegoheen · 2022-03-28T21:45:56Z

macros/cross_db_utils/listagg.sql

+{%- endmacro %}
+
+{# redshift should use default instead of postgres #}
+{% macro redshift__listagg(measure, delimiter_text, order_by_clause, limit_clause) -%}


Following the example in datediff macro - why is this necessary?

Good question! Some background:

The Redshift adapter inherits from Postgres, so will try using Postgres-specific code before reverting to the default (read more)

Postgres' datediff implementation is very limited and needs some ridiculous hacks.

Redshift actually has a good, sensible datediff implementation of its own instead of just behaving the same way as Postgres, so we're skipping the Postgres implementation in favour of the base implementation.

This is the right thing to do here as well - Redshift uses listagg, not string_agg, so we should use the default implementation instead of Postgres'.

joellabes

This is looking slick @graciegoheen! Thanks for getting the ball rolling so quickly.

There's a handful of ergonomic comments (and, ironically, a request for less documentation 😬 ).

I'd also love to see some more tests which exercise more of the permutations of arguments. For example, one that doesn't provide an order by or limit clause, one that uses distinct once it's added... you can really go down a rabbithole of permutations because they explode exponentially as you add more.

But at a minimum I would say that each argument should be exercised at least once, along with a super basic version that passes no optional values.

For the version without an order by clause, you might have to use a different seed which has multiple copies of the same column value all the way through so that you don't get caught up in the nondeterministic sorting; I don't know. Fortunately the integration tests will let you know if that's a problem!

README.md

integration_tests/data/cross_db/data_listagg.csv

joellabes · 2022-03-29T03:33:13Z

macros/cross_db_utils/listagg.sql

+{%- endmacro %}
+
+{# redshift should use default instead of postgres #}
+{% macro redshift__listagg(measure, delimiter_text, order_by_clause, limit_clause) -%}


Good question! Some background:

The Redshift adapter inherits from Postgres, so will try using Postgres-specific code before reverting to the default (read more)

Postgres' datediff implementation is very limited and needs some ridiculous hacks.

Redshift actually has a good, sensible datediff implementation of its own instead of just behaving the same way as Postgres, so we're skipping the Postgres implementation in favour of the base implementation.

This is the right thing to do here as well - Redshift uses listagg, not string_agg, so we should use the default implementation instead of Postgres'.

joellabes · 2022-03-29T03:34:02Z

macros/cross_db_utils/listagg.sql

@@ -0,0 +1,50 @@
+{% macro listagg(measure, delimiter_text, order_by_clause='', limit_clause='') -%}


Looks like support for DISTINCT is missing here

I believe DISTINCT is just part of the measure input. So if you wanted to use this macro with DISTINCT, you would write: {{ dbt_utils.listagg('DISTINCT string_text', "','", "order by order_col") }}.

joellabes · 2022-03-29T03:35:28Z

macros/cross_db_utils/listagg.sql

@@ -0,0 +1,50 @@
+{% macro listagg(measure, delimiter_text, order_by_clause='', limit_clause='') -%}
+  {{ return(adapter.dispatch('listagg', 'dbt_utils') (measure, delimiter_text, order_by_clause, limit_clause)) }}


For optional arguments (order_by, limit), it'd be a good idea to set them to none so that the end user doesn't have to add that in themselves on every invocation. Otherwise I think that the macro will complain that it has four arguments and only two have been provided.

Ah ok - out of curiosity, any reason none is better than an empty string here?

joellabes · 2022-03-29T03:43:53Z

macros/cross_db_utils/listagg.sql

+{%- endmacro %}
+
+
+{% macro default__listagg(measure, delimiter_text, order_by_clause, limit_clause) -%}


Not all of your implementations support limit_clause - for the cross-db macros it's important that they all do the same thing, because package maintainers use them to get consistent behaviour across all the different adapters.

There's a couple of options, depending on how committed you're feeling (either option is OK! This is why open source is great):

Remove the limit clause altogether - if and when someone else comes along who wants to add support, they can do it separately.

Look into a way to get equivalent behaviour by using other patterns. For example this sort of thing: https://stackoverflow.com/questions/27405150/postgresql-string-agg-with-limited-number-of-elements (you could stick with the native code unless a limit clause is provided, and then switch to the alternative approach).

alright! got this working (redshift was a beast) with the caveat that if there are instances of delimiter_text within your measure, you cannot include a limit_num

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

…tils into feature/add_listagg_macro

joellabes

The Redshift implementation makes my brain hurt, but it looks right and there's tests that validate its output so I'm on board! Good work 🥳

Last nitpicks I promise and then let's get this shipped! 🚢

joellabes · 2022-04-05T21:13:22Z

integration_tests/models/cross_db_utils/test_listagg.sql

+    select
+        group_col,
+        {{ dbt_utils.listagg('string_text', "','", "order by order_col") }} as actual,
+        1 as version


I like that all of these are in the one model, but I'm wondering if we can move away from magic numbers, and instead show what the goal of the version is. Something like comma_ordered, comma_ordered_limited, comma_unordered, distinct_comma, no_params...

Also, just noticed that all of these use , as their delimiter, which is also the default value, so again it'd be good to test it with something different to make sure that it's doing what we ask it to instead of being correct by accident.

It'd be particularly worthwhile to use a multi-character/whitespace-using delimiter for a bit of novelty, e.g. , or something weird like _|_

Love these ideas! Thanks for all of the feedback :)

Note " , " won't work as a delimiter if you have whitespace in your measure column. This is in-line with the note that "if there are instances of delimiter_text within your measure, you cannot include a limit_num".

graciegoheen · 2022-04-06T17:14:15Z

Working on making this handle special characters

joellabes · 2022-04-06T21:30:11Z

@graciegoheen incredible! I'm sorry this turned into such a messy contribution!

A gold star for you ⭐ ⭐ ⭐ ⭐ ⭐

The base branch was changed.

* Fix/timestamp withought timezone (#458) * timestamp and changelog updates * changelog fix * Add context for why change to no timezone Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * also ignore dbt_packages (#463) * also ignore dbt_packages * Update CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * date_spine: transform comment to jinja (#462) * Have union_relations raise exception when include parameter results in no columns (#473) * Raise exception if no columns in column_superset * Add relation names to compiler error message * Add `union_relations` fix to changelog * Added case for handling postgres foreign tables... (#476) * Add link for fewer_rows_than schema test in docs (#465) * Added case for handling postgres foreign tables (tables which are external to current database and are imported into current database from remote data stores by using Foreign Data Wrappers functionallity). * Reworked getting of postges table_type. * Added needed changes to CHANGELOG. Co-authored-by: José Coto <jlcoto@users.noreply.github.com> Co-authored-by: Taras Stetsiak <tstetsiak@health-union.com> * Enhance usability of star macro by only generating column aliases when prefix and/or suffix is specified (#468) * The star macro should only produce column aliases when there is either a prefix or suffix specified. * Enhanced the readme for the star macro. * Add new integration test Co-authored-by: Nick Perrott <nperrott@roiti.com> Co-authored-by: Josh Elston-Green Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * fix: extra brace typo in insert_by_period_materialization (#480) * Support quoted column names in sequential_values test (#479) * Add any value (#501) * Add link for fewer_rows_than schema test in docs (#465) * Update get_query_results_as_dict example to demonstrate accessing columnar results as dictionary values (#474) * Update get_qu ery_results_as_dict example to demonstrate accessing columnar results as dictionary values * Use slugify in example * Fix slugify example with dbt_utils. package prefix Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com> * Add note about not_null_where deprecation to Readme (#477) * Add note about not_null_where deprecation to Readme * Add docs to unique_where test * Update pull_request_template.md to reference `main` vs `master` (#496) * Correct coalesce -> concatenation typo (#495) * add any_value cross-db macro * Missing colon in test * Update CHANGELOG.md Co-authored-by: José Coto <jlcoto@users.noreply.github.com> Co-authored-by: Elize Papineau <elizepapineau@gmail.com> Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com> Co-authored-by: Joe Ste.Marie <stemarie.joe@gmail.com> Co-authored-by: Niall Woodward <niall@niallrees.com> * Fix changelog * Second take at fixing pivot to allow single quotes (#503) * fix pivot : in pivoted column value, single quote must be escaped (on postgresql) else ex. syntax error near : when color = 'blue's' * patched expected * single quote escape : added dispatched version of the macro to support bigquery & snowflake * second backslash to escape in Jinja, change case of test file columns Let's see if other databases allow this * explicitly list columns to compare * different tests for snowflake and others * specific comparison seed * Don't quote identifiers for apostrophe, to avoid BQ and SF problems * Whitespace management for macros * Update CHANGELOG.md Co-authored-by: Marc Dutoo <marc.dutoo@gmail.com> * Add bool or cross db (#504) * Create bool_or cross-db func * Forgot a comma * Update CHANGELOG.md * Code review tweaks * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add _is_ephemeral test to get_column_values (#518) * Add _is_ephemeral test Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com> * Add deduplication macro (#512) * Update README.md * Mutually excl range examples in disclosure triangle * Fix union_relations error when no include/exclude provided * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add dedupe macro * Add test for dedupe macro * Add documentation to README * Add entry to CHANGELOG * Implement review * Typed materialized views as views (#525) * Typed materialized views as views * Update get_relations_by_pattern.sql * Moving fix from get_tables_by_pattern_sql reverting changes to this file to add a fix to the macro get_tables_by_pattern_sql * removing quoting from table_type removing quoting from table_type as this was causing an error when calling this macro within get_tables_by_pattern_sql * calling get_table_types_sql for materialized views calling get_table_types_sql macro to handle materialized views in sources. * Add `alias` argument to `deduplicate` macro (#526) * Add `alias` argument to `deduplicate * Test `alias` argument * Rename `alias` to `relation_alias` * Fix/use generic test naming style instead of schema test (#521) * Updated Rreferences to 'schema test' in README along with small improvements to test descriptions. Updates were also carried out in folder structure and integration README * Updated references to 'schema test' in Changelog * updated changelog with changes to documentation and fproject file structure * Apply suggestions from code review Update macro descriptions to be "asserts that" * Update CHANGELOG.md * Update README.md Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * Remove extraneous whitespace (#529) * rm whitespace from date_trunc * datediff * rm uncessary whitespace control * change log * fix CHANGELOG * address comments * Feature/add listagg macro (#530) * Update README.md * Mutually excl range examples in disclosure triangle * Fix union_relations error when no include/exclude provided * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add to_condition to relationships where * very minor nit - update "an new" to "a new" (#519) * add quoting to split_part (#528) * add quoting to split_part * update docs for split_part * typo * corrected readme syntax * revert and update to just documentation * add new line * Update README.md * Update README.md * Update README.md Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * add macro to get columns (#516) * add macro to get columns * star macro should use get_columns * add adapter. * swap adapter for dbt_utils Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * update documentation * add output_lower arg * update name to get_filtered_columns_in_relation from get_columns * add tests * forgot args * too much whitespace removal ----------- Actual: ----------- --->"field_3"as "test_field_3"<--- ----------- Expected: ----------- --->"field_3" as "test_field_3"<--- * didnt mean to move a file that i did not create. moving things back. * remove lowercase logic * limit_zero Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * Add listagg macro and integration test * remove type in listagg macro * updated integration test * Add redshift to listagg macro * remove redshift listagg * explicitly named group by column * updated default values * Updated example to use correct double vs. single quotes * whitespace control * Added redshift specific macro * Remove documentation * Update integration test so less likely to accidentally work Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> * default everything but measure to none * added limit functionality for other dbs * syntax bug for postgres * update redshift macro * fixed block def control * Fixed bug in redshift * Bug fix redshift * remove unused group_by arg * Added additional test without order by col * updated to regex replace * typo * added more integration_tests * attempt to make redshift less complicated * typo * update redshift * replace to substr * More explicit versions with added complexity * handle special characters Co-authored-by: Joel Labes <joel.labes@dbtlabs.com> Co-authored-by: Jamie Rosenberg <james.rosenberg@canva.com> Co-authored-by: Pat Kearns <pat.kearns@fishtownanalytics.com> * patch default behaviour in get_column_values (#533) * Update changelog, add missing quotes around get_table_types_sql * rm whitespace Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com> Co-authored-by: Anders <swanson.anders@gmail.com> Co-authored-by: Mikaël Simarik <mikael.simarik@gmail.com> Co-authored-by: Graham Wetzler <graham@wetzler.dev> Co-authored-by: Taras <32882370+Aesthet@users.noreply.github.com> Co-authored-by: José Coto <jlcoto@users.noreply.github.com> Co-authored-by: Taras Stetsiak <tstetsiak@health-union.com> Co-authored-by: nickperrott <46330920+nickperrott@users.noreply.github.com> Co-authored-by: Nick Perrott <nperrott@roiti.com> Co-authored-by: Ted Conbeer <tconbeer@users.noreply.github.com> Co-authored-by: Armand Duijn <armandduijn@users.noreply.github.com> Co-authored-by: Elize Papineau <elizepapineau@gmail.com> Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com> Co-authored-by: Joe Ste.Marie <stemarie.joe@gmail.com> Co-authored-by: Niall Woodward <niall@niallrees.com> Co-authored-by: Marc Dutoo <marc.dutoo@gmail.com> Co-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com> Co-authored-by: Luis Leon <98919783+luisleon90@users.noreply.github.com> Co-authored-by: Brid Moynihan <moynihanbridgit@gmail.com> Co-authored-by: SunriseLong <44146580+SunriseLong@users.noreply.github.com> Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> Co-authored-by: Jamie Rosenberg <james.rosenberg@canva.com> Co-authored-by: Pat Kearns <pat.kearns@fishtownanalytics.com> Co-authored-by: James McNeill <55981540+jpmmcneill@users.noreply.github.com>

joellabes and others added 10 commits February 23, 2022 15:53

Update README.md

a7f4f51

Mutually excl range examples in disclosure triangle

168396f

Fix union_relations error when no include/exclude provided

5717b10

* Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md

Add to_condition to relationships where

3c83bf4

very minor nit - update "an new" to "a new" (#519)

b000d8b

Add listagg macro and integration test

a04bd8a

remove type in listagg macro

127203c

updated integration test

91dcdb7

graciegoheen requested a review from joellabes March 28, 2022 17:47

graciegoheen mentioned this pull request Mar 28, 2022

Fix/update grain of multiple sources joined dbt-labs/dbt-project-evaluator#64

Merged

graciegoheen added 7 commits March 28, 2022 13:54

Add redshift to listagg macro

08a2345

remove redshift listagg

5ad068c

explicitly named group by column

814dd3b

updated default values

365cf79

Updated example to use correct double vs. single quotes

0f19bb8

whitespace control

a0f71d1

Added redshift specific macro

e140371

graciegoheen commented Mar 28, 2022

View reviewed changes

joellabes requested changes Mar 29, 2022

View reviewed changes

graciegoheen and others added 9 commits March 30, 2022 10:28

Remove documentation

31cb2be

Update integration test so less likely to accidentally work

71db890

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

default everything but measure to none

c19343c

Merge branch 'feature/add_listagg_macro' of github.com:dbt-labs/dbt-u…

9f8e917

…tils into feature/add_listagg_macro

added limit functionality for other dbs

55c9a49

syntax bug for postgres

6c6fa8c

update redshift macro

270c123

fixed block def control

05812fb

Fixed bug in redshift

f79b9a1

graciegoheen added 6 commits March 30, 2022 14:36

Bug fix redshift

3ff594e

remove unused group_by arg

9cad420

Added additional test without order by col

2a3d30b

updated to regex replace

a8928ec

typo

f19727f

added more integration_tests

d7a5a1d

graciegoheen requested a review from joellabes March 30, 2022 19:40

graciegoheen added 4 commits March 30, 2022 16:39

attempt to make redshift less complicated

c7db217

typo

ca702e6

update redshift

5673f15

replace to substr

e1c5050

graciegoheen mentioned this pull request Apr 4, 2022

Updated listagg macro to dbt-utils once PR has been merged dbt-labs/dbt-project-evaluator#69

Closed

joellabes requested changes Apr 5, 2022

View reviewed changes

More explicit versions with added complexity

8c43858

graciegoheen requested a review from joellabes April 6, 2022 01:56

handle special characters

8787d84

joellabes previously approved these changes Apr 6, 2022

View reviewed changes

joellabes changed the base branch from main to next/patch April 6, 2022 21:30

Merge branch 'next/patch' into feature/add_listagg_macro

9e8b41f

joellabes approved these changes Apr 6, 2022

View reviewed changes

joellabes merged commit 1a517d2 into next/patch Apr 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/add listagg macro #530

Feature/add listagg macro #530

graciegoheen commented Mar 28, 2022

graciegoheen Mar 28, 2022

joellabes Mar 29, 2022

joellabes left a comment

joellabes Mar 29, 2022

joellabes Mar 29, 2022

graciegoheen Mar 30, 2022

joellabes Mar 29, 2022

graciegoheen Mar 30, 2022

joellabes Mar 29, 2022

graciegoheen Mar 30, 2022 •

edited

Loading

joellabes left a comment

joellabes Apr 5, 2022

graciegoheen Apr 6, 2022

graciegoheen Apr 6, 2022

graciegoheen commented Apr 6, 2022

joellabes commented Apr 6, 2022

		@@ -0,0 +1,50 @@
		{% macro listagg(measure, delimiter_text, order_by_clause='', limit_clause='') -%}

		@@ -0,0 +1,50 @@
		{% macro listagg(measure, delimiter_text, order_by_clause='', limit_clause='') -%}
		{{ return(adapter.dispatch('listagg', 'dbt_utils') (measure, delimiter_text, order_by_clause, limit_clause)) }}

		{%- endmacro %}


		{% macro default__listagg(measure, delimiter_text, order_by_clause, limit_clause) -%}

Feature/add listagg macro #530

Feature/add listagg macro #530

Conversation

graciegoheen commented Mar 28, 2022

Description & motivation

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joellabes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

graciegoheen Mar 30, 2022 • edited Loading

Choose a reason for hiding this comment

joellabes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

graciegoheen commented Apr 6, 2022

joellabes commented Apr 6, 2022

graciegoheen Mar 30, 2022 •

edited

Loading