Feature/add listagg macro #530

Merged
merged 39 commits into from
Apr 6, 2022
Changes from 36 commits
a7f4f51
Update README.md
joellabes Feb 23, 2022
168396f
Mutually excl range examples in disclosure triangle
joellabes Feb 28, 2022
5717b10
Fix union_relations error when no include/exclude provided
joellabes Mar 3, 2022
3c83bf4
Add to_condition to relationships where
joellabes Mar 10, 2022
b000d8b
very minor nit - update "an new" to "a new" (#519)
JamieRosenberg-canva Mar 14, 2022
9e32d9c
add quoting to split_part (#528)
patkearns10 Mar 28, 2022
d279542
add macro to get columns (#516)
patkearns10 Mar 28, 2022
a04bd8a
Add listagg macro and integration test
graciegoheen Mar 28, 2022
127203c
remove type in listagg macro
graciegoheen Mar 28, 2022
91dcdb7
updated integration test
graciegoheen Mar 28, 2022
08a2345
Add redshift to listagg macro
graciegoheen Mar 28, 2022
5ad068c
remove redshift listagg
graciegoheen Mar 28, 2022
814dd3b
explicitly named group by column
graciegoheen Mar 28, 2022
365cf79
updated default values
graciegoheen Mar 28, 2022
0f19bb8
Updated example to use correct double vs. single quotes
graciegoheen Mar 28, 2022
a0f71d1
whitespace control
graciegoheen Mar 28, 2022
e140371
Added redshift specific macro
graciegoheen Mar 28, 2022
31cb2be
Remove documentation
graciegoheen Mar 30, 2022
71db890
Update integration test so less likely to accidentally work
graciegoheen Mar 30, 2022
c19343c
default everything but measure to none
graciegoheen Mar 30, 2022
9f8e917
Merge branch 'feature/add_listagg_macro' of github.com:dbt-labs/dbt-u…
graciegoheen Mar 30, 2022
55c9a49
added limit functionality for other dbs
graciegoheen Mar 30, 2022
6c6fa8c
syntax bug for postgres
graciegoheen Mar 30, 2022
270c123
update redshift macro
graciegoheen Mar 30, 2022
05812fb
fixed block def control
graciegoheen Mar 30, 2022
f79b9a1
Fixed bug in redshift
graciegoheen Mar 30, 2022
3ff594e
Bug fix redshift
graciegoheen Mar 30, 2022
9cad420
remove unused group_by arg
graciegoheen Mar 30, 2022
2a3d30b
Added additional test without order by col
graciegoheen Mar 30, 2022
a8928ec
updated to regex replace
graciegoheen Mar 30, 2022
f19727f
typo
graciegoheen Mar 30, 2022
d7a5a1d
added more integration_tests
graciegoheen Mar 30, 2022
c7db217
attempt to make redshift less complicated
graciegoheen Mar 30, 2022
ca702e6
typo
graciegoheen Mar 30, 2022
5673f15
update redshift
graciegoheen Mar 30, 2022
e1c5050
replace to substr
graciegoheen Mar 30, 2022
8c43858
More explicit versions with added complexity
graciegoheen Apr 6, 2022
8787d84
handle special characters
graciegoheen Apr 6, 2022
9e8b41f
Merge branch 'next/patch' into feature/add_listagg_macro
joellabes Apr 6, 2022
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# dbt-utils v0.8.2
## Fixes
- Fix union_relations error from [#473](https://github.com/dbt-labs/dbt-utils/pull/473) when no include/exclude parameters are provided ([#505](https://github.com/dbt-labs/dbt-utils/issues/505), [#509](https://github.com/dbt-labs/dbt-utils/pull/509))

# dbt-utils v0.8.1

## New features
140 changes: 97 additions & 43 deletions README.md
@@ -30,6 +30,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this

- [Introspective macros](#introspective-macros):
- [get_column_values](#get_column_values-source)
- [get_filtered_columns_in_relation](#get_filtered_columns_in_relation-source)
- [get_relations_by_pattern](#get_relations_by_pattern-source)
- [get_relations_by_prefix](#get_relations_by_prefix-source)
- [get_query_results_as_dict](#get_query_results_as_dict-source)
@@ -58,6 +59,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
- [split_part](#split_part-source)
- [last_day](#last_day-source)
- [width_bucket](#width_bucket-source)
- [listagg](#listagg)

- [Jinja Helpers](#jinja-helpers)
- [pretty_time](#pretty_time-source)
@@ -67,7 +69,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
[Materializations](#materializations):
- [insert_by_period](#insert_by_period-source)

----
### Schema Tests
#### equal_rowcount ([source](macros/schema_tests/equal_rowcount.sql))
This schema test asserts that two relations have the same number of rows.
@@ -310,6 +312,7 @@ models:
to: ref('other_model_name')
field: client_id
from_condition: id <> '4ca448b8-24bf-4b88-96c6-b1609499c38b'
to_condition: created_date >= '2020-01-01'
```

#### mutually_exclusive_ranges ([source](macros/schema_tests/mutually_exclusive_ranges.sql))
@@ -377,53 +380,58 @@ models:
partition_by: customer_id
gaps: allowed
```
<details>
<summary>Additional `gaps` and `zero_length_range_allowed` examples</summary>

**Understanding the `gaps` argument:**

Here are a number of examples for each allowed `gaps` argument.
* `gaps: not_allowed`: The upper bound of one record must be the lower bound of
the next record.

| lower_bound | upper_bound |
|-------------|-------------|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |

* `gaps: allowed` (default): There may be a gap between the upper bound of one
record and the lower bound of the next record.

| lower_bound | upper_bound |
|-------------|-------------|
| 0 | 1 |
| 2 | 3 |
| 3 | 4 |

* `gaps: required`: There must be a gap between the upper bound of one record and
the lower bound of the next record (common for date ranges).

| lower_bound | upper_bound |
|-------------|-------------|
| 0 | 1 |
| 2 | 3 |
| 4 | 5 |

**Understanding the `zero_length_range_allowed` argument:**
Here are a number of examples for each allowed `zero_length_range_allowed` argument.
* `zero_length_range_allowed: false`: (default) The upper bound of each record must be greater than its lower bound.

| lower_bound | upper_bound |
|-------------|-------------|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |

* `zero_length_range_allowed: true`: The upper bound of each record can be greater than or equal to its lower bound.

| lower_bound | upper_bound |
|-------------|-------------|
| 0 | 1 |
| 2 | 2 |
| 3 | 4 |

</details>
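
The rules in the tables above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, assuming integer bounds and a hypothetical `ranges_valid` helper; it is not the SQL that the macro compiles to.

```python
from typing import List, Tuple


def ranges_valid(
    ranges: List[Tuple[int, int]],
    gaps: str = "allowed",
    zero_length_range_allowed: bool = False,
) -> bool:
    """Hypothetical helper mirroring the documented `gaps` options."""
    ordered = sorted(ranges)

    # Each record's own bounds: strictly increasing unless zero-length is allowed.
    for lower, upper in ordered:
        if upper < lower or (upper == lower and not zero_length_range_allowed):
            return False

    # Compare each record's upper bound with the next record's lower bound.
    for (_, prev_upper), (next_lower, _) in zip(ordered, ordered[1:]):
        if gaps == "not_allowed" and prev_upper != next_lower:
            return False
        if gaps == "allowed" and prev_upper > next_lower:
            return False
        if gaps == "required" and prev_upper >= next_lower:
            return False
    return True
```

Each example table passes under its own setting, while the `not_allowed` table fails under `gaps="required"` because its ranges touch.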

#### sequential_values ([source](macros/schema_tests/sequential_values.sql))
This test confirms that a column contains sequential values. It can be used
@@ -538,7 +546,7 @@ These macros run a query and return the results of the query as objects. They ar
#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

**Args:**
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column you wish to find the column values of
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this as `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
@@ -579,6 +587,28 @@ Arguments:
...
```

#### get_filtered_columns_in_relation ([source](macros/sql/get_filtered_columns_in_relation.sql))
This macro returns an iterable Jinja list of columns for a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation), (i.e. not from a CTE)
- optionally exclude columns
- the input values are not case-sensitive (input uppercase or lowercase and it will work!)
> Note: The native [`adapter.get_columns_in_relation` macro](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_columns_in_relation) allows you
to pull column names in a non-filtered fashion, also bringing along with it other (potentially unwanted) information, such as dtype, char_size, numeric_precision, etc.

**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The name of the columns you wish to exclude. (case-insensitive)

**Usage:**
```sql
-- Returns a list of the columns from a relation, so you can then iterate in a for loop
{% set column_names = dbt_utils.get_filtered_columns_in_relation(from=ref('your_model'), except=["field_1", "field_2"]) %}
...
{% for column_name in column_names %}
max({{ column_name }}) ... as max_{{ column_name }},
{% endfor %}
...
```
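
The case-insensitive exclusion described above can be pictured with a small Python sketch. The `filter_columns` name is hypothetical, and returning column names in their original case is an assumption; the macro itself is written in Jinja and reads the columns from the relation.

```python
from typing import Iterable, List


def filter_columns(columns: Iterable[str], excluded: Iterable[str] = ()) -> List[str]:
    """Hypothetical stand-in: drop excluded names, ignoring case."""
    excluded_lower = {name.lower() for name in excluded}
    # Keep the original order of the relation's columns.
    return [col for col in columns if col.lower() not in excluded_lower]
```

For example, `filter_columns(["field_1", "field_2", "field_3"], ["FIELD_1"])` keeps `field_2` and `field_3`, which lines up with the two-column expected output in this PR's seed data.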

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.
@@ -742,9 +772,19 @@ group by 1,2,3
```

#### star ([source](macros/sql/star.sql))
This macro generates a comma-separated list of all fields that exist in the `from` relation, excluding any fields
listed in the `except` argument. The construction is identical to `select * from {{ref('my_model')}}`, replacing star (`*`) with
the star macro.
This macro also has an optional `relation_alias` argument that will prefix all generated fields with an alias (`relation_alias`.`field_name`).
The macro also has optional `prefix` and `suffix` arguments. When one or both are provided, they will be concatenated onto each field's alias
in the output (`prefix` ~ `field_name` ~ `suffix`). NB: This prevents the output from being used in any context other than a select statement.

**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The name of the columns you wish to exclude. (case-insensitive)
- `relation_alias` (optional, default=`''`): will prefix all generated fields with an alias (`relation_alias`.`field_name`).
- `prefix` (optional, default=`''`): will prefix the output `field_name` (`field_name as prefix_field_name`).
- `suffix` (optional, default=`''`): will suffix the output `field_name` (`field_name as field_name_suffix`).

**Usage:**
```sql
@@ -761,12 +801,19 @@ from {{ ref('my_model') }}

```

```sql
select
{{ dbt_utils.star(from=ref('my_model'), except=["exclude_field_1", "exclude_field_2"], prefix="max_") }}
from {{ ref('my_model') }}

```
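
To make the interaction of `except`, `relation_alias`, `prefix` and `suffix` concrete, here is a rough Python sketch of the select list the arguments describe. The `star` function here is a hypothetical stand-in, identifier quoting is deliberately left out, and the exact whitespace of the real macro's output is not reproduced.

```python
from typing import Iterable


def star(
    columns: Iterable[str],
    excluded: Iterable[str] = (),
    relation_alias: str = "",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Hypothetical sketch: build a comma-separated select list."""
    excluded_lower = {col.lower() for col in excluded}
    parts = []
    for col in columns:
        if col.lower() in excluded_lower:
            continue
        # Optionally qualify the field with the relation alias.
        expr = f"{relation_alias}.{col}" if relation_alias else col
        # An alias is only emitted when a prefix or suffix is supplied.
        if prefix or suffix:
            expr += f" as {prefix}{col}{suffix}"
        parts.append(expr)
    return ",\n".join(parts)
```

With `prefix="max_"`, a column `amount` becomes `amount as max_amount`, which is why the output is only usable inside a `select` clause.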

#### union_relations ([source](macros/sql/union.sql))

This macro unions together an array of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation),
even when columns have differing orders in each Relation, and/or some columns are
missing from some relations. Any columns exclusive to a subset of these
relations will be filled with `null` where not present. A new column
(`_dbt_source_relation`) is also added to indicate the source for each record.
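
As a rough illustration of that behaviour, the Python sketch below unions rows from several relations, fills missing columns with `None`, and tags each row with its source. It is a simplification under stated assumptions: `union_relations` here is a hypothetical function, relations are plain lists of dicts, and the column list is derived from the rows rather than from warehouse metadata.

```python
from typing import Any, Dict, List


def union_relations(relations: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: union rows, null-filling columns a relation lacks."""
    # Collect the superset of column names in first-seen order.
    columns: List[str] = []
    for rows in relations.values():
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)

    unioned = []
    for name, rows in relations.items():
        for row in rows:
            out: Dict[str, Any] = {"_dbt_source_relation": name}
            # row.get returns None for columns this relation does not have.
            out.update({col: row.get(col) for col in columns})
            unioned.append(out)
    return unioned
```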

**Usage:**
@@ -959,9 +1006,16 @@ This macro calculates the difference between two dates.
#### split_part ([source](macros/cross_db_utils/split_part.sql))
This macro splits a string of text using the supplied delimiter and returns the supplied part number (1-indexed).

**Args**:
- `string_text` (required): Text to be split into parts.
- `delimiter_text` (required): Text representing the delimiter to split by.
- `part_number` (required): Requested part of the split (1-based). If the value is negative, the parts are counted backward from the end of the string.

**Usage:**
When referencing a column, use one pair of quotes. When referencing a string, use single quotes enclosed in double quotes.
```
{{ dbt_utils.split_part(string_text='column_to_split', delimiter_text='delimiter_column', part_number=1) }}
{{ dbt_utils.split_part(string_text="'1|2|3'", delimiter_text="'|'", part_number=1) }}
```
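
The 1-based and negative `part_number` behaviour can be sketched in Python as follows. The function is a hypothetical illustration, not the macro; returning `None` for an out-of-range part is an assumption made here, since warehouses differ on that case.

```python
from typing import Optional


def split_part(string_text: str, delimiter_text: str, part_number: int) -> Optional[str]:
    """Hypothetical sketch of 1-based part selection with negative indexing."""
    if part_number == 0:
        raise ValueError("part_number is 1-based; 0 is not a valid part")
    parts = string_text.split(delimiter_text)
    # Positive numbers count from the start, negative numbers from the end.
    index = part_number - 1 if part_number > 0 else part_number
    if -len(parts) <= index < len(parts):
        return parts[index]
    return None  # assumption: out-of-range parts yield no value
```

Here `split_part("1|2|3", "|", 1)` selects `"1"` and `split_part("1|2|3", "|", -1)` selects `"3"`.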

#### date_trunc ([source](macros/cross_db_utils/date_trunc.sql))
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg.csv
@@ -0,0 +1,10 @@
group_col,string_text,order_col
1,a,1
1,b,2
1,c,3
2,a,2
2,1,1
2,p,3
3,g,1
3,g,2
3,g,3
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg_output.csv
@@ -0,0 +1,10 @@
group_col,expected,version
1,"a,b,c",1
2,"1,a,p",1
3,"g,g,g",1
1,"a,b",2
2,"1,a",2
3,"g,g",2
3,"g,g,g",3
3,"g",4
3,"g,g,g",5
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
field_1,field_2,field_3
a,b,c
d,e,f
g,h,i
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
field_2,field_3
h,i
32 changes: 32 additions & 0 deletions integration_tests/macros/assert_equal_values.sql
@@ -0,0 +1,32 @@
{% macro assert_equal_values(actual_object, expected_object) %}
{% if not execute %}

{# pass #}

{% elif actual_object != expected_object %}

{% set msg %}
Expected did not match actual

-----------
Actual:
-----------
--->{{ actual_object }}<---

-----------
Expected:
-----------
--->{{ expected_object }}<---

{% endset %}

{{ log(msg, info=True) }}

select 'fail'

{% else %}

select 'ok' {{ limit_zero() }}

{% endif %}
{% endmacro %}
6 changes: 6 additions & 0 deletions integration_tests/models/cross_db_utils/schema.yml
@@ -58,6 +58,12 @@ models:
- assert_equal:
actual: actual
expected: expected

- name: test_listagg
tests:
- assert_equal:
actual: actual
expected: expected

- name: test_safe_cast
tests:
69 changes: 69 additions & 0 deletions integration_tests/models/cross_db_utils/test_listagg.sql
@@ -0,0 +1,69 @@
with data as (

select * from {{ ref('data_listagg') }}

),

data_output as (

select * from {{ ref('data_listagg_output') }}

),

calculate as (

select
group_col,
{{ dbt_utils.listagg('string_text', "','", "order by order_col") }} as actual,
1 as version
Contributor:

I like that all of these are in the one model, but I'm wondering if we can move away from magic numbers, and instead show what the goal of the version is. Something like comma_ordered, comma_ordered_limited, comma_unordered, distinct_comma, no_params...

Also, just noticed that all of these use , as their delimiter, which is also the default value, so again it'd be good to test it with something different to make sure that it's doing what we ask it to instead of being correct by accident.

It'd be particularly worthwhile to use a multi-character/whitespace-using delimiter for a bit of novelty, e.g. , or something weird like _|_

Contributor Author:

Love these ideas! Thanks for all of the feedback :)

Contributor Author:

Note " , " won't work as a delimiter if you have whitespace in your measure column. This is in line with the note that "if there are instances of delimiter_text within your measure, you cannot include a limit_num".

from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "','", "order by order_col", 2) }} as actual,
2 as version
from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "','") }} as actual,
3 as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('DISTINCT string_text', "','") }} as actual,
4 as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text') }} as actual,
5 as version
from data
where group_col = 3
group by group_col

)

select
calculate.actual,
data_output.expected
from calculate
left join data_output
on calculate.group_col = data_output.group_col
and calculate.version = data_output.version
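
For readers who want to check the expected values by hand, the Python sketch below re-creates the ordered, limited and distinct cases from `data_listagg.csv`. It is only a reference model under assumptions: the `listagg` function and its parameter names are hypothetical, the comma default follows the review comment above, and when `ordered=False` it simply keeps input order, which a warehouse does not guarantee.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (group_col, string_text, order_col), copied from data_listagg.csv
ROWS: List[Tuple[int, str, int]] = [
    (1, "a", 1), (1, "b", 2), (1, "c", 3),
    (2, "a", 2), (2, "1", 1), (2, "p", 3),
    (3, "g", 1), (3, "g", 2), (3, "g", 3),
]


def listagg(
    rows: List[Tuple[int, str, int]],
    delimiter: str = ",",
    ordered: bool = True,
    limit_num: Optional[int] = None,
    distinct: bool = False,
) -> Dict[int, str]:
    """Hypothetical reference model of grouped string aggregation."""
    groups = defaultdict(list)
    for group_col, string_text, order_col in rows:
        groups[group_col].append((order_col, string_text))

    result = {}
    for group_col, items in groups.items():
        if ordered:
            items = sorted(items)  # sort by order_col within the group
        values = [text for _, text in items]
        if distinct:
            values = list(dict.fromkeys(values))  # de-duplicate, keep order
        if limit_num is not None:
            values = values[:limit_num]
        result[group_col] = delimiter.join(values)
    return result
```

The results for the ordered, limit-2 and distinct cases match versions 1, 2 and 4 in `data_listagg_output.csv`.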