Feature/add listagg macro #530

Merged on Apr 6, 2022 (39 commits)
a7f4f51
Update README.md
joellabes Feb 23, 2022
168396f
Mutually excl range examples in disclosure triangle
joellabes Feb 28, 2022
5717b10
Fix union_relations error when no include/exclude provided
joellabes Mar 3, 2022
3c83bf4
Add to_condition to relationships where
joellabes Mar 10, 2022
b000d8b
very minor nit - update "an new" to "a new" (#519)
JamieRosenberg-canva Mar 14, 2022
9e32d9c
add quoting to split_part (#528)
patkearns10 Mar 28, 2022
d279542
add macro to get columns (#516)
patkearns10 Mar 28, 2022
a04bd8a
Add listagg macro and integration test
graciegoheen Mar 28, 2022
127203c
remove type in listagg macro
graciegoheen Mar 28, 2022
91dcdb7
updated integration test
graciegoheen Mar 28, 2022
08a2345
Add redshift to listagg macro
graciegoheen Mar 28, 2022
5ad068c
remove redshift listagg
graciegoheen Mar 28, 2022
814dd3b
explicitly named group by column
graciegoheen Mar 28, 2022
365cf79
updated default values
graciegoheen Mar 28, 2022
0f19bb8
Updated example to use correct double vs. single quotes
graciegoheen Mar 28, 2022
a0f71d1
whitespace control
graciegoheen Mar 28, 2022
e140371
Added redshift specific macro
graciegoheen Mar 28, 2022
31cb2be
Remove documentation
graciegoheen Mar 30, 2022
71db890
Update integration test so less likely to accidentally work
graciegoheen Mar 30, 2022
c19343c
default everything but measure to none
graciegoheen Mar 30, 2022
9f8e917
Merge branch 'feature/add_listagg_macro' of github.com:dbt-labs/dbt-u…
graciegoheen Mar 30, 2022
55c9a49
added limit functionality for other dbs
graciegoheen Mar 30, 2022
6c6fa8c
syntax bug for postgres
graciegoheen Mar 30, 2022
270c123
update redshift macro
graciegoheen Mar 30, 2022
05812fb
fixed block def control
graciegoheen Mar 30, 2022
f79b9a1
Fixed bug in redshift
graciegoheen Mar 30, 2022
3ff594e
Bug fix redshift
graciegoheen Mar 30, 2022
9cad420
remove unused group_by arg
graciegoheen Mar 30, 2022
2a3d30b
Added additional test without order by col
graciegoheen Mar 30, 2022
a8928ec
updated to regex replace
graciegoheen Mar 30, 2022
f19727f
typo
graciegoheen Mar 30, 2022
d7a5a1d
added more integration_tests
graciegoheen Mar 30, 2022
c7db217
attempt to make redshift less complicated
graciegoheen Mar 30, 2022
ca702e6
typo
graciegoheen Mar 30, 2022
5673f15
update redshift
graciegoheen Mar 30, 2022
e1c5050
replace to substr
graciegoheen Mar 30, 2022
8c43858
More explicit versions with added complexity
graciegoheen Apr 6, 2022
8787d84
handle special characters
graciegoheen Apr 6, 2022
9e8b41f
Merge branch 'next/patch' into feature/add_listagg_macro
joellabes Apr 6, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

# dbt-utils v0.8.3
## New features
- A macro for deduplicating data ([#335](https://github.com/dbt-labs/dbt-utils/issues/335), [#512](https://github.com/dbt-labs/dbt-utils/pull/512))
61 changes: 54 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this

- [Introspective macros](#introspective-macros):
- [get_column_values](#get_column_values-source)
- [get_filtered_columns_in_relation](#get_filtered_columns_in_relation-source)
- [get_relations_by_pattern](#get_relations_by_pattern-source)
- [get_relations_by_prefix](#get_relations_by_prefix-source)
- [get_query_results_as_dict](#get_query_results_as_dict-source)
@@ -59,6 +60,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
- [split_part](#split_part-source)
- [last_day](#last_day-source)
- [width_bucket](#width_bucket-source)
- [listagg](#listagg)

- [Jinja Helpers](#jinja-helpers)
- [pretty_time](#pretty_time-source)
@@ -69,11 +71,11 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
- [insert_by_period](#insert_by_period-source)

----
=======
### Generic Tests
#### equal_rowcount ([source](macros/generic_tests/equal_rowcount.sql))
Asserts that two relations have the same number of rows.


**Usage:**
```yaml
version: 2
@@ -387,7 +389,6 @@ models:
```
<details>
<summary>Additional `gaps` and `zero_length_range_allowed` examples</summary>

**Understanding the `gaps` argument:**

Here are a number of examples for each allowed `gaps` argument.
@@ -435,7 +436,6 @@ models:
| 0 | 1 |
| 2 | 2 |
| 3 | 4 |

</details>

#### sequential_values ([source](macros/generic_tests/sequential_values.sql))
@@ -551,7 +551,7 @@ These macros run a query and return the results of the query as objects. They ar
#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

**Args:**
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column you wish to find the column values of
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this as `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
@@ -592,6 +592,28 @@ Arguments:
...
```

#### get_filtered_columns_in_relation ([source](macros/sql/get_filtered_columns_in_relation.sql))
This macro returns an iterable Jinja list of columns for a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) (i.e. not from a CTE).
- Optionally exclude columns.
- The input values are not case-sensitive (uppercase or lowercase input will both work!).

> Note: The native [`adapter.get_columns_in_relation` macro](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_columns_in_relation) allows you to pull column names in a non-filtered fashion, but it also brings along other (potentially unwanted) information, such as dtype, char_size, numeric_precision, etc.

**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The names of the columns you wish to exclude (case-insensitive).

**Usage:**
```sql
-- Returns a list of the columns from a relation, so you can then iterate in a for loop
{% set column_names = dbt_utils.get_filtered_columns_in_relation(from=ref('your_model'), except=["field_1", "field_2"]) %}
...
{% for column_name in column_names %}
max({{ column_name }}) ... as max_{{ column_name }},
{% endfor %}
...
```
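The case-insensitive `except` behaviour described above can be sketched in plain Python. This is only a model of the documented semantics, not the macro's Jinja implementation; the function name `filter_columns` and the hard-coded column list are hypothetical stand-ins for what the macro reads from the relation.

```python
def filter_columns(columns, except_=()):
    """Return the column names that are not listed in except_.

    Comparison is case-insensitive, as the README states for the macro.
    """
    excluded = {name.lower() for name in except_}
    return [col for col in columns if col.lower() not in excluded]


# Column names taken from the integration-test seed (field_1, field_2, field_3).
columns = ["field_1", "field_2", "field_3"]
print(filter_columns(columns, except_=["FIELD_1"]))  # ['field_2', 'field_3']
```

The result matches the header of the expected seed in this PR (`field_2,field_3`) when `field_1` is excluded.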

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.
@@ -770,9 +792,20 @@ group by 1,2,3
```

#### star ([source](macros/sql/star.sql))
This macro generates a comma-separated list of all fields that exist in the `from` relation, excluding any fields
listed in the `except` argument. The construction is identical to `select * from {{ref('my_model')}}`, replacing star (`*`) with
the star macro.
This macro also has an optional `relation_alias` argument that will prefix all generated fields with an alias (`relation_alias`.`field_name`).
The macro also has optional `prefix` and `suffix` arguments. When one or both are provided, they will be concatenated onto each field's alias
in the output (`prefix` ~ `field_name` ~ `suffix`). NB: This prevents the output from being used in any context other than a select statement.


**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The names of the columns you wish to exclude (case-insensitive).
- `relation_alias` (optional, default=`''`): will prefix all generated fields with an alias (`relation_alias`.`field_name`).
- `prefix` (optional, default=`''`): will prefix the output `field_name` (`field_name as prefix_field_name`).
- `suffix` (optional, default=`''`): will suffix the output `field_name` (`field_name as field_name_suffix`).

**Usage:**
```sql
@@ -789,6 +822,13 @@ from {{ ref('my_model') }}

```

```sql
select
{{ dbt_utils.star(from=ref('my_model'), except=["exclude_field_1", "exclude_field_2"], prefix="max_") }}
from {{ ref('my_model') }}

```
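To make the interaction of `except`, `relation_alias`, `prefix` and `suffix` concrete, here is a rough Python model of the text the macro is documented to produce. It is a sketch under assumptions: `star_sketch` is a hypothetical name, the column list is passed in by hand, and any identifier quoting the real macro may apply is left out.

```python
def star_sketch(columns, except_=(), relation_alias="", prefix="", suffix=""):
    """Build a comma-separated select list following the README description."""
    excluded = {name.lower() for name in except_}  # except is case-insensitive
    parts = []
    for col in columns:
        if col.lower() in excluded:
            continue
        # relation_alias prefixes the field reference: alias.field_name
        expr = f"{relation_alias}.{col}" if relation_alias else col
        # prefix/suffix are concatenated onto the output alias only
        if prefix or suffix:
            expr += f" as {prefix}{col}{suffix}"
        parts.append(expr)
    return ",\n".join(parts)


print(star_sketch(["field_1", "field_2", "field_3"], except_=["FIELD_3"], prefix="max_"))
# field_1 as max_field_1,
# field_2 as max_field_2
```

Because `prefix` and `suffix` add an `as ...` alias to every field, the output only makes sense inside a select list, which is the restriction the NB above points out.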

#### union_relations ([source](macros/sql/union.sql))

This macro unions together an array of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation),
@@ -987,9 +1027,16 @@ This macro calculates the difference between two dates.
#### split_part ([source](macros/cross_db_utils/split_part.sql))
This macro splits a string of text using the supplied delimiter and returns the supplied part number (1-indexed).

**Args**:
- `string_text` (required): Text to be split into parts.
- `delimiter_text` (required): Text representing the delimiter to split by.
- `part_number` (required): Requested part of the split (1-based). If the value is negative, the parts are counted backward from the end of the string.

**Usage:**
When referencing a column, use one pair of quotes. When referencing a string, use single quotes enclosed in double quotes.
```
{{ dbt_utils.split_part(string_text='column_to_split', delimiter_text='delimiter_column', part_number=1) }}
{{ dbt_utils.split_part(string_text="'1|2|3'", delimiter_text="'|'", part_number=1) }}
```
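The 1-based and negative `part_number` semantics listed in the args can be illustrated with a small Python model. This is a sketch of the documented behaviour only; it does not reproduce any warehouse's SQL function, and out-of-range handling (which may differ by database) is deliberately not modelled.

```python
def split_part(string_text, delimiter_text, part_number):
    """Return the requested part of string_text split on delimiter_text.

    part_number is 1-based; negative values count backward from the end.
    """
    if part_number == 0:
        raise ValueError("part_number is 1-based, so 0 is not a valid part")
    parts = string_text.split(delimiter_text)
    # Positive numbers shift to Python's 0-based index; negative numbers
    # already count from the end in Python, so they are used unchanged.
    index = part_number - 1 if part_number > 0 else part_number
    return parts[index]


print(split_part("1|2|3", "|", 1))   # 1
print(split_part("1|2|3", "|", -1))  # 3
```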

#### date_trunc ([source](macros/cross_db_utils/date_trunc.sql))
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
group_col,string_text,order_col
1,a,1
1,b,2
1,c,3
2,a,2
2,1,1
2,p,3
3,g,1
3,g,2
3,g,3
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg_output.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
group_col,expected,version
1,"a_|_b_|_c",bottom_ordered
2,"1_|_a_|_p",bottom_ordered
3,"g_|_g_|_g",bottom_ordered
1,"a_|_b",bottom_ordered_limited
2,"1_|_a",bottom_ordered_limited
3,"g_|_g",bottom_ordered_limited
3,"g, g, g",comma_whitespace_unordered
3,"g",distinct_comma
3,"g,g,g",no_params
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
field_1,field_2,field_3
a,b,c
d,e,f
g,h,i
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
field_2,field_3
h,i
32 changes: 32 additions & 0 deletions integration_tests/macros/assert_equal_values.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{% macro assert_equal_values(actual_object, expected_object) %}
{% if not execute %}

{# pass #}

{% elif actual_object != expected_object %}

{% set msg %}
Expected did not match actual

-----------
Actual:
-----------
--->{{ actual_object }}<---

-----------
Expected:
-----------
--->{{ expected_object }}<---

{% endset %}

{{ log(msg, info=True) }}

select 'fail'

{% else %}

select 'ok' {{ limit_zero() }}

{% endif %}
{% endmacro %}
6 changes: 6 additions & 0 deletions integration_tests/models/cross_db_utils/schema.yml
Original file line number Diff line number Diff line change
@@ -58,6 +58,12 @@ models:
- assert_equal:
actual: actual
expected: expected

- name: test_listagg
tests:
- assert_equal:
actual: actual
expected: expected

- name: test_safe_cast
tests:
69 changes: 69 additions & 0 deletions integration_tests/models/cross_db_utils/test_listagg.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
with data as (

select * from {{ ref('data_listagg') }}

),

data_output as (

select * from {{ ref('data_listagg_output') }}

),

calculate as (

select
group_col,
{{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col") }} as actual,
'bottom_ordered' as version
from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col", 2) }} as actual,
'bottom_ordered_limited' as version
from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "', '") }} as actual,
'comma_whitespace_unordered' as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('DISTINCT string_text', "','") }} as actual,
'distinct_comma' as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text') }} as actual,
'no_params' as version
from data
where group_col = 3
group by group_col

)

select
calculate.actual,
data_output.expected
from calculate
left join data_output
on calculate.group_col = data_output.group_col
and calculate.version = data_output.version
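
For readers who want to check the expected values in `data_listagg_output.csv` by hand, the five test cases can be modelled in plain Python. This is a sketch, not the macro: `listagg_model` and its keyword names are hypothetical, and the `distinct` flag stands in for passing `'DISTINCT string_text'` as the measure. The default delimiter is assumed to be a comma because the `no_params` case expects `g,g,g`.

```python
import csv
import io
from collections import defaultdict

# Seed rows copied from integration_tests/data/cross_db/data_listagg.csv.
DATA = """\
group_col,string_text,order_col
1,a,1
1,b,2
1,c,3
2,a,2
2,1,1
2,p,3
3,g,1
3,g,2
3,g,3
"""


def listagg_model(rows, measure, delimiter=",", order_by=None, limit=None, distinct=False):
    """Hypothetical pure-Python stand-in for the aggregate, applied to one group."""
    if order_by is not None:
        # Mirrors "order by order_col"; the seed column holds integers.
        rows = sorted(rows, key=lambda row: int(row[order_by]))
    values = [row[measure] for row in rows]
    if distinct:
        # Drop repeated values while keeping first-seen order.
        values = list(dict.fromkeys(values))
    if limit is not None:
        # Mirrors the fourth positional argument (2) in the limited case.
        values = values[:limit]
    return delimiter.join(values)


# Equivalent of "group by group_col".
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(DATA)):
    groups[row["group_col"]].append(row)

print(listagg_model(groups["2"], "string_text", "_|_", order_by="order_col"))
# 1_|_a_|_p  (bottom_ordered, group 2)
print(listagg_model(groups["2"], "string_text", "_|_", order_by="order_col", limit=2))
# 1_|_a  (bottom_ordered_limited, group 2)
print(listagg_model(groups["3"], "string_text", ",", distinct=True))
# g  (distinct_comma)
print(listagg_model(groups["3"], "string_text"))
# g,g,g  (no_params)
```

Note that the three cases without an `order by` clause are restricted to group 3, where every value is `g`, so the expected output does not depend on the order in which the database returns rows.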
10 changes: 10 additions & 0 deletions integration_tests/models/sql/schema.yml
Original file line number Diff line number Diff line change
@@ -50,6 +50,11 @@ models:
values:
- '5'

- name: test_get_filtered_columns_in_relation
tests:
- dbt_utils.equality:
compare_model: ref('data_filtered_columns_in_relation_expected')

- name: test_get_relations_by_prefix_and_union
columns:
- name: event
@@ -121,6 +126,11 @@ models:
- dbt_utils.equality:
compare_model: ref('data_star_aggregate_expected')

- name: test_star_uppercase
tests:
- dbt_utils.equality:
compare_model: ref('data_star_expected')

- name: test_surrogate_key
tests:
- assert_equal:
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{% set exclude_field = 'field_1' %}
{% set column_names = dbt_utils.get_filtered_columns_in_relation(from= ref('data_filtered_columns_in_relation'), except=[exclude_field]) %}

with data as (

select

{% for column_name in column_names %}
max({{ column_name }}) as {{ column_name }} {% if not loop.last %},{% endif %}
{% endfor %}

from {{ ref('data_filtered_columns_in_relation') }}

)

select * from data
13 changes: 13 additions & 0 deletions integration_tests/models/sql/test_star_uppercase.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
{% set exclude_field = 'FIELD_3' %}


with data as (

select
{{ dbt_utils.star(from=ref('data_star'), except=[exclude_field]) }}

from {{ ref('data_star') }}

)

select * from data