Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(book): documentation for window rows vs range #3634

Merged
merged 4 commits into from
Oct 8, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 49 additions & 3 deletions web/book/src/reference/stdlib/transforms/window.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,11 @@ one of:
The bounds of the range are inclusive. If a bound is omitted, the segment will
extend until the edge of the table or group.

<!-- TODO: rows vs range example, with visualization -->

For ease of use, there are two flags that override `rows` or `range`:

- `expanding:true` is an alias for `rows:..0`. A sum using this window is also
known as "cumulative sum".
- `rolling:n` is an alias for `row:(-n+1)..0`, where `n` is an integer. This
- `rolling:n` is an alias for `rows:(-n+1)..0`, where `n` is an integer. This
will include `n` last values, including current row. An average using this
window is also knows as a Simple Moving Average.

Expand Down Expand Up @@ -66,6 +64,54 @@ group {order_month} (
)
```

Rows vs Range:

```prql
from [
{time_id=1, value=15},
{time_id=2, value=11},
{time_id=3, value=16},
{time_id=4, value=9},
{time_id=7, value=20},
{time_id=8, value=22},
]
window rows:-2..0 (
sort time_id
derive {sma3rows = average value}
)
window range:-2..0 (
sort time_id
derive {sma3range = average value}
)
```

| time_id | value | sma3rows | sma3range |
| ------- | ----- | -------- | --------- |
| 1 | 15 | 15 | 15 |
| 2 | 11 | 13 | 13 |
| 3 | 16 | 14 | 14 |
| 4 | 9 | 12 | 12 |
| 7 | 20 | 15 | 20 |
| 8 | 22 | 17 | 21 |

We can see that rows having `time_id` of 5 and 6 are missing in example data; we
can say there are gaps in our time series data.

When computing SMA 3 for the fifth row (`time_id==7`) then:

- "rows" will compute average on 3 rows (`time_id` in `3, 4, 7`)
- "range" will compute average on single row only (`time_id==7`)

When computing SMA 3 for the sixth row (`time_id==8`) then:

- "rows" will compute average on 3 rows (`time_id` in `4, 7, 8`)
- "range" will compute average on 2 rows (`time_id` in `7, 8`)

We can observe that "rows" ignores the content of the `time_id`, only uses its
order; we can say its window operates on physical rows. On the other hand
"range" looks at the content of the `time_id` and based on the content decides
how many rows fits into window; we can say window operates on logical rows.

## Windowing by default

If you use window functions without `window` transform, they will be applied to
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---
source: web/book/tests/documentation/book.rs
expression: "from [\n {time_id=1, value=15},\n {time_id=2, value=11},\n {time_id=3, value=16},\n {time_id=4, value=9},\n {time_id=7, value=20},\n {time_id=8, value=22},\n]\nwindow rows:-2..0 (\n sort time_id\n derive {sma3rows = average value}\n)\nwindow range:-2..0 (\n sort time_id\n derive {sma3range = average value}\n)\n"
---
WITH table_0 AS (
SELECT
1 AS time_id,
15 AS value
UNION
ALL
SELECT
2 AS time_id,
11 AS value
UNION
ALL
SELECT
3 AS time_id,
16 AS value
UNION
ALL
SELECT
4 AS time_id,
9 AS value
UNION
ALL
SELECT
7 AS time_id,
20 AS value
UNION
ALL
SELECT
8 AS time_id,
22 AS value
)
SELECT
time_id,
value,
AVG(value) OVER (
ORDER BY
time_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3rows,
AVG(value) OVER (
ORDER BY
time_id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3range
FROM
table_0
ORDER BY
time_id

Loading