Skip to content

Commit

Permalink
docs(book): documentation for window rows vs range (#3634)
Browse files Browse the repository at this point in the history
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eitsupi <ts1s1andn@gmail.com>
(cherry picked from commit f2cd306)
  • Loading branch information
jangorecki authored and github-actions[bot] committed Oct 8, 2023
1 parent 5f20f37 commit 69ccdb8
Show file tree
Hide file tree
Showing 2 changed files with 99 additions and 3 deletions.
52 changes: 49 additions & 3 deletions web/book/src/reference/stdlib/transforms/window.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,11 @@ one of:
The bounds of the range are inclusive. If a bound is omitted, the segment will
extend until the edge of the table or group.

<!-- TODO: rows vs range example, with visualization -->

For ease of use, there are two flags that override `rows` or `range`:

- `expanding:true` is an alias for `rows:..0`. A sum using this window is also
known as "cumulative sum".
- `rolling:n` is an alias for `row:(-n+1)..0`, where `n` is an integer. This
- `rolling:n` is an alias for `rows:(-n+1)..0`, where `n` is an integer. This
will include `n` last values, including current row. An average using this
window is also knows as a Simple Moving Average.

Expand Down Expand Up @@ -66,6 +64,54 @@ group {order_month} (
)
```

Rows vs Range:

```prql
from [
{time_id=1, value=15},
{time_id=2, value=11},
{time_id=3, value=16},
{time_id=4, value=9},
{time_id=7, value=20},
{time_id=8, value=22},
]
window rows:-2..0 (
sort time_id
derive {sma3rows = average value}
)
window range:-2..0 (
sort time_id
derive {sma3range = average value}
)
```

| time_id | value | sma3rows | sma3range |
| ------- | ----- | -------- | --------- |
| 1 | 15 | 15 | 15 |
| 2 | 11 | 13 | 13 |
| 3 | 16 | 14 | 14 |
| 4 | 9 | 12 | 12 |
| 7 | 20 | 15 | 20 |
| 8 | 22 | 17 | 21 |

We can see that rows having `time_id` of 5 and 6 are missing in example data; we
can say there are gaps in our time series data.

When computing SMA 3 for the fifth row (`time_id==7`) then:

- "rows" will compute average on 3 rows (`time_id` in `3, 4, 7`)
- "range" will compute average on single row only (`time_id==7`)

When computing SMA 3 for the sixth row (`time_id==8`) then:

- "rows" will compute average on 3 rows (`time_id` in `4, 7, 8`)
- "range" will compute average on 2 rows (`time_id` in `7, 8`)

We can observe that "rows" ignores the content of the `time_id`, only uses its
order; we can say its window operates on physical rows. On the other hand
"range" looks at the content of the `time_id` and based on the content decides
how many rows fits into window; we can say window operates on logical rows.

## Windowing by default

If you use window functions without `window` transform, they will be applied to
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---
source: web/book/tests/documentation/book.rs
expression: "from [\n {time_id=1, value=15},\n {time_id=2, value=11},\n {time_id=3, value=16},\n {time_id=4, value=9},\n {time_id=7, value=20},\n {time_id=8, value=22},\n]\nwindow rows:-2..0 (\n sort time_id\n derive {sma3rows = average value}\n)\nwindow range:-2..0 (\n sort time_id\n derive {sma3range = average value}\n)\n"
---
WITH table_0 AS (
SELECT
1 AS time_id,
15 AS value
UNION
ALL
SELECT
2 AS time_id,
11 AS value
UNION
ALL
SELECT
3 AS time_id,
16 AS value
UNION
ALL
SELECT
4 AS time_id,
9 AS value
UNION
ALL
SELECT
7 AS time_id,
20 AS value
UNION
ALL
SELECT
8 AS time_id,
22 AS value
)
SELECT
time_id,
value,
AVG(value) OVER (
ORDER BY
time_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3rows,
AVG(value) OVER (
ORDER BY
time_id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3range
FROM
table_0
ORDER BY
time_id

0 comments on commit 69ccdb8

Please sign in to comment.