diff --git a/web/book/src/reference/stdlib/transforms/window.md b/web/book/src/reference/stdlib/transforms/window.md index 1f3c02d391ed..bf0290e06fe4 100644 --- a/web/book/src/reference/stdlib/transforms/window.md +++ b/web/book/src/reference/stdlib/transforms/window.md @@ -17,13 +17,11 @@ one of: The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the edge of the table or group. - - For ease of use, there are two flags that override `rows` or `range`: - `expanding:true` is an alias for `rows:..0`. A sum using this window is also known as "cumulative sum". -- `rolling:n` is an alias for `row:(-n+1)..0`, where `n` is an integer. This +- `rolling:n` is an alias for `rows:(-n+1)..0`, where `n` is an integer. This will include `n` last values, including current row. An average using this window is also knows as a Simple Moving Average. @@ -66,6 +64,54 @@ group {order_month} ( ) ``` +Rows vs Range: + +```prql +from [ + {time_id=1, value=15}, + {time_id=2, value=11}, + {time_id=3, value=16}, + {time_id=4, value=9}, + {time_id=7, value=20}, + {time_id=8, value=22}, +] +window rows:-2..0 ( + sort time_id + derive {sma3rows = average value} +) +window range:-2..0 ( + sort time_id + derive {sma3range = average value} +) +``` + +| time_id | value | sma3rows | sma3range | +| ------- | ----- | -------- | --------- | +| 1 | 15 | 15 | 15 | +| 2 | 11 | 13 | 13 | +| 3 | 16 | 14 | 14 | +| 4 | 9 | 12 | 12 | +| 7 | 20 | 15 | 20 | +| 8 | 22 | 17 | 21 | + +We can see that rows having `time_id` of 5 and 6 are missing in example data; we +can say there are gaps in our time series data. + +When computing SMA 3 for the fifth row (`time_id==7`) then: + +- "rows" will compute average on 3 rows (`time_id` in `3, 4, 7`) +- "range" will compute average on single row only (`time_id==7`) + +When computing SMA 3 for the sixth row (`time_id==8`) then: + +- "rows" will compute average on 3 rows (`time_id` in `4, 7, 8`) +- "range" will compute average on 2 rows (`time_id` in `7, 8`) + +We can observe that "rows" ignores the content of the `time_id`, only uses its +order; we can say its window operates on physical rows. On the other hand +"range" looks at the content of the `time_id` and based on the content decides +how many rows fits into window; we can say window operates on logical rows. + ## Windowing by default If you use window functions without `window` transform, they will be applied to diff --git a/web/book/tests/documentation/snapshots/documentation__book__reference__stdlib__transforms__window__example__2.snap b/web/book/tests/documentation/snapshots/documentation__book__reference__stdlib__transforms__window__example__2.snap new file mode 100644 index 000000000000..24d23ea1c75e --- /dev/null +++ b/web/book/tests/documentation/snapshots/documentation__book__reference__stdlib__transforms__window__example__2.snap @@ -0,0 +1,50 @@ +--- +source: web/book/tests/documentation/book.rs +expression: "from [\n {time_id=1, value=15},\n {time_id=2, value=11},\n {time_id=3, value=16},\n {time_id=4, value=9},\n {time_id=7, value=20},\n {time_id=8, value=22},\n]\nwindow rows:-2..0 (\n sort time_id\n derive {sma3rows = average value}\n)\nwindow range:-2..0 (\n sort time_id\n derive {sma3range = average value}\n)\n" +--- +WITH table_0 AS ( + SELECT + 1 AS time_id, + 15 AS value + UNION + ALL + SELECT + 2 AS time_id, + 11 AS value + UNION + ALL + SELECT + 3 AS time_id, + 16 AS value + UNION + ALL + SELECT + 4 AS time_id, + 9 AS value + UNION + ALL + SELECT + 7 AS time_id, + 20 AS value + UNION + ALL + SELECT + 8 AS time_id, + 22 AS value +) +SELECT + time_id, + value, + AVG(value) OVER ( + ORDER BY + time_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW + ) AS sma3rows, + AVG(value) OVER ( + ORDER BY + time_id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW + ) AS sma3range +FROM + table_0 +ORDER BY + time_id +