Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

window function could use local sort #3619

Open
jangorecki opened this issue Oct 7, 2023 · 6 comments
Open

window function could use local sort #3619

jangorecki opened this issue Oct 7, 2023 · 6 comments
Labels
bug Invalid compiler output or panic

Comments

@jangorecki
Copy link
Contributor

jangorecki commented Oct 7, 2023

What's up?

Having the following data d

  id value
1  1     2
2  2     3
3  3     4
4  6     2
5  5     3
6  4     4

How can I generate following SQL

SELECT
  *,
  AVG(value) OVER (
    ORDER BY
      id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS sma3
FROM
  d

According to documentation it seems that I have to sort whole dataset rather than an input to windowing function, therefore query that is being generated by prql compiler includes an extra unwanted ORDER BY id at the end.

library(prqlr)
d = data.frame(id=c(1,2,3,6,5,4), value=c(2,3,4,2,3,4))
sql = prql_compile("from d | sort id | window rolling:3 (derive {sma3 = average value})")
cat(sql)
#SELECT
#  *,
#  AVG(value) OVER (
#    ORDER BY
#      id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
#  ) AS sma3
#FROM
#  d
#ORDER BY
#  id

How can I define sorting locally for each window function call locally? so I don't have to clutter my SQL query with unwanted clauses.

@snth
Copy link
Member

snth commented Oct 7, 2023

Hi @jangorecki,

Great to see that you are trying out PRQL and posting questions.

You can put the sort inside the window pipeline.

So for your example:

library(prqlr)
d = data.frame(id=c(1,2,3,6,5,4), value=c(2,3,4,2,3,4))
sql = prql_compile("from d | window rolling:3 (sort id | derive {sma3 = average value})")
cat(sql)

or something that you can execude in the playground:

[{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},]
window rolling:3 (
  sort id
  derive {sma3 = average value}
  )

which produces

WITH table_0 AS (
  ...
)
SELECT
  id,
  value,
  AVG(value) OVER (
    ORDER BY
      id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS sma3
FROM
  table_0
ORDER BY
  id

-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)

@snth snth closed this as completed Oct 7, 2023
@jangorecki
Copy link
Contributor Author

Thanks for reply, that make sense, could be mentioned in documentation.
Although doesn't seem to address my problem. Not sure if you noticed but your example still produces extra, unneeded, order by at the end. @snth

@max-sixty
Copy link
Member

I agree we could elide the ORDER BY at the end. (i wouldn't prioritize it that highly, but it is a good point)

(Thanks for the excellent reply @snth!)

@snth snth reopened this Oct 7, 2023
@snth
Copy link
Member

snth commented Oct 7, 2023

I reopened the issue because that ORDER BY at the end could potentially cause an issue.

Putting a sort after the window does correctly override the ORDER BY id. However putting the sort before the window produces SQL that I believe is incorrect so I think this is a bug.

PRQL Input

[{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},]
sort value
window rolling:3 (
  sort id
  derive {sma3 = average value}
  )

SQL Output

WITH table_0 AS (
  SELECT
    1 AS id,
    2 AS value
  UNION
  ALL
  SELECT
    2 AS id,
    3 AS value
  UNION
  ALL
  SELECT
    3 AS id,
    4 AS value
  UNION
  ALL
  SELECT
    6 AS id,
    2 AS value
  UNION
  ALL
  SELECT
    5 AS id,
    3 AS value
  UNION
  ALL
  SELECT
    4 AS id,
    4 AS value
)
SELECT
  id,
  value,
  AVG(value) OVER (
    ORDER BY
      id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS sma3
FROM
  table_0
ORDER BY
  id

-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)

Expected SQL Output

WITH table_0 AS (
  SELECT
    1 AS id,
    2 AS value
  UNION
  ALL
  SELECT
    2 AS id,
    3 AS value
  UNION
  ALL
  SELECT
    3 AS id,
    4 AS value
  UNION
  ALL
  SELECT
    6 AS id,
    2 AS value
  UNION
  ALL
  SELECT
    5 AS id,
    3 AS value
  UNION
  ALL
  SELECT
    4 AS id,
    4 AS value
)
SELECT
  id,
  value,
  AVG(value) OVER (
    ORDER BY
      id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS sma3
FROM
  table_0
ORDER BY
  value

-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)

@snth snth added the bug Invalid compiler output or panic label Oct 7, 2023
@max-sixty
Copy link
Member

(for anyone comparing the queries — it's the final term that is incorrect — id vs value — agree this seems to be a bug)

@jangorecki
Copy link
Contributor Author

jangorecki commented Oct 13, 2023

Agree, it is a bug.
Just to note that this request asks for generating windowing function having local sort, and at the same time not bloating query with extra global sort, which is completely redundant.

Documentation for presenting use of local sort was already resolved in #3634

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic
Projects
None yet
Development

No branches or pull requests

3 participants