Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

range between seems to not support lag() #23742

Closed
emmansh opened this issue Oct 10, 2024 · 6 comments · Fixed by #23856
Closed

range between seems to not support lag() #23742

emmansh opened this issue Oct 10, 2024 · 6 comments · Fixed by #23856
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@emmansh
Copy link

emmansh commented Oct 10, 2024

Cross posted at Stack Overflow.


I want to get the previous value using lag(), over a partition defined with RANGE BETWEEN. I followed an example from the documentation:

WITH orders (custkey, orderdate, totalprice) 
    AS 
    (
    VALUES
    ('cust_1', DATE '2020-10-10', 100),
    ('cust_2', DATE '2020-10-10', 15),
    ('cust_1', DATE '2020-10-15', 200),
    ('cust_1', DATE '2020-10-16', 240),
    ('cust_2', DATE '2020-12-20', 25),
    ('cust_1', DATE '2020-12-25', 140),
    ('cust_2', DATE '2021-01-01', 5)    
)

SELECT *, 
       avg(totalprice) OVER (
                             PARTITION BY custkey
                             ORDER BY orderdate
                             RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS past_month_avg,
    
       lag(totalprice) OVER ( PARTITION BY custkey
                              ORDER BY orderdate
                              RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS previous_total_price_within_last_month
FROM orders

It returns:

custkey orderdate totalprice past_month_avg previous_total_price_within_last_month
cust_2 2020-10-10 15 15.0 NULL
cust_2 2020-12-20 25 25.0 15
cust_2 2021-01-01 5 15.0 25
cust_1 2020-10-10 100 100.0 NULL
cust_1 2020-10-15 200 150.0 100
cust_1 2020-10-16 240 180.0 200
cust_1 2020-12-25 140 140.0 240

The problem: while the calculation for past_month_avg results as expected, the previous_total_price_within_last_month result is not as expected.


Expected Output

Given that I defined a window that ranges at the last month, I expect that lag() will return null if the "previous" totalprice value is associated with an orderdate value that is out of the 1-month-back window.

custkey orderdate totalprice past_month_avg previous_total_price_within_last_month
cust_2 2020-10-10 15 15.0 NULL
cust_2 2020-12-20 25 25.0 NULL*
cust_2 2021-01-01 5 15.0 25
cust_1 2020-10-10 100 100.0 NULL
cust_1 2020-10-15 200 150.0 100
cust_1 2020-10-16 240 180.0 200
cust_1 2020-12-25 140 140.0 NULL*

*expected value


Does ranges between not support lag(), or is my example actually demonstrating the expected behavior?

@martint
Copy link
Member

martint commented Oct 10, 2024

For functions such as lead and lag, the window frame is not supported. Per the SQL specification:

5) If <ntile function>, <lead or lag function>, <rank function type> or ROW_NUMBER 
   is specified, then:
  a) ...
  b) The window framing clause of WDX shall not be present.

The invocation should fail if the window frame is present. Currently, it's being ignored. That's a bug.

@martint martint added bug Something isn't working good first issue Good for newcomers labels Oct 10, 2024
@piotrrzysko
Copy link
Member

@emmansh which version of Trino are you running?

I've written a test to verify whether the window frame is supported for the lag function: link. It shows that the current master behaves as expected, returning the error Cannot specify window frame for lag function. The same applies to the lead function.

However, for functions like rank, no such error is returned - this test fails. This is probably something we can improve.

@yuma-tietiedaxiang
Copy link

Hi, may i ask is this issue still open? can i work on this?

@piotrrzysko
Copy link
Member

Hi @yuma-tietiedaxiang, I started looking at this. However, I'm waiting for a response from @emmansh (see my last comment).

@emmansh
Copy link
Author

emmansh commented Oct 20, 2024

@piotrrzysko I use Trino via AWS Athena. I'm not sure what Trino version it uses, but it certainly not the latest.
See here why it's a problem to figure out the Trino version on AWS Athena.

@piotrrzysko
Copy link
Member

I verified this, and indeed, Athena returns the results provided by @emmansh. However, the current version of Trino throws an error for the lag function when used with the window framing clause, which is in accordance with the SQL specification. There is already a test that confirms this, and in #23856, I extended it to verify the lead function. Additionally, I included other functions for which the window framing clause is illegal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Development

Successfully merging a pull request may close this issue.

4 participants