Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Learner exercise #5

Open
upkarliddercflt opened this issue May 7, 2024 · 7 comments
Open

Learner exercise #5

upkarliddercflt opened this issue May 7, 2024 · 7 comments

Comments

@upkarliddercflt
Copy link

==== Step 3a: Test your Flink SQL knowledge

  • Description: Write SQL for the following queries on the streaming ratings data:
  • Actions:
    <1> Get average rating per movie as ratings stream into Kafka.
    <2> Get the running list of 10 most rated movies (not top rated).
    <3> Get movies that are rated above average of all the movies.
@stephenostermiller
Copy link

stephenostermiller commented May 7, 2024

<2> Not sure how to get top here, it won't let me order by numRatings

SELECT movieId, COUNT(rating) as numRatings
FROM MovieRatings
GROUP BY movieId LIMIT 10;

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: Sort on a non-time-attribute field is not supported.

@stephenostermiller
Copy link

stephenostermiller commented May 7, 2024

<3> Partial progress: average rating query

SELECT AVG(rating) as avgRating
FROM MovieRatings;

The SQL engine doesn't support HAVING which is my first approach with a sub-query

SELECT movieId, AVG(rating) as avgRating
FROM MovieRatings 
HAVING avgRating > (SELECT AVG(rating) FROM MovieRatings)
GROUP BY movieId;

@upkarliddercflt
Copy link
Author

Get average rating per movie
SELECT movieId, AVG(rating) AS avgRating
FROM MovieRatings
GROUP BY movieId;

@upkarliddercflt
Copy link
Author

Get movies that are rated above average of all the movies
SELECT movieId, rating
FROM MovieRatings
WHERE rating > (SELECT AVG(rating) FROM MovieRatings);

@upkarliddercflt
Copy link
Author

<2> Not sure how to get top here, it won't let me order by numRatings

SELECT movieId, COUNT(rating) as numRatings
FROM MovieRatings
GROUP BY movieId LIMIT 10;

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: Sort on a non-time-attribute field is not supported.

I think you are right! A better question might have been to window and ask "what are the top rated movies every hour".

@stephenostermiller
Copy link

Get movies that are rated above average of all the movies SELECT movieId, rating FROM MovieRatings WHERE rating > (SELECT AVG(rating) FROM MovieRatings);

Ah, I thought it was asking for movies which have an average rating greater than the overall average.

@stephenostermiller
Copy link

stephenostermiller commented May 7, 2024

Would you want to implement <1> as a sink with a average rating over some window?

Or maybe every time you get a new rating, you compute the new average rating for that movie and add it to the stream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants