-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(recordings): Optimize recordings list query #14458
Conversation
Hey @EDsCODE! 👋 |
TODO:
|
) | ||
|
||
def run(self, *args, **kwargs) -> SessionRecordingQueryResult: | ||
self._filter.hogql_context.using_person_on_events = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This connection to hogql restricts the improvement to person_on_events enabled orgs for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't get this, aren't there like only 4 poe orgs right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. hogql allows for person properties to be filtered in the recordings event query which usually was not the case. Now, either we use person on events and only have this optimization working for the tiny subset until everything works, or I go through a refactor that splits up and does the right pushdown of person property filtering to the subquery when we detect a hogql based person property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh I see, okay, good enough
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
HAVING full_snapshots > 0 | ||
AND start_time >= '2021-01-14 00:00:00' | ||
AND start_time <= '2021-01-21 20:00:00') AS session_recordings ON session_recordings.session_id = events.session_id | ||
WHERE session_recordings.distinct_id == events.distinct_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't matching by distinctID lead to problems as we're randomly selecting one?
Don't think sessionIDs change between login steps, but distinctIDs can & do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder this... Seems like a copy from the v1 query but I'm not sure why it matters... We only need one distinctId to load the Person anyways and we have that from the recording. We anyways group by session_id...
Perhaps this is to account for weird cases where there is a clash of session_id but I don't see that happening...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! The code here is going get more complicated as we support multiple ways of doing things, but given that it will all go away sometime with the schema changes, not too bothered.
Would like it if session-recording folks have a look too though!
posthog/models/team/team.py
Outdated
@@ -217,6 +217,26 @@ def _person_on_events_querying_enabled(self) -> bool: | |||
# on self-hosted, use the instance setting | |||
return get_instance_setting("PERSON_ON_EVENTS_ENABLED") | |||
|
|||
@property | |||
def recordings_list_v2_query_enabled(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this on the frontend instead and control it via a query param to the API? Makes it much easier to test back and forth in a live situation to compare results etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on second thought, I don't want to add filters to the object and muddy our caching for insights. Do you think it's worth it for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't suggesting adding it to filters (not everything has to go in that mega object 😅 ) - just a standard query param. I pushed the change in the way I meant so now we just check a query param which makes it super easy to flip back and forth without having to change the feature flag all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yep, we can do that. Too focused on the filter object 😬
HAVING full_snapshots > 0 | ||
AND start_time >= '2021-01-14 00:00:00' | ||
AND start_time <= '2021-01-21 20:00:00') AS session_recordings ON session_recordings.session_id = events.session_id | ||
WHERE session_recordings.distinct_id == events.distinct_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder this... Seems like a copy from the v1 query but I'm not sure why it matters... We only need one distinctId to load the Person anyways and we have that from the recording. We anyways group by session_id...
Perhaps this is to account for weird cases where there is a clash of session_id but I don't see that happening...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is worth trying but reading it still makes me think "there must be a simpler way" :D
I keep thinking why is not just
select * from recording_events_grouped where session_id in (BuildStandardEventsQuery(filters).select("session_id")
Definitely don't want to derail and at this point I'll take any improvements to the speed of this thing but these queries still confuse the hell out of me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be that easy... why isn't it? :D
What would this look like if you rebuilt it the same way the events query is built?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bigger issue to wait on before spending time unnecessarily is person on events deployment. Once we're there, we can probably move this to @mariusandra's suggestion with less headache and a lot less joins to negotiate
Will need this to wait on person on events because of this dependency: #14458 (comment) |
The point of HogQL is that it'll deal with the persons on events issue for you. I'm actively working on making this actually true. Once that PR is wrapped up (tomorrow? 🤞), person joins will and won't happen as needed, for both the I also wrote up a quick guide on how to use HogQL with Python in the backend. Hopefully that can provide some inspiration. You will be free to do any joins or non-joins as you'd like, or let HogQL handle it for you. It's just a layer around ClickHouse SQL after all... as long as you stick to the implemented syntax (no arrays yet :/). |
I've updated the logic to not use V2 if there are hogql properties. This will be in effect until person on events is fully deployed. The updates to hogql mentioned do not solve this particular issue because the recordings query does adhoc property parsing that does not adhere to the rest of the query patterns |
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm happy for this to go in but it would be great if this doesn't hang around for 3 months as half-done thing 😅 Ideally we test it heavily and immediately, and then follow up with a review to swap over fully ASAP.
In the not too far future, we're going to be rebuilding the whole thing (hopefully using the new query stuff) in order to support Notebooks more effectively so it would be good to have as a clean a base to start from
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
* derivative class * derivative class * tests * Update query snapshots * add flag * typing * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * use person on events with new query * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * Update query snapshots * snapshots * Update query snapshots * Update query snapshots * Update query snapshots * merge master * merge master * remove unnecessary joins with poe is active * try turning off the test unless poe * revert poe changes * don't use optimized recording list on hogql queries for now * Moved flag to frontend * Update query snapshots * Update UI snapshots for `chromium` (1) * Update UI snapshots for `webkit` (2) * Update UI snapshots for `webkit` (2) --------- Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ben White <ben@posthog.com>
Problem
Changes
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
How did you test this code?