Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added lazy joins for groups on events #17950

Merged
merged 4 commits into from
Oct 12, 2023

Conversation

Gilbert09
Copy link
Member

Problem

  • In the trends insights HogQL conversion work, we need to pull group properties when dealing with trend breakdowns
  • Instead of adding in manual joins to the HogQL queries, it's much nicer to get HogQL to do this for us instead

Changes

  • Adds LazyJoin's for each of the $group_n fields on events
image

How did you test this code?

  • Manually via query debugging

Copy link
Collaborator

@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. Though an extra 👨‍🍳😗 would to make this work too:

image

The information is stored in the GroupTypeMapping postgres model. It shouldn't be hard to query that in the create_database method and make some changes unless someone wants to add a group called "team_id" or any of the existing fields.

I just have a feeling users would love to be able to specify .organization instead of wondering which group index it was.

@@ -85,6 +86,16 @@ class EventsTable(Table):
# These are swapped out if the user has PoE enabled
"person": FieldTraverser(chain=["pdi", "person"]),
"person_id": FieldTraverser(chain=["pdi", "person_id"]),
"$group_0": StringDatabaseField(name="$group_0"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is actually the same as properties.$group_0. The $group_0 columns are materialized columns, calculated directly from the property. If we specify properties.$group_0, it'll resolve to the column itself. Other materialized properties start with mat_, but the group ones are "top level" for some reason. I'm not sure if it makes sense to try linking to the property instead of adding a column or not. Fine to keep as is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the group ones are "top level" for some reason.

That's because these are explicitly generated and expected to exist in every installation: https://github.com/PostHog/posthog/blob/master/posthog/models/event/sql.py#L59

whereas the ones starting with mat_ are autogenerated/materialised/not necessarily always there, which is what prompts the sanity check that you shouldn't be explicitly querying for a mat_XYZ property unless you're very sure it will exist.

or so I think was the intent at the time 😅

@mariusandra
Copy link
Collaborator

One more thought. Group analytics is an optional upgrade for PostHog. We probably should hide these fields altogether if it's turned off 🤔

@Gilbert09
Copy link
Member Author

One more thought. Group analytics is an optional upgrade for PostHog. We probably should hide these fields altogether if it's turned off 🤔

@mariusandra Hmm, is it turned off even when breaking down trends by group fields?

@Gilbert09 Gilbert09 merged commit 60398b7 into master Oct 12, 2023
66 checks passed
@Gilbert09 Gilbert09 deleted the feat/hogql-groups-on-events branch October 12, 2023 15:36
Comment on lines +122 to +123
for mapping in GroupTypeMapping.objects.filter(team=team):
database.events.fields[mapping.group_type] = FieldTraverser(chain=[f"group_{mapping.group_type_index}"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change I'd do is to double check if the field name is not already taken. Otherwise users might, accidentally or deliberately, override "system level" fields like "timestamp" or "distinct_id" and cause all manner of problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeep - that's fair. Where do people define the group mappings? Somewhere in the settings?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, under project settings:

image

daibhin pushed a commit that referenced this pull request Oct 23, 2023
* Added lazy joins for groups on events

* Update query snapshots

* Fixed test_resolver.py tests

* Added aliases for groups from group mappings

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants